
Qingquan Li

Dynamic and Precise Engineering Surveying

Qingquan Li
College of Civil and Transportation Engineering
Shenzhen University
Shenzhen, China

ISBN 978-981-99-5941-9 ISBN 978-981-99-5942-6 (eBook)


https://doi.org/10.1007/978-981-99-5942-6

Jointly published with Science Press


The print edition is not for sale in China mainland. Customers from China mainland please order the print
book from: Science Press.
ISBN of the Co-Publisher’s edition: 978-7-03-074703-7

© Science Press 2023, corrected publication 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publishers, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publishers nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publishers remain neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore

Paper in this product is recyclable.


Foreword by Prof. Deren Li

Engineering surveying is a vital branch of surveying and mapping, serving as a fundamental and generic discipline that ensures the rapid development of the national
economy. It is extensively applied in urban construction, industrial manufacturing,
traffic engineering, water hydraulic engineering, underground engineering, pipeline
engineering, marine engineering, and other fields to guarantee their design, construc-
tion, and operation. Dynamic and precise engineering surveying represents the fron-
tier of the surveying and mapping discipline. It integrates engineering technology,
computer technology, electronic information technology, and automatic control
technology while being deeply intertwined with civil engineering, water conser-
vancy engineering, aerospace engineering, and more. This field caters to various
applications, including large-scale buildings, transportation facilities, water conser-
vancy hubs, large-scale scientific devices, and lunar exploration projects. Currently,
dynamic and precise engineering surveying is not confined to areas accessible by
humans. It extends to inhospitable and inaccessible regions, such as underground,
underwater, and space environments. Consequently, it increasingly exhibits a typical
multidisciplinary nature.
Since the reform and opening up, China has made remarkable achievements in
engineering construction. The scale of infrastructure projects such as highways, high-
speed railways, urban subways, reservoirs and dams, large airports, and extensive
pipelines has expanded rapidly. By 2020, China had 161,000 km of highways and
38,000 km of high-speed railways, 38,000 road/railway tunnels, and millions of
municipal pipelines. These large-scale infrastructures serve as the engine for national
economic development. Their operational conditions impact the growth and stability
of the national economy, as well as the safety of people’s lives and property. As a
result, it is essential to continually conduct precise surveying of the structural and
apparent changes of these infrastructures during their operation. Traditional engi-
neering surveying employs levels, theodolites, or total stations to measure elevation
or horizontal angles. These measurements are neither automatic nor economical
and can be time-consuming. Positioning and surveying are carried out from one
point to the next, making it challenging to achieve full coverage of large-scale high-
ways, railways, and pipeline networks. Efficiently and precisely surveying signifi-
cant infrastructure to ensure its safe operation is a crucial national demand. Dynamic
and precise engineering surveying, conducted in efficient, intelligent, and contin-
uous ways, effectively meets this urgent demand. However, no generic surveying
equipment can satisfy the diverse requirements of various application scenarios,
such as pavement defect detection, high-speed railway track detection, dam internal
deformation monitoring, and pipeline internal damage detection. Surveying methods
and equipment are typically designed and developed on a case-by-case basis. The
constant demand for specialized equipment is a significant feature of dynamic and
precise engineering surveying. Many application scenarios require comprehensive
surveying, and the survey data typically constitutes big data, including millimeter-
resolution road images, tunnel 3D laser scanning data, and structured light 3D survey
data above 20,000 Hz. Efficiently processing and analyzing survey data is a chal-
lenge in dynamic and precise engineering surveying. Additionally, interpreting the
survey results to make accurate judgments and predictions regarding the safety
state of infrastructure, guiding safe operation and maintenance, is also a task to
be accomplished by dynamic and precise engineering surveying.
In 1993, I supervised Qingquan Li in his on-the-job doctoral study. In 2000, he began studying dynamic and precise engineering surveying and led the first
domestic team researching road inspection, unmanned aircraft 3D measurement,
and autonomous driving. In 2012, he transferred from Wuhan University to Shen-
zhen University. Since then, not only has he led the university to achieve leapfrog development as its president, but he has also conducted innovative research on coastal
zone mapping, rock-fill dam deformation monitoring, urban drainage pipe detection,
and more. He has been dedicated to dynamic and precise engineering surveying for
over two decades, during which his team has put forward new methods and equip-
ment. His research is characterized by the combination of advanced surveying and
mapping technologies with various engineering applications, focusing on theoretical
study, technological innovation, and equipment development. Meanwhile, he contin-
uously trains students to transform research outcomes into practical productivity and
promote industrial progress, truly embodying the concept of “writing a thesis on the
motherland.”
This book summarizes the study led by Prof. Qingquan Li over the past nearly two
decades. The book systematically describes the spatiotemporal data, multi-sensor
integration, and data processing of dynamic and precise engineering surveying.
Meanwhile, it introduces a variety of professional equipment developed by the team
for dynamic and precise engineering surveying and many typical engineering applica-
tions. Furthermore, it enriches the theoretical and technological contents of dynamic
and precise engineering surveying, which is an outstanding contribution to the engi-
neering surveying discipline. The publication of the book will vigorously promote
the theory and technology of dynamic and precise engineering surveying and help
establish China’s leading role in dynamic and precise engineering surveying.

December 2020 Prof. Deren Li


Academician of the Chinese Academy
of Sciences and the Chinese Academy
of Engineering
Wuhan University
Wuhan, China

The original version of the book was revised: belated corrections have been incorporated throughout the book. The correction to the book is available at https://doi.org/10.1007/978-981-99-5942-6_8
Foreword by Prof. Zhenglu Zhang

It is an honor and a great pleasure to be invited by Prof. Qingquan Li to review his newly written book, Dynamic and Precise Engineering Surveying. As a teacher
engaged in teaching and researching engineering surveying for more than fifty years,
I am well-versed in engineering surveying and have a deep understanding of precise
engineering surveying. I am also familiar with dynamic surveying, as it is an emerging
field within the domain of precise engineering surveying. Over the past two months,
I have completed the reading of the manuscript and provided some comments and
suggestions. Despite a few minor shortcomings, the overall excellence of the book
remains intact, as it represents a culmination of Prof. Qingquan Li’s team’s achieve-
ments. At the same time, I regret that dynamic and precise engineering surveying
and representative studies have not been summarized in my previously written book,
Engineering Surveying.
Prof. Qingquan Li studied in the Department of Engineering Surveying at Wuhan
Technical University of Surveying and Mapping from 1981 to 1988 as an under-
graduate and master’s student and then continued as a teacher at the university. In
the 1990s, he was engaged in photogrammetry and remote sensing research under
the supervision of Academician Deren Li, ultimately obtaining his Ph.D. degree in
engineering. In 1994, Prof. Qingquan Li participated in one of my industrial funding
projects and co-authored a paper entitled “Study on Automatic Inferring and Linking
of Mine Rock Boundary Lines,” which was published in the Journal of Wuhan Tech-
nical University of Surveying and Mapping. Since then, we have become academic
and research partners.
Dynamicity and high precision are two main requirements of modern engineering
surveying. Dynamicity primarily refers to the dynamic deformation of the surveyed
object over time. That is, the measured object changes when the equipment or sensors
are set up at a fixed position. For example, continuous surveying is needed to capture
the dynamic process of a vibrating towering building. High precision has always been
the focus throughout the planning, design, construction, operation, and management
of precision engineering, especially in the installation of large and special equip-
ment and deformation monitoring of engineering construction. This book introduces
a series of mobile professional equipment (survey platforms) that comprises precise
surveying instruments and multiple sensors. The introduced equipment can contin-
uously obtain multi-source geometric data and non-geometric information of the
measured objects, such as precise surface morphology and massive digital images.
The surveyed objects are mainly national or urban infrastructures, such as high-
ways, railways, bridges, tunnels, dams, roads, and underground pipelines. At the
same time, the surveying platforms can be vehicles, ships, UAVs, or other movable
carriers. Related data processing has also been discussed thoroughly, including
dynamic spatiotemporal data, time synchronization, multi-source data fusion, contin-
uous observation data splicing, and the sensors, software, and hardware embedded in surveying instruments. Through surveying data analysis, the state of the
surveyed objects can be assessed, providing stable and reliable support for their long-
term safe operation and maintenance. At present, the life spans of most reinforced
concrete infrastructures in our country are designed to be one to two hundred years
in the absence of natural and human-induced damage. Safety risks are exacerbated as these constructions age. Modern engineering surveying technology has met increasingly
diverse demands in the planning, design, and construction stages. However, chal-
lenges remain during operation and maintenance. To this end, this book provides a
theoretical and practical basis to explore the solutions.
Applications introduced in this book are beyond the scope of traditional engi-
neering surveying, such as volume measurement of large dumping sites (coal mine dumps being the earliest example), coastal topographic surveying, indoor surveying and
mapping, and deflection measurement of the road surface. For example, intelligent
inspection of pavement belongs to engineering surveying, but the additional informa-
tion obtained by such road surface comprehensive inspection equipment contributes
to vehicle navigation far more than road surface inspection itself. This equipment series, invented and developed by Prof. Qingquan Li’s team, has served more than 70% of roads and numerous city streets across China.
In summary, this book presents a new scope of engineering surveying in which
an innovative theoretical and technological system is formed, weaving spatiotem-
poral data, big data processing, evaluation and analysis, measurement technology,
and advanced equipment development together. This theoretical and technological
system has been used in many practical applications and accounts for an outstanding
contribution to the engineering surveying discipline.
Prof. Qingquan Li has devoted himself to innovative research in the engineering
surveying discipline for two decades. He has profound insight and excellent fore-
sight into the disciplinary trend. He has led his team with great wisdom and played a
leading role in engineering surveying. He specializes in photogrammetry and remote
sensing. At the same time, he also integrates the knowledge of geodesy, GIS, and
other fields with his specialization. Therefore, he has a reputation as an expert in
integrating 3S or multi-S technologies. He has also become a well-deserved expert
in the field of transportation, as road surface comprehensive inspection equipment
has been widely applied in this field. He has made remarkable achievements in
dynamic and precise engineering surveying. Meanwhile, he is personally knowl-
edgeable, insightful, persevering, and determined. As the president of Shenzhen
University, he has paid great attention to the university’s research development and institutional reform. In just a few years, he has led Shenzhen University to be among
the top 100 universities and one of the most vigorous universities in China. He has
simultaneously achieved success as an academic and administrative leader, which is
admirable.
Finally, it should be noted that the surveyed objects in most applications introduced
in this book are immobile. The surveying aims to obtain their slight deformation and
precise morphology, detect changes such as breakage, cracks, corrosion, and shed-
ding, and derive indexes such as the smoothness of the road surface and the alignment
of the railway track. In some other applications, the subsidence and displacement
of the objects can be obtained through periodic surveying. In the future, dynamic
and precise engineering surveying can be further expanded to monitor changes of
moving objects, such as the dynamic detection of wind power blades.
On the occasion of the National Day and Mid-Autumn Festival in China, I would
like to express my heartfelt congratulations to Prof. Qingquan Li and his team on
the publication of this book. It is undoubtedly good news for teachers, students, and
scientists engaged in engineering surveying, as it unveils broader applications and
possibilities of this discipline. At the same time, this book can be an expansion of
knowledge and inspiration for innovation for a wide range of readers.

September 2020 Prof. Zhenglu Zhang


Shanghai, China
Foreword by Dr. Naser El-Sheimy

Engineering surveying is a discipline with a long history, accompanying urban development all over the world. The earliest known uses of surveying practices in human history include the pyramids (2700~1750 B.C.) of ancient Egypt and the urban drainage
system at a Neolithic site of Longshan Culture (2500~2000 B.C.) in ancient China.
With industrial revolutions and the recent information revolution throughout human
history, construction activity is increasingly armed with advanced technology and
intelligence. Human activities have stretched from land to ocean, from the earth’s
surface to underground, and even to space. Many megaprojects that were once beyond imagination have been constructed, such as nuclear power stations, space stations, underwater tunnels, across-sea bridges, and large electron-positron colliders. It can also be
observed that engineering surveying is not only limited to infrastructure construc-
tion but also extends to various fields that can be seen in our daily life, such as
bridge monitoring, road surface detection, mining activity monitoring, and sports
arena surveying. High requirements from megaprojects and multiple demands from
diverse applications jointly drive traditional engineering surveying to transform from
static, discrete, and manual into dynamic, precise, and intelligent surveying, which
is the focus of this book.
This transformation is especially prominent in China. As a country possessing
ancient, delicately designed constructions such as the Dujiangyan Irrigation System in Sichuan province, China currently retains its strength in infrastructure construction. It is renowned for many megaprojects, such as the Three Gorges Dam, the national high-
speed railway network, the South-to-North Water Diversion Project, the Qinghai-
Xizang Highway, hundreds of across-sea bridges, and thousands of super high-
rise buildings. These construction projects feature large spatial coverage, complex
construction processes, and complex construction environments, challenging the
existing technology and skills used in dynamic and precise engineering surveying.
Therefore, the urgent need for megaprojects drives the development of this research
field and activates its combination with data science, artificial intelligence, and inte-
grated sensing technology, forming a multidisciplinary field. Many talents have
contributed to this research field and proposed their creative solutions, which play
an essential role in constructing these large and megaprojects. Among these people,
Prof. Li and his colleagues are an outstanding group in China.
Professor Li has been working on engineering surveying for more than two
decades and is a pioneer in the field of dynamic and precise engineering surveying.
He and his team have been actively involved in infrastructure construction in China,
especially the Guangdong-Hong Kong-Macao Greater Bay Area, one of four bay
areas that rank as the most economically and technologically developed areas glob-
ally. The Greater Bay Area was recently formed and is still under construction, with
many mega construction projects proposed or being constructed, such as the Hong
Kong-Zhuhai-Macao Bridge, the Shen-Zhong Link, and the Water Resources Allo-
cation Project of Pearl River Delta. These projects call for advanced technology to
be applied and more innovation to contribute to dynamic and precise engineering
surveying.
However, it is surprising that only a few books are available to systemati-
cally describe dynamic and precise engineering surveying, combining the advanced
methodology and technology developed in other disciplines, such as artificial intel-
ligence, data science, infrastructure, and transportation engineering. Knowledge in
this field has been scattered in journal papers, conference proceedings, patents, and
technical reports.
With calls from demand-oriented applications and advanced technology, this book
by Prof. Li is remarkable and timely. It combines fundamental knowledge with prac-
tical applications, which is very intriguing and inspiring to people in this field. It
is evident that this book is written by authors with both mature teaching and prac-
tical skills, as some of the creative applications are carried out by students under the
guidance of professional teachers and workers. For example, the dynamic deflec-
tion measurement, road damage detection, high-speed railway, and bridge health
monitoring were completed by Prof. Dejin Zhang, Prof. Qingzhou Mao, and Prof.
Qingquan Li; the dynamic surveying in autonomous driving was contributed by Dr.
Long Chen and Dr. Liang Zhang; the indoor and underground space surveying was
contributed by Dr. Baoding Zhou, Dr. Lian Huang, and Dr. Zhipeng Chen; UAV
3D surveying was contributed by Dr. Liang Zhang and Dr. Wenshuai Yu; and the
integration surveying of coastal zones was carried out by Dr. Chisheng Wang, Dr.
Minglei Guan, and Dr. Kai Guo. These works involve new methodologies, innovative
equipment, and various practical applications.
Generally, the book covers the state of the art in advanced surveying engineering.
It offers a helpful overview of the challenges and opportunities in this field for the
technical and business community interested in surveying engineering in megapro-
jects. Despite the challenge of chasing rapid developments in the field, the book
covers a wide variety of state-of-the-art implementations in real-life mega engi-
neering projects. The book mainly targets technical audiences, including advanced
undergraduate students, graduate students, engineers, scientists, mobile developers, business developers, and entrepreneurs working in relevant disciplines.

Dr. Naser El-Sheimy, P.Eng


Canada Research Chair in Mobile Multi-sensor Systems
Fellow, Canadian Academy of Engineering (CAE)
Fellow, The US Institute of Navigation (ION)
Department of Geomatics Engineering
The University of Calgary
Calgary, Canada
Preface I

The global COVID-19 pandemic that erupted in 2020 has changed the world and
our lives, marking 2020 as an unusual year. Almost all my team members and I have close ties to Wuhan. Some of us lived and worked in Wuhan, and others were in Wuhan during the epidemic. Several of our teachers, classmates, and friends were even infected with COVID-19. Every piece of news about the pandemic touched our nerves. We felt anxious and helpless about the unclear situation. We could not
do much during the bad time but only pray and cheer for our friends, their families,
and Wuhan citizens. The sudden COVID-19 outbreak and the lockdown threw our
working pattern out of kilter, but thinking optimistically, it left us more spare time
to discuss issues we had no time to think about before and to do things we had
no time to do before. During the lockdown period, my team members were working
from home, and the only way to research, discuss, and work together was through the
Internet. One thing we had thought about for years but never had the time to do was to
summarize our research and the industrialized outcomes in the field of dynamic and
precise engineering surveying over the past two decades. When the book is almost
completed, the domestic pandemic is under control despite several recurrences, but it is still ongoing in some other countries. I eagerly hope that the epidemic will end soon
and campus normality will resume as quickly as possible.
By a remarkable coincidence of history, 17 years ago, when SARS erupted, the
campus of Wuhan University was in lockdown. I led faculty and students of my
team to discuss our next research direction in our laboratory. In our opinion,
accelerating infrastructure construction in China would undoubtedly lead to large-
scale infrastructure maintenance, and the focus of engineering surveying would also
shift from measurement to detection and monitoring. Therefore, I decided that the
infrastructure safety state survey should be our research direction. Combined with the
research accumulation in mobile measurement and engineering surveying, we first
conducted research on road surface inspection and relevant equipment development.
After years of efforts, we made breakthroughs in several aspects, including multi-
sensor integration and synchronous control, multi-source data fusion and processing,
and development of new sensors and professional surveying equipment. Our work
lays the theoretical and technological foundation for dynamic and precise engineering
surveying.
Time flies! My team has been working in dynamic and precise engineering
surveying for nearly two decades. We developed our first comprehensive road surface
inspection equipment, road deflection dynamic surveying equipment, and road and
tunnel inspection equipment. Starting from developing road inspection equipment,
we have expanded our research to railways, airports, subways, bridges, pipelines,
and other fields. A series of surveying equipment has been developed and widely
used throughout the country. Meanwhile, several start-ups were incubated in Wuhan,
Beijing, Shenzhen, Xi’an, and other places, leading to technological innovation.
We have overcome many incredible difficulties regarding funding, technology, and
personnel in this process. I vaguely remember that the shell of our first road surface inspection camera was modified from a pressure cooker and that road cracks were
manually identified in collected images. Many of the difficulties we have encoun-
tered will never be forgotten and have become our most precious life experiences
and memories.
In 2012, I transferred my position from Wuhan University to Shenzhen Univer-
sity as the president and recruited a research team comprising old and new staff at
Shenzhen University. As Shenzhen is a coastal city, I determined that our research
should be closely related to the water environment and coastal applications. There-
fore, we expanded our research to new applications, such as dynamic surveying of
pipeline networks and coastal engineering surveying focusing on blue-green laser
bathymetry, high-precision inertial combination measurement, underwater high-
precision positioning and measurement, etc. A series of professional equipment was
developed during the process, including a blue-green laser airborne surveying system,
water-shore integrated 3D surveying system, surveying robots for rock-fill dam
internal deformation monitoring, capsular robots for drainage pipeline inspection,
etc. Currently, my team is researching monitoring the state of large-scale bridges and
bridge clusters, intelligent surveying and control during the connection of underwater
tunnel tubes, and more.
Over the years of research and teaching, my understanding of this field has deep-
ened. Engineering surveying is an essential branch of the surveying and mapping
discipline with a wide range of applications. Unlike other branches of the surveying
and mapping discipline, engineering surveying is closely related to various engi-
neering constructions, spanning the entire process, including planning, construction,
and operation. Along with the development of large-scale engineering construction
in China, new requirements for engineering surveying are emerging, promoting the
rapid growth of engineering surveying technology and applications, and involving the
largest number of practitioners in the surveying and mapping industry. The constant
development of this field is stimulated by the long-term requirements proposed
by new applications. A typical example is the comprehensive safety state inspec-
tion and monitoring of large-scale infrastructure. It has a long-lasting demand, but
the task difficulty and the market scale vastly exceed the scope of infrastructure
construction measurement in traditional engineering surveying. Modern surveying
faces increasing demands, not only requiring high precision and efficiency but also
involving larger working fields, more complex environments, more diverse measure-
ment elements, and more significant scale changes. For a long time in the past,
conventional methods and equipment characterized by static, discrete, and single
features could not meet demand, resulting in traditional engineering surveying
practitioners lacking the technology and equipment to complete such tasks.
Through continuous efforts over the past two decades, we have combined measure-
ment technology, sensor technology, computer technology, big data technology, and
artificial intelligence technology to form a comprehensive theoretical and technolog-
ical system of dynamic and precise engineering surveying. We continuously improve
this system in scientific research and engineering practice. In this process, my team
has developed its research characteristics, such as the combination of mapping tech-
nology and non-mapping industries, the integration of hardware systems and software
algorithms, and the fusion of equipment development and engineering applications.
In particular, a series of professional surveying equipment has been developed to meet
different application requirements, which has realized the innovation of engineering
surveying technology and the expansion of service fields, forming a new growth point
and direction for engineering surveying. We believe that for a considerable period in
the future, dynamic and precise engineering surveying will become a crucial fron-
tier of engineering surveying, showing rapid development driven by ever-emerging
application demands. At the same time, we also recognize that the application field
and scope of engineering surveying will continually expand. The boundaries between
it and other disciplines will be further blurred, leading to a broader and more inten-
sive integration with various industries and its essential role in multiple industries.
We anticipate that the innovation of technology and equipment, the intersection with
other disciplines, and the constant upgrade of professional equipment will facilitate
vigorous development and illustrate a bright future for engineering surveying.
The leading members of my team all participated in the writing of the Chinese
version of this book. I was responsible for all the chapters. Dr. Wei Tu contributed
to the writing of Chaps. 1 and 12 and was responsible for the organization and
coordination of the writing work. Dr. Zhipeng Chen contributed to the writing of
Chaps. 2, 9, and 10. Dr. Jianwei Yu contributed to the writing of Chap. 3. Dr. Chisheng
Wang contributed to the writing of Chap. 4. Prof. Dejin Zhang contributed to the
writing of Chaps. 5 and 6. Prof. Qingzhou Mao contributed to the writing of Chaps. 7
and 8. Dr. Baoding Zhou contributed to the writing of Chaps. 2 and 11. Additionally,
Dr. Wei Ma and Zhimin Xiong, Ph.D. students Minglei Guan, Yu Yin, Anbang Liang,
and Xu Fang, and research assistant Huiyan Cheng participated in data compilation.
I want to express my gratitude to them all.
Finally, I would like to extend my special thanks to my supervisor, Academi-
cian Deren Li, for his foreword to this book. I am grateful to my teacher during my
study at Wuhan Technical University of Surveying and Mapping and my colleague,
Prof. Zhenglu Zhang, who reviewed the entire book, made numerous valuable
corrections, and contributed Foreword II for this book. I would also like to thank
Prof. Bisheng Yang, Prof. Bijun Li, Prof. Luliang Tang, and Prof. Qingwu Hu for
their tremendous support.

Due to the limitations of the authors’ expertise, there are inevitably typographical
errors, inaccuracies, and inappropriate statements in the book, and we sincerely invite
experts and readers to critique and correct them.

July 2020 Qingquan Li


Shenzhen University
Shenzhen, China
Preface II

A year has passed since completing the Chinese version of our book, entitled Dynamic
and Precise Engineering Survey. Time flies! The ongoing COVID-19 pandemic over
the past year has reduced many on-site academic activities. Perhaps it is a blessing
in disguise that the pandemic has provided my colleagues and me with more time
to sit down together, review our past work, and devise a plan for the future. Our
discussions culminated in a consensus to improve this book, enrich it with new
content, and rewrite it as an English version titled Dynamic and Precise Engineering
Surveying, which we are eager to share with the international community.
For over two decades, we have been working on dynamic and precise engineering
surveying, a hot topic and frontier in the field of engineering surveying. Our work
in this field began with deflection measurements of road surfaces at an early stage.
As we completed numerous projects and overcame various challenges, we gradually
developed a framework that integrates our surveying methods and devices to quickly
examine diverse infrastructures, including railways, bridges, tunnels, subways, and
airports. In 2012, when we moved to Shenzhen, we began extending the appli-
cation of our developed techniques to various fields, such as water conservancy
engineering, municipal engineering, and marine engineering. Throughout these
years, we have accumulated considerable practical experience and deepened our
understanding of our work. Dynamic and precise engineering surveying is a
typical field for multidisciplinary research and application, combining surveying
and mapping with information disciplines (e.g., electronic information and computer
information) and engineering disciplines (e.g., civil engineering and transportation
engineering). Nowadays, dynamic and precise engineering surveying increasingly
incorporates advanced information technology. For instance, professional surveying
equipment currently integrates various intelligent sensors and navigators, and it
requires dynamic and continuous multi-element big data collection and processing,
aided by innovations in computer vision, big data, and artificial intelligence. Simul-
taneously, the development of dynamic and precise engineering surveying is driven
by the growing demands of practical applications. From a global perspective, many
mega construction projects have accompanied China’s rapid urbanization over the
past few decades, such as the Three Gorges Dam, the national high-speed railway network, the South-to-North Water Diversion Project, the Qinghai-Xizang Highway, hundreds of across-river and across-sea bridges, and thousands of super high-rise
buildings. These megaprojects involve advanced construction technology, complex
construction processes and environments, and stringent operation and maintenance
requirements. These complex demands present numerous world-class problems and
challenges, propelling the development and evolution of dynamic and precise engi-
neering surveying and promoting the rapid progress and widespread application of
our research outcomes.
The Greater Bay Area, where our laboratory is situated, is one of the most econom-
ically active regions globally. It also ranks as one of the areas with the densest
engineering construction, the highest technological requirements, and substantial
investment. Construction activity in this region is among the most intensive in the
world, encompassing a large number of diverse infrastructure projects and large-
scale individual ventures. In recent years, investment in infrastructure construction
within this area has been almost equivalent to the total investment of other countries
over several decades. A single mega construction project can cost up to hundreds of
billions of yuan. The advanced technologies and techniques applied in these mega
construction projects are pioneering worldwide, as there are no prior references for
their construction. Notable innovations in mega construction projects include the
combination of bridge, island, and tunnel in the Hong Kong-Zhuhai-Macao Bridge,
the two-way eight-lane super-large underwater tunnel of the Shen-Zhong Link, the under-
water interchange of the Yanjiang Expressway in Qianhai District of Shenzhen, the
50-m-deep tunnel pipeline in the Water Resources Allocation Project of the Pearl
River Delta, and the two-layer highway interchange with 16 lanes between Shenzhen
Airport in Baoan District and He’ao in Longgang District of Shenzhen. Confronted
with challenges and difficulties, we strive to provide solutions for such construction
projects and promote the development of dynamic and precise engineering surveying.
This book showcases our work on dynamic and precise engineering surveying
related to several mega construction projects in China. Specifically, it introduces
our innovative surveying equipment, which has been extensively applied domesti-
cally and exported globally, supporting local infrastructure construction, operation,
and maintenance. In the future, we aspire to make even greater contributions to
dynamic and precise surveying, provide superior products to countries worldwide,
and collaboratively promote global economic and social development.
During the preparation of this book, we completed several new research projects,
which we are delighted to include. Compared to the Chinese version, some content
has been modified or shortened, and new sections have been added, including
dynamic surveying in autonomous driving, optimized view photogrammetry, blue-
green laser water depth surveying, and more. This book owes a great deal to the
people I had the pleasure of working with over the past years. In addition to those who
participated in writing the Chinese version, I would like to acknowledge the following
new contributors: Dr. Long Chen and Dr. Kunhua Liu, who participated in writing
Chap. 3; Dr. Liang Zhang, who contributed to Chaps. 3 and 5; Dr. Wenshuai Yu, who
participated in writing Chap. 5; Dr. Wei Hu, who contributed to Chap. 2; Dr. Kai Guo
and Dr. Minglei Guan, who participated in writing Chap. 6; and Dr. Siting Xiong and
Prof. Zhizhong Xu, who assisted with the translation and coordination of the book.
I extend my heartfelt gratitude to these kind and helpful colleagues.
Special thanks go to Prof. Bisheng Yang, Prof. Bijun Li, Prof. Luliang Tang, and
their team members at Wuhan University for providing related research achievements
and essential assistance during the preparation of this book.

January 2022 Qingquan Li


Shenzhen University
Shenzhen, China
Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Dynamic and Precise Engineering Surveying . . . . . . . . . . . . . . . . . . . 1
1.1.1 Characteristics of Dynamic and Precise Engineering
Surveying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Research Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Surveying Modes and Technical Architecture . . . . . . . . . . . . . . . . . . . 7
1.2.1 Surveying Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.2 Scientific Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Space and Time Datums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.1 Time Datum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.2 Space Datum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.3 Principles for Positioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4 Integration of Surveying Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.1 Typical Sensors Used in Surveying . . . . . . . . . . . . . . . . . . . . . 21
1.4.2 Multi-sensor Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.4.3 Space and Time Association Between Multi-source
Surveying Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.5 Multi-source Surveying Data Processing . . . . . . . . . . . . . . . . . . . . . . . 32
1.5.1 Surveying Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.5.2 Framework of Surveying Data Processing . . . . . . . . . . . . . . . 35
1.5.3 Methods of Surveying Data Processing . . . . . . . . . . . . . . . . . . 36
1.5.4 Generalized Surveying Data Processing . . . . . . . . . . . . . . . . . 37
1.6 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2 Structural State Surveying for Transportation Infrastructure . . . . . . 45
2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.2 Road Transportation Infrastructure Surveying . . . . . . . . . . . . . . . . . . 46
2.2.1 Pavement Deflection Surveying . . . . . . . . . . . . . . . . . . . . . . . . 46
2.2.2 Pavement Distress Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 61


2.3 Railway Transportation Infrastructure Surveying . . . . . . . . . . . . . . . . 114


2.3.1 High-Speed Rail Track Surveying . . . . . . . . . . . . . . . . . . . . . . 114
2.3.2 Subway Tunnel Surveying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
2.4 Bridge Dynamic Deflection Measurement . . . . . . . . . . . . . . . . . . . . . . 148
2.4.1 Principle of Vision Measurement . . . . . . . . . . . . . . . . . . . . . . . 149
2.4.2 Deflection Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
2.4.3 Dynamic Monitoring of Bridge Deflection . . . . . . . . . . . . . . . 153
2.5 Surveying Equipment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
2.5.1 Systematic Architecture of the Surveying Equipment . . . . . . 156
2.5.2 Road Surveying Equipment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
2.5.3 Rail Track Surveying Equipment . . . . . . . . . . . . . . . . . . . . . . . 160
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
3 Dynamic Surveying in Autonomous Driving . . . . . . . . . . . . . . . . . . . . . . 165
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
3.2 Car Positioning and Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
3.2.1 GNSS/INS Integrated Positioning . . . . . . . . . . . . . . . . . . . . . . 166
3.2.2 In-Vehicle LiDAR Positioning . . . . . . . . . . . . . . . . . . . . . . . . . 172
3.2.3 In-Vehicle Visual Odometry . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
3.2.4 Multi-sensor Fusion Positioning . . . . . . . . . . . . . . . . . . . . . . . . 181
3.3 Object Detection in Autonomous Driving . . . . . . . . . . . . . . . . . . . . . . 184
3.3.1 2D Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
3.3.2 Vision-Based 3D Object Detection . . . . . . . . . . . . . . . . . . . . . 188
3.3.3 LiDAR-Based 3D Object Detection . . . . . . . . . . . . . . . . . . . . . 195
3.3.4 Vision and LiDAR Fusion Object Detection . . . . . . . . . . . . . . 199
3.4 High-Definition Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
3.4.1 HD Map Standard for Autonomous Driving . . . . . . . . . . . . . . 202
3.4.2 Production of the HD Map for Autonomous Driving . . . . . . 205
3.4.3 Applications of HD Map in Autonomous Driving . . . . . . . . . 209
3.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
3.5.1 Application in Open-Pit Mines . . . . . . . . . . . . . . . . . . . . . . . . . 212
3.5.2 Application in Various Parks . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
4 Indoor and Underground Space Measurement . . . . . . . . . . . . . . . . . . . . 229
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
4.2 Indoor and Underground Space Positioning . . . . . . . . . . . . . . . . . . . . 230
4.2.1 Positioning Based on Smart Terminals . . . . . . . . . . . . . . . . . . 230
4.2.2 Positioning Based on a Precision INS . . . . . . . . . . . . . . . . . . . 245
4.3 Indoor 3D Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
4.3.1 Indoor Mobile 3D Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
4.3.2 Indoor Map Update Based on Crowdsourcing Data . . . . . . . 262

4.4 Flatness Detection of Super-Large Concrete Floor . . . . . . . . . . . . . . . 277


4.4.1 A Rapid Method of Aided-INS Floor Flatness
Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
4.4.2 Testing and Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
4.5 Defect Inspection of Drainage Pipelines . . . . . . . . . . . . . . . . . . . . . . . 294
4.5.1 Drainage Pipeline Detection Method Based
on a Floating Capsule Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
4.5.2 The Test and Application of Drainage Pipe Network
Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
4.6 Internal Deformation Measurement of Earth-Rockfill Dam . . . . . . . 315
4.6.1 Internal Deformation Monitoring for Earth-Rockfill
Dam via High-Precision Flexible Pipeline
Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
4.6.2 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
5 UAV 3D Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
5.2 LiDAR 3D Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
5.2.1 LiDAR 3D Measurement System . . . . . . . . . . . . . . . . . . . . . . . 336
5.2.2 Processing Method of LiDAR Point Cloud . . . . . . . . . . . . . . . 340
5.2.3 LiDAR 3D Measurement Applications . . . . . . . . . . . . . . . . . . 348
5.3 Optimized Views Photogrammetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
5.3.1 View Optimization and Route Generation Method
Based on the Rough Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
5.3.2 Accuracy Analysis for Fine Real Scene Modeling . . . . . . . . . 378
5.3.3 Multi-UAV Collaboration in Optimized View
Photogrammetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
5.3.4 Optimized Views Photogrammetry Applications . . . . . . . . . . 397
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
6 Coastal Zone Surveying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
6.2 Shipborne Water-Shore Integrated Surveying . . . . . . . . . . . . . . . . . . . 415
6.2.1 Water-Shore Integrated Surveying Technique . . . . . . . . . . . . . 415
6.2.2 Development of a Water-Shore Integrated
Measurement System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
6.2.3 Application of the Integrated Water-Shore
Measurement System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
6.3 Airborne Laser Bathymetric Surveying . . . . . . . . . . . . . . . . . . . . . . . . 430
6.3.1 Airborne Laser Bathymetry Technology . . . . . . . . . . . . . . . . . 430
6.3.2 Development of Airborne Laser Bathymetry
Equipment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432

6.3.3 Airborne Laser Bathymetry Data Processing . . . . . . . . . . . . . 438


6.3.4 Airborne LiDAR Bathymetry Application . . . . . . . . . . . . . . . 450
6.4 Coastal Surface Subsidence InSAR Measurement . . . . . . . . . . . . . . . 453
6.4.1 Research Status of InSAR Technology . . . . . . . . . . . . . . . . . . 453
6.4.2 Sequential InSAR Processing Technology . . . . . . . . . . . . . . . 455
6.4.3 The Interpretation of Sequential InSAR Results . . . . . . . . . . 459
6.4.4 Coastal InSAR Monitoring Application . . . . . . . . . . . . . . . . . 461
6.5 Coastal Tide Correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
6.5.1 Research Status of Tidal Correction . . . . . . . . . . . . . . . . . . . . . 472
6.5.2 Spatial Structure of Ocean Dynamic Water Level . . . . . . . . . 474
6.5.3 Dynamic Water Level Correction Method . . . . . . . . . . . . . . . 476
6.5.4 Dynamic Water Level Correction in the Southwestern
Yellow Sea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
7 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
Correction to: Dynamic and Precise Engineering Surveying . . . . . . . . . . . C1
Chapter 1
Introduction

1.1 Dynamic and Precise Engineering Surveying

Since the beginning of the twenty-first century, engineering construction has advanced by leaps and bounds, and the scale of construction has expanded rapidly. By 2020
in China, highway mileage has increased to 161,000 km, and high-speed railway
(HSR) mileage has increased to 38,000 km. Meanwhile, highway and railway
tunnels have exceeded 35,000, and hundreds of skyscrapers and across-river and across-
sea bridges have been built. Operational conditions of these large-scale infrastruc-
tures are constantly changing, and when the change exceeds a certain limit, acci-
dents can occur, threatening the safety of people’s lives and property. High-precision and
high-frequency measurements of the geometric and physical properties of major
infrastructure are essential to accurately assess their operational conditions [1–3].
Traditional precision engineering surveying is generally carried out by carefully
laying out a planar control network or an elevation control network and placing
precise surveying instruments at several important locations to conduct measure-
ments. The whole process is of low automation but high costs, making it chal-
lenging to achieve full coverage, high efficiency, and high accuracy surveying [3].
For example, road deflection is an essential mechanical index that indicates the bearing
capacity or structural strength of the road, reflecting its performance in service. It
is traditionally measured using the Benkelman beam method. A certain load applied to
the road surface forces it to reach a full deformation state. Then, the road surface
rebound is measured after removing the load. The disadvantage of this method is that
the road needs to be closed during measurement, and the measuring speed reaches
only approximately 1–3 km/h. A year of continuous work with nearly 100 pieces
of equipment and 1,000 people is needed to measure the surface deflection of all highway roads in China. It is still challenging to achieve spatially continuous and comprehensive measurements [2]. Transforming the measurement into a dynamic
and continuous way can improve the efficiency. High efficiency can also reduce the
close-down time, which is a great advantage for infrastructure maintenance, as they are usually in service almost all day. For instance, high-speed trains operate for
most of the daytime, and only 2–4 h are left for railway track inspection. Compared
to static surveying with in situ deployed sensors along railway tracks [4, 5], which
requires thousands of people to work every night, it is much more efficient and afford-
able to develop a mobile platform that can carry the measuring instruments and travel along the HSR, especially for surveying that is required to cover all high-speed rail tracks seamlessly, e.g., the Wuhan to Guangzhou HSR in China.
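To put the scale of the deflection-survey task mentioned above in perspective, the following back-of-envelope sketch in Python estimates the Benkelman-beam survey effort; the measuring speed, working hours, and crew size are illustrative assumptions rather than figures from this book.

# Rough estimate of the effort to measure deflection on all highways with
# the Benkelman beam method (all parameter values are assumptions).
highway_km = 161_000        # total highway mileage by 2020 (from the text above)
speed_km_per_h = 1.5        # assumed effective speed within the 1-3 km/h range
hours_per_day = 6           # assumed productive hours per device per day
devices = 100               # "nearly 100 pieces of equipment"
crew_per_device = 10        # assumed crew size, consistent with roughly 1,000 people

fleet_km_per_day = devices * speed_km_per_h * hours_per_day
days_needed = highway_km / fleet_km_per_day
print(f"fleet coverage: {fleet_km_per_day:.0f} km/day")
print(f"estimated duration: {days_needed:.0f} working days, "
      f"about {devices * crew_per_device} people")
# Roughly 180 working days under these assumptions; slower speeds, multi-lane
# coverage, and traffic-control overhead push the effort toward a full year.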
Technologies related to sensors, computers, and robotics are developing rapidly,
and there is an evident trend toward multi-sensor integration and intelligence. Auto-
mated surveying robots can detect, measure, and monitor the target and collect
surveying data [6]. Light detection and ranging (LiDAR) systems can measure in fast, dynamic, and high-precision ways, enabling rapid acquisition of 3D coordinates for dynamic and precise engineering surveying. Integration of satellite navigation and posi-
tioning (global navigation satellite system, GNSS) and inertial measurement unit
(IMU) enables accurate determination of the position and attitude of the platform. On
the other hand, low-cost mobile platforms, including intelligent vehicles, unmanned
aerial vehicles (UAVs), autonomous ships, and mobile robots, are becoming increas-
ingly popular. This provides diverse available platforms for dynamic surveying
[6, 7]. Precise engineering surveying is developing in the direction of automa-
tion, dynamization, and intelligence. Dynamic and precise engineering surveying
methods are widely applied to long-span bridges, water conservancy hubs, highways,
high-speed railways, subways, etc. In addition, diverse applications need precise
surveying, such as aerospace, aviation, intelligent manufacturing, and scientific
research. Nowadays, the development of dynamic and precise engineering surveying
has entered a fast lane.
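To make the GNSS/IMU integration mentioned above concrete, the following toy one-dimensional sketch (in Python, with entirely assumed noise values) lets a simulated IMU propagate position and velocity at a high rate while lower-rate GNSS position fixes correct the accumulated error through a Kalman update. It only illustrates the principle and is not the algorithm used by the equipment described in this book.

import numpy as np

dt, imu_rate, gnss_every = 0.01, 100, 100      # 100 Hz IMU, 1 Hz GNSS (assumed)
F = np.array([[1.0, dt], [0.0, 1.0]])          # constant-velocity state transition
B = np.array([[0.5 * dt**2], [dt]])            # acceleration input matrix
H = np.array([[1.0, 0.0]])                     # GNSS observes position only
Q = np.diag([1e-6, 1e-4])                      # process noise (assumed)
Rn = np.array([[0.02**2]])                     # 2 cm GNSS noise (assumed)

x = np.zeros((2, 1))                           # state: [position; velocity]
P = np.eye(2)
rng = np.random.default_rng(0)
for k in range(10 * imu_rate):                 # 10 s of data
    accel = 0.1 + rng.normal(0.0, 0.05)        # simulated noisy IMU acceleration
    x = F @ x + B * accel                      # high-rate prediction with the IMU
    P = F @ P @ F.T + Q
    if k % gnss_every == 0:                    # a GNSS position fix arrives
        true_pos = 0.5 * 0.1 * (k * dt) ** 2
        z = np.array([[true_pos + rng.normal(0.0, 0.02)]])
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + Rn)
        x = x + K @ (z - H @ x)                # correct the drifted estimate
        P = (np.eye(2) - K @ H) @ P
print(f"fused position after 10 s: {x[0, 0]:.3f} m")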
Dynamic and precise engineering surveying can be achieved either by using
existing specifically designed equipment or by developing professional equipment
that integrates existing sensors and platforms. It can obtain the position and attitude,
appearance change, and internal state of the surveyed object [8]. Currently, position
and attitude are measured using GNSS, IMU, distance measuring instrument (DMI),
laser tracker, and other sensors. Appearance change is mainly detected with the use of
optical sensors, such as cameras, LiDAR, structured light, etc. Internal state inspec-
tion is mainly realized using ground-penetrating radar, multi-beam sonar, pipeline
robots, etc. Professional equipment can quickly obtain multi-source geometric and
non-geometric data of surveyed objects. These data are processed to evaluate the
safety states of engineering structures during their operation and maintenance.
Dynamic and precise engineering surveying has been used in a wide range of engi-
neering applications, such as highways and railways, bridges and tunnels, and water
conservancy (Fig. 1.1).

Fig. 1.1 The framework of the dynamic and precise engineering surveying

1.1.1 Characteristics of Dynamic and Precise Engineering Surveying

Unlike traditional engineering surveying methods, dynamic and precise engineering surveying is characterized by high precision, high efficiency, complex surveyed objects,
and professional surveying equipment.
1. Requirement of high precision
According to the “Specifications of precise engineering surveying” (GB/T 15314-
94) [9], the absolute accuracy for precision engineering surveying should be at the
millimeter or even sub-millimeter level. For example, particularly stringent limits are placed on the mean squared error of first-class precise distance measurement. According to the specifications, when building HSR with high safety and amenity requirements, a high-precision plane control network is established using dual-frequency GNSS receivers. Surveying related to engineering construction has a high standard for precision, usually at the millimeter level or better. However, high-precision surveying in dynamic conditions is
still a great challenge since every stage can result in errors that should be care-
fully analyzed and controlled. Errors can be sourced from platform instability, signal
communication, and data processing.
2. Complexity of surveyed objects


Traditional engineering surveying aims to measure small buildings occupying limited
floor area and spatial volume. Therefore, it is relatively easy. However, targets of
dynamic and precise engineering surveying are usually large-scale infrastructures,
such as high-speed roads, high-speed railways, skyscrapers, long bridges, and super-
long tunnels. These engineering structures span large areas and involve complex
structures that can consist of many components. Surveying such infrastructures is
subject to strict requirements and is highly difficult. Take the Wuhan-Guangzhou
HSR and the Hong Kong-Zhuhai-Macao underwater tunnel as two representative
examples. The former spans more than 1000 km, and across this large area, a CPIII
control network composed of 30,000 control points needs to be established. The
latter consists of 33 super-large underwater tubes with a total length of 5664 m.
The tubes, weighing tens of thousands of tons, need to be connected at a depth of 40 m
underwater, and the surveying precision must be better than 2 cm to ensure their
successful connection [3].
3. Specialized surveying equipment
Precision engineering surveying usually uses measuring instruments with higher
accuracy, such as high-precision theodolite with 0.5'' angle measuring accuracy,
high-precision laser rangefinder, high-precision laser collimation system, etc. These
instruments are of good performance, high accuracy, and high stability. In recent
years, precision engineering instruments have been elevated in measurement preci-
sion, range, automation, etc. Successively emerging devices, such as laser trackers,
laser scanners, surveying robots, and GNSS receivers, have provided technical guar-
antees for precision engineering surveying. Many engineering infrastructures are
located in a specific environment, posing high requirements for surveying work and
requiring professional dynamic and precise engineering surveying equipment devel-
opment. For instance, road surface inspection should cause minimal interference
to road traffic. Therefore, rutting, flatness, cracks, damage, and other distresses of the
road surface need to be surveyed quickly and efficiently [5]. The solution is to develop
a vehicle-mounted road inspection system that can obtain fine 3D point clouds of the
road surface by combining structured light sensors, IMU, and GNSS. At the same
time, it captures road surface textures using high-resolution cameras.
4. Multidisciplinary integration
Dynamic and precise engineering surveying is a typical multidisciplinary research
field that integrates disciplines such as civil engineering technology, computer tech-
nology, electronic information technology, and automatic control technology. It is
also widely applied in fields such as architecture, geology, oceanography, mate-
rials science, and engineering, providing service for various applications, including
the construction of large-scale buildings, transport infrastructures, water conservancy
hubs, large-scale scientific devices, and lunar exploration projects. It is always the
frontier of engineering surveying and is increasingly applied underground,
underwater, and even in space, becoming more deeply cross-fertilized with civil engineering,
water conservancy engineering, aerospace, and other disciplines.
5. Requirement of high efficiency
Dynamic surveying is superior to traditional surveying, especially in terms of effi-
ciency. It is more applicable to large-scale objects, inhospitable envi-
ronments, and time-limited tasks. Take high-speed railway fastener inspection as an
example. With dynamic surveying equipment, the inspection speed can be acceler-
ated to 60,000 fasteners per night compared to 600 fasteners per night completed by
manual inspection, which is labor-intensive, imprecise, and unreliable. Drainage pipeline
surveying can reach 8–10 km per day when using the automated capsular robot,
compared to 1–2 km per day when using a remotely controlled robot monitored
through closed-circuit television (CCTV).
6. Multi-platform coordination
A single piece of surveying equipment generally cannot achieve high-efficiency surveying of
large construction fields. Surveying efficiency can be largely improved by using
multiple instruments working collaboratively in a distributed mode. In this mode,
various instruments, such as robots, laser scanners, and ground-penetrating radar, can
collect data in the same space and time datums; hence, follow-up data processing and
analysis can be carried out with uniform datum. With this foundation, multi-source
data can be accurately and comprehensively interpreted with regard to the target
information of the surveyed objects.

1.1.2 Research Content

Compared with traditional engineering surveying, the content of dynamic and precise
engineering surveying is broader, expanding from spatial position and geometry to
various attributes, such as cracks, breakage, disturbance, and temperature change.
Both the data content and collection methods have fundamentally changed and
exceeded the traditional scope, especially in terms of data type, structure, quantity,
spatial and temporal resolution, geometric relationships, etc. Meanwhile, dynamic
and precise engineering surveying has played an essential role in diverse and expan-
sive applications, extending from general engineering construction to underground
engineering, marine engineering, underwater engineering, pipeline engineering,
manufacturing engineering, etc. The main research content of dynamic and precise
engineering surveying can be summarized in the following four aspects.
1. High-precision and multi-level space and time datums
Space datums, such as multi-level survey control networks, have been constructed and
maintained according to specifications in engineering surveying. Dynamic surveying
requires not only space datum but also time datum. The challenge is establishing a
high-precision time datum and synchronizing the precise surveying data obtained in
different time datums. It is still challenging to build a high-precision 3D reference
framework for engineering surveying and realize real-time dynamic surveying with
a uniform time datum.
2. Development and integration of new surveying sensors
Total stations, levels, trackers, etc., are frequently used equipment in traditional
engineering surveying. In dynamic and precise engineering surveying, the surveyed
parameters are no longer limited to geometric shapes, and sensors such as IMUs,
LiDARs, structured light cameras, infrared cameras, depth cameras, and
multi-beam depth sounders are commonly used. In many scenarios, a single sensor cannot meet
the requirements, and multiple sensors are integrated to measure the diverse
attributes of the surveyed objects.
3. Processing and analysis of multi-source surveying data
Since surveying sensors vary in range, precision, speed, density, and automation
level, the collected data are multi-source, heterogeneous, and massive, with
different spatiotemporal scales and accuracies. The surveying errors do
not always follow the normal distribution hypothesis, so classic data
processing methods may no longer apply. One key challenge is thus the processing and
analysis of big surveying data for dynamic and precise engineering surveying.
4. Development and application of professional surveying equipment
Surveying fields, environments, content, and objects have been largely changed
according to requirements proposed in different engineering scenarios. There is an
urgent and increasing demand for equipment to be developed for surveying diverse
structures. When developing professional surveying equipment, we should consider
sensors, synchronized control, and systematic calibration. In addition, the function-
ality, stability, reliability, and hardware and software architecture of the equipment
should be carefully designed and evaluated. Finally, seamless collaboration with
other equipment should be considered to achieve standardization, commoditization,
and industrialization.
In conclusion, dynamic and precise engineering surveying supports the efficient,
effective, and comprehensive inspection of large-scale engineering structures and
high-quality maintenance services. It also faces more challenges as engineering
construction advances, such as the increasing complexity of building structures
and environments. Meanwhile, the rapid development of sensor technology makes it
possible to meet the requirements of high precision, reliability, and timeliness.
1.2 Surveying Modes and Technical Architecture

Dynamic and precise engineering surveying refers to precise engineering surveying
that involves moving surveying platforms or moving surveyed objects. The former
includes mobile vehicles, such as UAVs, vessels, robots, etc., along with onboard
sensors. The surveyed objects can be buildings, bridges, tunnels, roads, dams,
pipelines, etc.

1.2.1 Surveying Modes

In dynamic and precise engineering surveying, there are three modes involving either
the moving platform, the moving surveyed object, or both.
1. Dynamic objects monitored by static surveying platform
In this mode, a piece of fixed equipment tracks or monitors the position change or
geometric change of dynamic surveyed objects. Figure 1.2 takes bridge deformation
monitoring as an example.
The observation stations are set up around the bridge, with high-precision surveying
instruments monitoring the bridge deformation periodically or continuously. In addi-
tion, ground-based radars are usually fixed in one place to monitor the landslide,
and the obtained data can be assimilated into the model to predict its movement.
In building engineering, fixed sensors such as high-frame-rate cameras or video systems
are usually installed on or around buildings to monitor the deformation caused by
typhoons.
Fig. 1.2 Bridge deformation monitoring

2. Static objects surveyed by moving platform

Fig. 1.3 All-rounded road inspection equipment

This is a common dynamic and precise engineering surveying mode that involves
utilizing a moving platform (e.g., mobile vehicles, UAVs, autonomous ships) to
observe the object and obtain its relevant information. As shown in Fig. 1.3, all-
round road inspection aims to detect road surface features, including rutting, cracks,
and other distresses. The surveying equipment moving along the road achieves fast
and precise road inspection with dense sampling observations. In coastal surveying,
the surveyed underwater terrain is static, while the sensors (e.g., positioning sensors,
sonar, laser scanning systems, etc.) are onboard a moving vessel for rapid data
acquisition.

3. Moving objects surveyed by moving equipment

In this mode, the surveying sensors are installed on a moving platform, such as
vehicles, UAVs, etc. The sensors precisely measure absolute and relative changes in
the moving objects. For example, the movement of a high-voltage transmission line
can be monitored by LiDAR and a visual surveying system onboard the UAV. In the
installation of underwater tunnels, the movement of the tunnel section is monitored
by using the surveying instruments set up on ships to ensure the seamless connection
between tunnel sections. Another typical application is the multiple sensors installed
on the driverless vehicle to observe the surrounding vehicles, pedestrians, and traffic
lights to guide the vehicle’s safe driving (Fig. 1.4).

Fig. 1.4 Dynamic sensing of vehicles and pedestrians in automated driving

1.2.2 Scientific Questions

With the development of technology and the broadening of application fields, modern
engineering surveying faces three major challenges: variable scenes, diverse objects,
and multiple data sources. Complex and varying scenes pose difficult problems and
challenges to continuous and stable high-precision position and attitude measurements
over a wide range and across scenes. This brings about the first scientific question:
How to establish and maintain space and time datums for dynamic and precise engi-
neering surveying? Diverse target objects require the construction of object models
at different scales, which calls for the integration and fusion of multi-source data.
This leads to the second scientific question: How can multi-source, heterogeneous,
and multi-view data be integrated and interpreted?
To address these two scientific questions, we propose a theoretical framework
for dynamic and precise engineering surveying, distilled from our
research and work in this field over the past decades. The framework consists of three
fundamental key technologies, namely, continuous and reliable attitude measure-
ment of mobile platforms in restricted and varying scenes, fast and accurate sensing
of multimodal object information across scales, and intelligent interpretation of
multiple indices by combining hierarchical models and a priori knowledge. The
theoretical framework consolidates the foundation, and clarifies the technical route
from measurement requirements to platform construction and from data acquisi-
tion to data processing, promoting the transformation of engineering surveying from
static to dynamic, from discrete to continuous, and from manual to intelligent.
1. Continuous and robust positioning in mobile and constrained scenes
Currently, the scope of infrastructure construction is no longer limited to the earth’s
surface in urban areas. It has been extended to high-altitude areas, remote moun-
tainous areas, and other inhospitable areas, and even deep space and deep-sea areas. Mean-
while, many infrastructures, such as roads, tunnels, and bridges, cross multiple regions.
Their construction and maintenance involve complex and changing scenarios. In
varying and restricted scenes, it is difficult to carry out high-precision and stable atti-
tude measurements. To address this challenge, electro-optical-aided inertial naviga-
tion for robust positioning and scene control enhancement has been developed. It can
be used to achieve full-scene attitude measurement with moving platforms, and the
measurement is at millimeter-level accuracy, providing a high-precision continuous
space–time reference for the mobile platform.
2. Rapid sensing of multi-scale and multi-mode spatial information
In the past, the surveying and mapping discipline was dedicated to measuring and
mapping the geometric features and parameters of the target object. Currently, with
interdisciplinary and cross-disciplinary development, the task has expanded from
acquiring geometric parameters of a single object to acquiring multi-modal infor-
mation of multi-objects. Thanks to the development of surveying platforms, sensor
technologies, and even robotics, modern surveying is more often integrated with
multiple sensors with simultaneous control, acquiring multi-modal information of
a single object or even multiple objects at one time in fieldwork. To this end, our
team invented a multi-sensor integrated synchronous control device for multi-object
multi-modal data acquisition, designed a unified architecture for multiple platforms,
realized efficient and high-precision sensing of multi-platform and multi-scale data,
and provided a unified paradigm for efficient cross-platform multi-sensor integration.
3. Multi-scale modeling and multi-index interpretation of surveyed object
The target objects and measuring platforms have been largely enriched, increasing
the diversity of surveying indices and scales. The ultimate goal of surveying activ-
ities is to obtain hierarchical information on target objects from multi-object and
multi-modal data. Currently, with the development of artificial intelligence, both
machine learning-based methods and deep learning-based data processing methods
have encountered substantial challenges. How to combine a priori knowledge
with measurement data to construct data- and knowledge-driven information extrac-
tion is an urgent problem in the field of artificial intelligence and engineering
surveying. In response to the data interpretation challenges caused by the variable
scale, complex structure, and diverse indices of infrastructure, we propose a data-
model-driven feature interpretation method to realize the intelligent interpretation of
surveying big data with multiple indices.

1.3 Space and Time Datums

Dynamic and precise engineering surveying is mainly accomplished with instruments
on a moving platform. Uniform space and time datums are needed to realize the high-
precision co-registration and fusion of multi-source data and ensure the usability of
the obtained dataset. Space and time datums are fundamental to obtaining an accurate
position and attitude of the platform, thus providing the reference for the dynamic
surveying of the sensors. These datums include both space and time references.

1.3.1 Time Datum

Time is an abstract concept, but it enables us to chronologically record the move-
ment of an object and the change of its attributes. In different cultures, people have devel-
oped different time systems, such as the Roman calendar, Julian calendar, Gregorian
calendar, and Chinese lunar calendar, to record their historical events. One time
system consists of two elements: a start time and time units that include year, month,
day, hour, minute, second, millisecond, microsecond, etc. Time is a significant param-
eter in dynamic and precise engineering surveying, while it is less prominent in
traditional static surveying. Combined with spatial reference, it describes the phys-
ical status of the surveyed objects. The following are several commonly used time
systems.
1. Mean Solar Time


A mean solar day is defined as the interval between two consecutive transits of the
Sun across the local meridian, assuming the Sun moves at its average apparent speed. At
the end of the nineteenth century, one second was defined as 1/86400 of one mean
solar day.
2. Universal Time (UT)
Greenwich mean solar time is called universal time (UT). For one location on
the earth, the local mean solar time, m_s, is defined with respect to the local meridian, and
it can be transformed to universal time by using the following equation.

m_s = UT + λ/15    (1.1)
where λ is the local longitude in degrees (east positive), so that λ/15 is expressed in hours.
For convenience, the earth is divided into 24 time zones according to the meridian,
and each time zone takes the mean solar time of the central meridian as the zone
time. Greenwich mean time (GMT) is the zero time zone, and Greenwich mean time
equals universal time; Beijing Time corresponds to the eighth time zone.

Beijing Time = UT + 8 h (1.2)
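As a small numerical illustration of Eqs. (1.1) and (1.2) (our own sketch, not from the book; times are in decimal hours, east longitude positive):

```python
def local_mean_solar_time(ut_hours: float, longitude_deg: float) -> float:
    """Eq. (1.1): local mean solar time from UT; longitude in degrees, east positive."""
    return (ut_hours + longitude_deg / 15.0) % 24.0

def beijing_time(ut_hours: float) -> float:
    """Eq. (1.2): zone time of the eighth (east) time zone."""
    return (ut_hours + 8.0) % 24.0

ut = 4.0                                  # 04:00 UT
print(local_mean_solar_time(ut, 114.0))   # 11.6 -> 11:36 local mean solar time at 114° E
print(beijing_time(ut))                   # 12.0 -> 12:00 Beijing Time
```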

3. International Atomic Time (TAI)

Because of the inconsistency of universal time, in 1967, the 13th General Conference
on Weights and Measures decided to replace it with international atomic time and
introduce a new definition for the international standard time unit, the second (s). The
atomic second is defined as the duration of 9,192,631,770 cycles of the radiation
corresponding to the transition between two hyperfine energy levels of the cesium atom. International
atomic time (TAI) is a homogeneous time system that specifies 00:00 UTC on January
1, 1958, as the initial epoch of international atomic time, with the atomic second as
the basic unit.
4. Coordinated Universal Time (UTC)
International atomic time cannot replace the time system based on the rotation of the
earth, which is more applicable in daily life. Coordinated universal time was proposed
to be compatible with both universal time and international atomic time. It gradually
became a standard time system that has been used in many Earth observation systems.
Coordinated universal time uses the same second unit as international atomic time,
but it is kept close to universal time. The difference between UTC and universal
time must remain less than 0.9 s. To this end, a leap second is inserted into UTC,
when necessary, on January 1 or July 1.
5. Global Positioning System Time (GPST)


GPST is based on TAI. The initial epoch is 00:00:00 UTC on January 6, 1980.
It is counted in seconds, and there are no leap seconds in the accumulation. It
has a constant offset from TAI, which can be expressed as GPST = TAI − 19 s. There is
an accumulated bias between GPST and UTC, as the latter has second jumps. GPST
is usually used with the unit of a week, which consists of 604,800 s and is called one
GPS week. It is denoted by using one number for whole weeks and one for seconds
less than a week.
6. BeiDou System Time (BDT)
BDT is also based on TAI. The initial epoch is 00:00:00 on January 1, 2006. It
has no second jumps and uses one number for whole weeks and one for seconds less
than a week. BDT and GPST are based on the TAI but have different initial epochs,
so the transform between them is expressed in Eq. (1.3).

w_BDT = w_GPST − 1356,    s_BDT = s_GPST − 14    (1.3)
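As a minimal illustration of Eq. (1.3), the following sketch (function and variable names are our own, not from the book) converts a GPS week and seconds-of-week pair into BeiDou time, handling the week rollover when subtracting the 14 s offset crosses a week boundary.

```python
SECONDS_PER_WEEK = 604_800  # one GPS/BDS week

def gpst_to_bdt(w_gpst: int, s_gpst: float) -> tuple[int, float]:
    """Convert GPS time (week, seconds-of-week) to BeiDou time via Eq. (1.3)."""
    w_bdt = w_gpst - 1356          # BDT week counter starts 1356 weeks after GPST
    s_bdt = s_gpst - 14.0          # BDT is 14 s behind GPST
    if s_bdt < 0:                  # roll over into the previous week
        s_bdt += SECONDS_PER_WEEK
        w_bdt -= 1
    return w_bdt, s_bdt

# Example: 5 s into GPS week 2200 falls 604,791 s into BDT week 843.
print(gpst_to_bdt(2200, 5.0))      # -> (843, 604791.0)
```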

In dynamic and precise engineering surveying, independent quartz clocks are
used by different sensors. A high-precision time datum is required for time
synchronization to integrate various observations. Currently, there are the following
ways to obtain a high precision time datum.
A high-precision atomic clock can be used as a time reference because its
frequency is very stable. In some scenarios, it can be used directly as the time
datum and aligned directly with the National Time Service Center (NTSC) of the
Chinese Academy of Sciences (CAS). However, it is costly and requires periodic
time alignments.
Time datum can be set up using the pulse-per-second (PPS) provided by GNSS
satellite receivers. GNSS provides positioning data and GPST and outputs a PPS signal,
which marks the starting moment of each second. The accuracy can reach a
ten-nanosecond level, providing a high-precision time reference for other systems.
China’s BeiDou satellite positioning system can also achieve navigation and posi-
tioning. The BeiDou receiver outputs time information and time datum, and the
current accuracy can reach a ten-nanosecond level.
The time datum can be based on the time signal provided by the combination of
the GNSS/high-frequency receiver and crystal oscillator. Although atomic clocks
provide high-precision time datum, mobile surveying systems are generally not
equipped with atomic clocks because they are of exceptionally high cost. The use
of navigation satellites enables high-precision time synchronization at a relatively
low cost; however, time synchronization cannot be achieved in the case of GNSS
signal rejection. In addition, the time pulse signal output from GNSS receivers is of
low frequency, which cannot meet the demand of high-frequency sampling for time
synchronization in mobile surveying.
A reference clock of high precision and high frequency can be generated by
combining a GNSS satellite receiver and a super-stable quartz clock. This combina-
tion offers a cost-effective and convenient design. When the GNSS satellite signal is
valid, the PPS pulse output from the GNSS receiver is used to adjust the square wave
pulse output from the highly stable quartz clock to obtain a high-precision absolute
time reference. In the case of GNSS signal rejection, the adjusted quartz clock is
used to output high-frequency time information.
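The idea of disciplining a quartz clock with the GNSS PPS can be sketched as follows (a simplified, illustrative design with invented names; a real implementation runs in firmware and must also handle missing pulses, outlier rejection, and counter wrap-around).

```python
class PpsDisciplinedClock:
    """Timestamp high-frequency samples with a quartz counter disciplined by GNSS PPS."""

    def __init__(self, nominal_freq_hz: float):
        self.nominal_freq_hz = nominal_freq_hz   # quartz oscillator nominal frequency
        self.freq_estimate_hz = nominal_freq_hz  # refined at every PPS edge
        self.last_pps_count = 0                  # counter value latched at the last PPS
        self.last_pps_time = 0.0                 # GNSS time (s) of the last PPS

    def on_pps(self, counter_value: int, gnss_time_s: float) -> None:
        """Called at each PPS edge; re-estimates the true oscillator frequency."""
        elapsed_counts = counter_value - self.last_pps_count
        elapsed_time = gnss_time_s - self.last_pps_time
        if elapsed_time > 0:
            self.freq_estimate_hz = elapsed_counts / elapsed_time
        self.last_pps_count = counter_value
        self.last_pps_time = gnss_time_s

    def timestamp(self, counter_value: int) -> float:
        """Convert a free-running counter value into a GNSS-aligned absolute time."""
        return self.last_pps_time + (counter_value - self.last_pps_count) / self.freq_estimate_hz
```

Between two PPS edges, the sensor trigger pulses are timestamped against the adjusted counter, which is what keeps high-frequency sampling consistent with GNSS time during short signal outages.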

1.3.2 Space Datum

A spatial datum comprises a series of reference points, lines, and planes for surveying
and mapping [8]. Given one spatial datum, one corresponding space coordinate
system can be constructed. For example, the geodetic coordinate system is defined
with a reference sphere as its datum. The height datum is defined with a given geoid
model. The coordinate reference system provides an origin, scale, axes, and a series
of time-varying parameters, calculations, and protocols. Datums provide the basis
for the definition. In dynamic and precise engineering surveying, data acquired by
various instruments are usually in different frames. Generally, the commonly used
reference frames are listed as follows.
1. Geodetic coordinate systems
(1) Geocentric inertial frame (i-frame). The classic Newton’s law of motion holds
only in inertial space. The inertial frame is the basis of motion calculation,
defined as stationary or uniformly linearly moving coordinate systems in inertial
space. Generally, the commonly used approximate inertial coordinate system is
the geocentric inertial coordinate system. The origin of this coordinate system
is the mass center of the earth. The Z-axis is the earth’s rotation axis, pointing
to the North Pole. The X-axis is in the plane of the mean equator, pointing to
the mean spring point. The X-, Y-, and Z-axes form a right-handed orthogonal
coordinate system. The geocentric inertial coordinate system is mainly used to
describe the motion of the earth itself and the inertial measurements in dynamic
and precise engineering surveying.
(2) Geocentric earth-fixed frame (e-frame). The geocentric earth-fixed coordinate
system, also called the earth coordinate system, is mainly used to define the
target position on the earth. Its origin is the mass center of Earth, and the Z-
axis is the rotation axis of the earth. The X-axis lies in the mean equatorial plane,
pointing to the prime meridian, and the Y-axis forms a right-handed orthog-
onal coordinate system with the Z- and X-axes. The earth’s coordinate system
rotates approximately at a uniform speed with respect to the geocentric iner-
tial coordinate system, i.e., the earth’s rotating rate of 7.2921158 × 10−5 rad/s.
In surveying and mapping, the earth coordinate system has been widely used,
among which the World Geodetic System 1984 (WGS84) is the most commonly
used geodetic coordinate system. In this system, coordinates can be denoted
as Cartesian spatial coordinates (three parameters: X, Y, Z) or geodetic coordi-
nates (longitude, latitude, and elevation parameters). At present, the interna-
tional terrestrial reference frame (ITRF) and the world geodetic system (WGS)
are the most commonly used geocentric and geodetic coordinate frames in the
world.
(3) Navigation frame (n-frame). The navigation coordinate system is also known as
a station-centered coordinate system in traditional surveying. The earth's surface
is a curved 3D surface, and geometric operations on it are complex. Usually,
a local tangent plane of the earth’s surface is defined with a local horizontal
north-pointing coordinate system. Its origin coincides with the position of the
moving object. The X-axis points to the geographic east (E) direction, and the
Y-axis points to the geographic north (N) direction. The Z-axis points up
(U), perpendicular to the reference ellipsoidal surface; this is usually
called the east-north-up (E-N-U) coordinate system (Fig. 1.5). The
navigation coordinate system is mainly used to describe the attitude and velocity
of the moving object in the local reference frame.
(4) Mapping frame (m-frame). Final surveying results are usually shown in plani-
metric mapping coordinates. The commonly used reference frame for large-
scale mapping is the Gaussian projection coordinate system, which uses projec-
tion transformation to project the earth coordinate system according to a specific
projection formula. Its coordinate origin is the intersection of the central
meridian and the equator. The Y-axis points to the east, and the X-axis points to
the north, which forms a left-handed system with the Z-axis denoting the eleva-
tion direction. The earth’s ellipsoidal height is generally used for the Z-axis in
this system.
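Returning to the e-frame and n-frame defined in items (2) and (3) above, the following sketch (our own illustration, not from the book) rotates an ECEF coordinate difference into local east-north-up components; it assumes geodetic latitude and longitude of the local origin are given in radians.

```python
import numpy as np

def ecef_to_enu(d_ecef: np.ndarray, lat: float, lon: float) -> np.ndarray:
    """Rotate an ECEF coordinate difference (target minus station) into E-N-U.

    lat, lon: geodetic latitude and longitude of the local origin, in radians.
    """
    sin_lat, cos_lat = np.sin(lat), np.cos(lat)
    sin_lon, cos_lon = np.sin(lon), np.cos(lon)
    # Rows are the unit vectors of the east, north, and up axes expressed in ECEF.
    rot = np.array([
        [-sin_lon,            cos_lon,           0.0],
        [-sin_lat * cos_lon, -sin_lat * sin_lon, cos_lat],
        [ cos_lat * cos_lon,  cos_lat * sin_lon, sin_lat],
    ])
    return rot @ d_ecef

# Example: a 100 m offset purely along the ECEF Z-axis, seen from a station at 30° N, 114° E.
enu = ecef_to_enu(np.array([0.0, 0.0, 100.0]), np.radians(30.0), np.radians(114.0))
print(enu)  # east ≈ 0 m, north ≈ 86.6 m, up ≈ 50 m
```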

Fig. 1.5 Reference coordinates used in navigation

2. Coordinate systems used in dynamic surveying

(1) Vehicle-fixed frame (v-frame). The vehicle-fixed frame is a frame fixed with the
moving vehicle platform, which is used to express the 3D position and attitude
of the moving platform, providing a reference frame for the sensors on the
platform. Generally, the right-handed coordinate frame is formed by taking the
geometric center of the platform as the origin, the Y-axis pointing to the moving
direction, the X-axis pointing to the right facing the moving direction, and the
Z-axis pointing to the sky. The vehicle-fixed frame is generally used when
vehicle frame measurements or a priori constraints are involved. For instance,
the DMI velocity measurements or vehicle motion velocity constraints need to
be expressed in the vehicle-fixed frame.
(2) Body frame (b-frame). The body frame applies explicitly to the carriers of the
inertial navigation system (INS). The origin is defined as the center of the inertial
measuring unit. The X-axis points to the right, and the Y-axis points forward.
The Z-axis points up and forms a right-handed orthogonal coordinate system
with the other two axes. It is referred to as the right-front-up coordinate system.
Rotation angles around the Z-, X-, and Y-axes are defined as heading, pitch, and
roll, respectively. The original observations of the INS are recorded in the body
frame (Fig. 1.6), which equals the vehicle-fixed frame when there is no INS.
(3) Sensor frame (s-frame). Measurements of the sensors are recorded in the sensor
frame. In a LiDAR system, the origin of the sensor frame is defined as the
center of the laser transmitter. The X-axis points to the 0° horizontal azimuth,
and the Y-axis points to the 0° vertical azimuth. In the following context, the
different sensor frames are abbreviated to their name initials, such as l-frame
for the LiDAR frame, d-frame for the DMI frame, and c-frame for the camera
frame.

It is worth noting that in data processing, variables are sometimes transformed
from one frame to another. Parameters describing this transformation are termed coor-
dinate transformation parameters. The underlying principles are classic and covered
in numerous textbooks, so they are not described in this book.

Fig. 1.6 Typical coordinate systems used in dynamic surveying

1.3.3 Principles for Positioning

Position and attitude are two essential elements used to construct spatial references
for surveying and mapping. Various methods for positioning have been proposed and
studied in many fields, such as surveying and mapping, navigation, computer vision,
robotics, and aerospace. According to the positioning principle, dynamic positioning
techniques can be roughly divided into three categories: geometric intersection, scene
matching, and dead reckoning. According to the positioning results, the commonly
used positioning systems can be divided into 1D odometry, 2D planar positions, 2D
planar position and azimuth angle, 3D positions and one azimuth angle, and 3D
positions and 3D attitudes.
1. Geometric intersection
The geometric intersection method refers to the use of discrete distance and angle
measurements to calculate the target position using the principles of trigonom-
etry. It can be mainly divided into three categories: distance intersection, angular
intersection, and distance-angular intersection.
(1) Distance intersection. The distance intersection is generally realized by
measuring the travel time, phase, and intensity of either the laser, electromag-
netic, or sound waves to infer the distance between the target and base stations.
The 3D positions of the target can be calculated by using at least three distances
to the base stations. The theory of distance intersection is used in global naviga-
tion and positioning systems, ultrawideband (UWB) positioning systems, and
underwater acoustic array positioning systems.
(2) Angular intersection. The angular intersection is applied in mechanical code
wheel, radio goniometer, photogrammetry, etc., which measure the azimuth
angles of the lines between the target and base stations. The planar position
of the target can be calculated by using two azimuth angles and the baseline
between two base stations. A 3D position can be obtained with two additional
vertical angles. The angular intersection is commonly used in positioning with
total stations and binocular intersection positioning systems.
(3) Distance-angular intersection. The distance-angular intersection involves
measurements of both distances and azimuth angles. The target position is
calculated by using the principles of polar coordinates. The distance-angular
intersection is commonly used in distance-angular positioning systems, such as
radar systems and laser tracking systems.
Base stations are needed as the basic infrastructure in all three intersection
positioning methods, and a clear line of sight between the target and the base stations
is required. Therefore, intersection positioning methods are suitable in open fields,
and their positioning accuracy depends on the accuracy of distance and angle
measurements. The positioning accuracy is usually invariant to the measuring time
(Fig. 1.7).
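As an illustration of the distance intersection described in item (1) above, the sketch below (our own example using SciPy; the station coordinates and ranges are invented and noise-free) estimates a 3D position from ranges to four base stations by nonlinear least squares.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical base-station coordinates (m) and the measured ranges (m) to the target.
stations = np.array([[0.0, 0.0, 0.0],
                     [100.0, 0.0, 0.0],
                     [0.0, 100.0, 0.0],
                     [0.0, 0.0, 50.0]])
true_pos = np.array([30.0, 40.0, 10.0])
ranges = np.linalg.norm(stations - true_pos, axis=1)  # noise-free for simplicity

def residuals(p):
    """Difference between predicted and measured distances to each station."""
    return np.linalg.norm(stations - p, axis=1) - ranges

solution = least_squares(residuals, x0=np.zeros(3))
print(solution.x)  # ≈ [30.0, 40.0, 10.0]
```

With real, noisy observations the same least-squares formulation yields the position that best fits all ranges, and redundant stations improve reliability.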
Fig. 1.7 Geometric intersection

2. Scene matching
Scene matching localization is used to locate the target by matching the local environ-
mental features to the overall environmental feature field or database, taking advan-
tage of the heterogeneity of spatial environmental features. Commonly used scene
matching methods include image map matching, point cloud map matching, Wi-Fi
fingerprinting location, geomagnetic field matching, map or topography matching,
etc. Several scene matching methods are introduced in the following.

(1) Geomagnetic matching. Geomagnetic anomalies caused by indoor steel
structures are used as reference features for localization. The method first establishes a
reference map of the indoor ambient magnetic field, and then localization is real-
ized by using algorithms such as particle filtering. The matching performance
is primarily related to the ambient magnetic field distribution. Mis-matching
occurs in regions where the ambient magnetic field is weak.
(2) Gravitational field matching. The gravity field on the earth’s surface has
different characteristics in different regions, which can be used to determine the
geographic location of the carrier. High-resolution and high-precision gravity
field measurements and gravity field reference maps are needed in this method.
The matching accuracy depends on the richness of gravity features.
(3) Wi-Fi fingerprint matching. Wi-Fi fingerprint matching is mainly used for indoor
localization, which is mainly divided into two stages. The first stage is to
construct a Wi-Fi fingerprint database, which collects the signals transmitted
by multiple base stations measured in a point-by-point or walking mode. The
second stage is fingerprint matching localization, which matches the received
signals with the signals recorded in the fingerprint database and determines the
signal transmitter’s locations. The disadvantage of Wi-Fi fingerprint matching
is that fingerprint database construction is time-consuming and labor-intensive.
Efficiency can be improved by updating the database automatically in a crowd-
sourcing way. However, in this way, the collected navigation data are less
reliable.
Scene matching localization requires the establishment of a basic database, such as
a field or map of the whole localization scene. In real-time localization, the perceived
local environmental signals are quickly matched with the information in the database.
The similarity between signals is evaluated, and the reference signal with the highest
similarity is selected as the positioning result. Since only the local environmental field
needs to be sensed in real-time localization, scene matching is highly autonomous.
Periodic updates of the database are required to ensure localization effectiveness, as
the local environment field may change dynamically (Fig. 1.8).
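A minimal sketch of the fingerprint matching described in item (3) above (illustrative only; the grid positions and RSSI values are invented): each database entry stores a surveyed position and an RSSI vector, and the query is assigned the position of the most similar entry.

```python
import numpy as np

# Hypothetical fingerprint database: position (x, y) in metres -> RSSI vector (dBm)
# received from three access points during the survey stage.
fingerprints = {
    (0.0, 0.0): np.array([-40.0, -70.0, -80.0]),
    (5.0, 0.0): np.array([-55.0, -60.0, -75.0]),
    (5.0, 5.0): np.array([-70.0, -50.0, -65.0]),
    (0.0, 5.0): np.array([-60.0, -65.0, -55.0]),
}

def locate(query_rssi: np.ndarray) -> tuple:
    """Return the grid position whose stored RSSI vector is closest to the query."""
    return min(fingerprints, key=lambda pos: np.linalg.norm(fingerprints[pos] - query_rssi))

print(locate(np.array([-57.0, -61.0, -74.0])))  # -> (5.0, 0.0)
```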
3. Dead reckoning
Dead reckoning is a technique to achieve dynamic positioning with a known
initial position. The dynamic and continuous positioning of the target can be real-
ized by measuring its incremental displacement and attitude increment during its
moving path, as shown in Fig. 1.9. Due to the inevitable displacement and atti-
tude measurement errors, positioning errors accumulate during movement. There
are two commonly used dead reckoning methods: inertial navigation and visual/
LiDAR odometry.
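The principle of Fig. 1.9 can be written in a few lines. The sketch below (our own illustration) propagates a 2D pose from heading and displacement increments starting at a known initial position; in practice, the increments come from an IMU, a DMI, or an odometry algorithm.

```python
import math

def dead_reckon(start_xy, start_heading_rad, increments):
    """Propagate a 2D pose from (distance, heading-change) increments.

    increments: iterable of (delta_distance_m, delta_heading_rad) pairs.
    Returns the list of successive positions, starting at start_xy.
    """
    x, y = start_xy
    heading = start_heading_rad
    track = [(x, y)]
    for d_dist, d_heading in increments:
        heading += d_heading                  # update the attitude first
        x += d_dist * math.cos(heading)       # then project the travelled distance
        y += d_dist * math.sin(heading)
        track.append((x, y))
    return track

# Drive 10 m straight east, then turn 90° left and drive 10 m north.
print(dead_reckon((0.0, 0.0), 0.0, [(10.0, 0.0), (10.0, math.pi / 2)]))
# ≈ [(0, 0), (10, 0), (10, 10)] up to floating-point rounding
```

Because each step builds on the previous estimate, any error in an increment propagates into all later positions, which is exactly the drift behaviour discussed above.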
Fig. 1.8 Scene/database matching for positioning

Fig. 1.9 The principle of dead reckoning

1) Inertial navigation

INS is a navigation and positioning system with inertial sensors as the primary
sensors, including gyroscopes measuring the angular velocity of the carrier motion
and accelerometers measuring acceleration. INS can be divided into two categories,
i.e., platform INS and strapdown INS, according to the presence of a physical tracking
and stabilization platform. The latter contains three gyroscopes with orthogonal
axes and three accelerometers with orthogonal axes. The strapdown INS obtains
the carrier’s attitude by integrating the angular velocity. It maintains a stable
attitude reference, equivalent to a digital tracking platform. After projecting the
accelerometer measurements to the attitude reference system, the velocity is obtained
by integrating acceleration according to Newton’s law of motion. The position is
obtained by integrating the velocity, realizing the full navigation state measurement
of attitude, velocity, and position.
2) Visual odometry
Unlike inertial localization, which reckons motion from acceleration and angular
velocity, visual odometry recovers relative displacement and relative
rotation from the matching relationship between two frames. The motion
between two frames can be recovered using multiple matched features. The
geometric relationship between two matching points is illustrated in Fig. 1.10.

(1) Coplanarity: each matched point is back-projected into 3D space as a ray through
its own camera center. The two back-projected rays intersect at P and are
coplanar, lying in a common plane, π.
(2) Epipolar constraint: the image point p_r on the right view, which corresponds to
p_l on the left view, lies on the intersection line of the plane π and the right
image plane, forming an epipolar line, l_r, on the right view, and vice versa. The line
connecting the two camera centers is termed the baseline, and its intersections with
the two image planes are the epipoles, denoted by e_l and e_r. Following
the epipolar constraint, the image of P on the left view
is denoted as x, and the image of P on the right view is denoted as x'. The
following equation expresses the relationship between the fundamental matrix F and
the matched point pair [11].

x'^T F x = 0    (1.4)
Fig. 1.10 Visual motion recursion principle

The fundamental matrix F can be solved by using at least eight pairs of matched
points. When the internal parameters of the left and right cameras are known, F can
be converted to the essential matrix and decomposed to obtain the rotation and translation
representing the transform from the right camera to the left camera.
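For readers who want to experiment, the following sketch (our own toy example, not the book's method) takes the calibrated route via the essential matrix using OpenCV; the intrinsic matrix, the two poses, and the 3D points are all invented, and in practice the matched points would come from a feature matcher.

```python
import cv2
import numpy as np

# Synthetic check of the epipolar-geometry pipeline.
rng = np.random.default_rng(0)
pts3d = rng.uniform([-2, -2, 4], [2, 2, 8], size=(60, 3))     # points in front of both cameras

K = np.array([[800.0, 0.0, 640.0],                            # assumed intrinsic matrix
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])

def project(points, R, t):
    """Pinhole projection of 3D points into pixel coordinates for pose (R, t)."""
    cam = points @ R.T + t
    return (cam[:, :2] / cam[:, 2:]) * [K[0, 0], K[1, 1]] + [K[0, 2], K[1, 2]]

R_true, _ = cv2.Rodrigues(np.array([[0.0], [0.05], [0.0]]))    # small yaw between the views
t_true = np.array([0.2, 0.0, 0.0])                             # 20 cm sideways motion

pts_left = project(pts3d, np.eye(3), np.zeros(3))
pts_right = project(pts3d, R_true, t_true)

E, _ = cv2.findEssentialMat(pts_left, pts_right, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts_left, pts_right, K)
print(R)            # ≈ R_true
print(t.ravel())    # ≈ direction of t_true (unit vector; absolute scale is unobservable)
```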
Dead reckoning is more autonomous than the geometric intersection and scene
matching localization, especially using high-precision INS, which can achieve
frequent and high-precision self-localization over time. It can recover the camera
trajectory by simply using the observed sequential environmental images without
establishing a database in advance. It is particularly suitable for localization in
unknown or closed spaces, such as tunnels, pipelines, caves, and extraterrestrial
planets.
However, the shortcoming of dead reckoning is that the positioning error increases
over time until the positioning system fails. Therefore, continuous high-precision
positioning is highly dependent on the performance of the sensors. It is gener-
ally combined with intersection methods and scene matching approaches to achieve
satisfactory performance at a relatively low cost.
The performance of positioning or localization systems, such as accuracy,
frequency, reliability, price, size, and weight, is highly diverse. Generally, high perfor-
mance and advantages in all these aspects cannot be met simultaneously; thus, there
are always trade-offs between these characteristics in different applications. The
above-mentioned systems are complementary in several characteristics, especially
precision [12]. For instance, geometric intersection and scene matching approaches
are generally more stable regarding positioning accuracy, but there is a possibility of
failures leading to discontinuous results. In contrast, the localization error of dead
reckoning increases with time, but it has the advantage of continuous localization
with high accuracy for a short period. Therefore, combining localization techniques
using different principles can yield consistent and accurate localization.

1.4 Integration of Surveying Sensors

Dynamic and precise engineering surveying requires integrating various types of
sensors with appropriate carrier platforms and supporting systems, forming an inte-
grated system to rapidly acquire multi-source, multi-view, and multi-scale observa-
tions of multiple targets. The function and composition of the dynamic and precise
engineering surveying system are determined by various objective conditions, such
as the characteristics of the target, the measuring period, and the scene where the
object is located. Meanwhile, the reasonable selection of sensors is also the basis
of effective data acquisition. The cooperative work between multiple sensors, and
the space and time correlation between multi-source surveying data are the keys to
integrating multiple sensors.

1.4.1 Typical Sensors Used in Surveying

Sensors are the basic working units of a dynamic surveying system. Different from
traditional stationary surveying systems, dynamic surveying systems involve not
only the scenes but also the movement of the surveying system itself. Therefore,
sensors in a dynamic surveying system can be generally classified into two categories,
i.e., sensors for positioning and sensors for mapping the scene, according to their
different functions. The former is mainly used to establish the moving carrier’s spatial
reference frame. The latter is used to obtain the spatial information of the measured
target. A combination of the local scene measurement data and its spatial reference
can be used to generate unified and complete spatial data of the measured object.
Figure 1.11 shows an application of sensors for measuring the position and sensors
for mapping the scene in the dynamic surveying system.
1. Sensors for positioning
This category of sensors includes cooperative positioning sensors and independent
positioning sensors. The former refers to receiving signals from other systems, such
as GNSS, radio positioning systems, and ultrasonic positioning systems. The latter
refers to sensors that realize positioning with data acquired by themselves, such as
IMU, DMI, electronic compass, etc.
Fig. 1.11 The system of dynamic and precise engineering surveying

1) GNSS receiver

In a multi-sensor integrated dynamic surveying system, the GNSS module, consisting
of a GNSS antenna and receiver, has three functions: positioning, navigation, and
timing (PNT). (1) positioning, i.e., taking advantage of the data acquired by the GNSS
receiver to realize the accurate positioning of the moving platform and to construct a
high-precision spatial reference frame; (2) navigation, i.e., providing position, speed,
and direction to vehicles, vessels, and human beings; and (3) timing, i.e., providing a
time system to sensors onboard the platform by using the time pulse from the GNSS
receivers.
2) IMU
The IMU consists of a triaxial accelerometer, triaxial gyroscope, digital sampling
circuit, and microprocessor. It can measure the triaxial angular velocity and triaxial
acceleration of the carrier. The commonly used IMUs can be divided into mechanical
gyroscopes, laser gyroscopes, fiber optic gyroscopes, and microelectromechanical
gyroscopes according to their different measurement principles.
Mechanical gyroscopes are based on angular momentum conservation theory.
When the rotor rotates at high speed, it tends to resist changes in the direction of its rotation
axis. The rotor gyroscope maintains its orientation and completes the measure-
ment with this characteristic. Production of rotor gyroscopes requires high-standard
and high-precision craftsmanship, and their high cost largely limits their use to military
applications, such as submarines.
The laser gyroscope measures the angular velocity by using the optical path
difference of two light waves traveling along the same closed path in opposite
directions. When the light source and the ring both rotate, the two beams travel
different distances and form phase differences, through which the angular velocity
of the laser gyroscope can be measured. In recent years, laser gyroscopes have been
developed, and updated types tend to be smaller in size.
A fiber optic gyroscope is a type of optical gyroscope based on the Sagnac effect.
It consists of a light source, a light detector, a beam splitter, and a fiber optic coil.
When the gyroscope rotates along the axial direction perpendicular to the plane of the
fiber coil, a phase difference proportional to the angular velocity is generated. The
angular velocity of rotation can be calculated with this phase difference. A fiber optic
gyroscope has the features of high reliability, long life, small volume, light mass, low
power consumption, good mechanical environment adaptability, extensive dynamic
range, good linearity, wide frequency band range, short start-up time, etc. It has been
widely used in aerospace, aviation, navigation, weapons, and various military and
civilian applications.
Based on microelectromechanical technology and inertial measurement princi-
ples, the microelectromechanical gyroscope integrates a three-axis gyroscope and
a three-axis accelerometer to realize a highly integrated unit, which significantly
reduces the size and cost of the strapdown INS. In low-precision strapdown INSs,
microelectromechanical gyroscopes are inexpensive, compact, and light, making
them the best choice for current UAV and indoor dynamic surveying systems.
However, for professional dynamic and precise engineering surveying systems, such
as vehicle-mounted, airborne, and ship-mounted systems, laser or fiber optic gyroscope-based
INS is still the mainstream.
3) DMI
The DMI, which generally consists of a photoelectric rotary encoder, can obtain the
accurate speed of the carrier. The physical parameters, such as angular displacement
or angular velocity of the rotary axis, are converted into the electronic pulse output
through the photoelectric conversion circuit. The distance of the wheel moving rela-
tive to the ground is calculated using the photoelectric encoder’s pulse change within
the sampling period. The rotational motion of the carrier can also be analyzed by
installing the DMI at different positions.
The rotary encoder is installed at the center of the carrier wheel, co-axial with the
wheel. The distance traveled by the moving carrier within the pulse can be calculated
by labeling the pulse. The DMI has two prominent roles: (1) assisting positioning,
i.e., assisting the IMU in case of GNSS signal loss. (2) triggering other sensors, i.e.,
the pulse output from the odometer is used as a trigger signal for the digital camera
and other sensors to collect data according to a preset frequency.
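A toy illustration of the odometer principle (the encoder resolution and wheel size below are assumed values, not from the book): the distance per sampling interval follows from the pulse count, the pulses per revolution, and the wheel circumference.

```python
import math

PULSES_PER_REV = 2048        # assumed encoder resolution
WHEEL_DIAMETER_M = 0.62      # assumed wheel diameter

def dmi_distance(pulse_count: int) -> float:
    """Distance travelled during one sampling period, from the encoder pulse count."""
    wheel_circumference = math.pi * WHEEL_DIAMETER_M
    return pulse_count / PULSES_PER_REV * wheel_circumference

# 512 pulses in one period correspond to a quarter wheel revolution.
print(round(dmi_distance(512), 3))  # ≈ 0.487 m
```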
2. Sensors for scene mapping
Mapping sensors in dynamic surveying are used to measure geometry, texture, and
other target features. Commonly used sensors include laser scanners, digital cameras,
depth cameras, etc.
1) Laser scanner
The laser scanner measures the distance and angle of the target by collecting the
returned laser signals from the scanned target, which form the point cloud of the
object’s surface. The point cloud contains spatial 3D coordinates and reflection inten-
sity, through which the 3D model of the object can be quickly constructed. According
to their scanning methods, laser scanners can be divided into 3D and 2D laser scanners.
The 3D laser scanner mainly consists of a laser transmitter and receiver, vertical
and horizontal scanning drive motors, embedded computers, etc. The laser distance
measuring sensor emits a laser beam of sufficient energy to the target. The distance
between the sensor and the target can be calculated with the reflected laser beam
from the target. Meanwhile, the horizontal and vertical angles are obtained by the
angular sensor. With the distance and angular information, the 3D coordinates of a
single target point can be obtained. By mechanically rotating the prism to change
the laser beam direction, 3D scanning of the target is accomplished, and a 3D point
cloud is obtained.
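The conversion from one range measurement and its two angles to a 3D point, as described above, can be sketched as follows (axis conventions differ between instruments; this assumes the horizontal angle is measured from the X-axis and the vertical angle from the horizontal plane).

```python
import numpy as np

def polar_to_xyz(range_m: float, h_angle_rad: float, v_angle_rad: float) -> np.ndarray:
    """Convert a laser range and its horizontal/vertical angles to scanner-frame XYZ."""
    x = range_m * np.cos(v_angle_rad) * np.cos(h_angle_rad)
    y = range_m * np.cos(v_angle_rad) * np.sin(h_angle_rad)
    z = range_m * np.sin(v_angle_rad)
    return np.array([x, y, z])

# A 25 m return at a 30° horizontal and 10° vertical angle.
print(polar_to_xyz(25.0, np.radians(30.0), np.radians(10.0)))
# ≈ [21.32, 12.31, 4.34]
```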
For a single-line 2D laser scanner, a 3D point cloud is obtained by scanning the
target using a rotating prism with the movement of the carrier. The 2D laser scanner
scans in a single-line mode to form a sweeping plane; thus, it can only scan one plane
of the target during one rotational scanning cycle. The multi-line laser scanner has
been developed to achieve 3D scanning. Its emission components are arranged into
a line array of laser sources, forming laser beams that point in different
directions within the vertical plane. Driven by a stepper motor, the laser
array rotates, the vertical fan of beams sweeps from a line into a plane,
and multiple planes are formed as the laser array scans in rotating mode.
2) Digital camera
Digital cameras can obtain digital images of the target. In dynamic and precise
engineering surveying, the digital camera can be activated at certain time or space
intervals to collect attributes and environmental information about the target with
commands from the synchronous system. The obtained images contain rich spec-
tral, textural, and semantic information. The image sensor is composed of CCD or CMOS
photosensitive elements that convert light signals into electrical signals. The
electrical signals are fed into an analog-to-digital (A/D) converter and
then digitally processed and compressed. The final output image is saved in the
computer storage media and displayed through the display device. According to
the wavelength, digital cameras can be categorized into visible light, infrared, and
multispectral/hyperspectral cameras.
(1) Visible light cameras are used to capture the true color of the target and obtain
the RGB color of the target. According to the structure of the imaging sensors,
digital cameras can be divided into line array cameras and plane array cameras.
Line array cameras are cameras with line array image sensors that form a length
of several thousand or tens of thousands of pixels and a width of several pixels.
Therefore, line array cameras are of extremely high resolution and fast image
acquisition, and they are suitable for the acquisition of high-resolution images
with large fields of view. In plane array cameras, the light-sensitive
units are arranged in a matrix. The resolution of a plane array camera corresponds
to the size of the actual object represented by a pixel. The advantage of plane
array cameras is that they can obtain a 2D image directly with regularly arranged
light-sensitive units.
(2) Infrared cameras use infrared radiation sensing technology to image objects. In
nature, any object whose temperature is above absolute zero
(−273.15 °C) emits electromagnetic radiation. There-
fore, infrared light emitted by the observed objects can be used with no external
light source. Compared to visible light, infrared imaging systems are func-
tional in fog, smoke, haze, and nighttime. It can also be used in many extreme
environments. Due to the absorption of radiation by the atmosphere, three atmo-
spheric windows, 0.5–2.5 µm, 3–5 µm, and 8–14 µm, are formed. According to
these different atmospheric windows, infrared imaging systems can be classified
into three categories, i.e., short-wave, medium-wave, and long-wave. Infrared
imaging has good image quality, strong anti-interference ability, and all-day
working ability.
(3) In a multispectral camera, the incident full-band or wide-band light is divided
into several narrow-band beams imaged on the photodetector separately. There-
fore, a single shot can obtain multiple images of different spectra. The obtained
image contains 2D spatial information and the spectral radiation information
of the target, according to which the composition of the imaged target can
be accurately distinguished. Generally, multispectral cameras can process the
visible, near-infrared, and ultraviolet bands separately, and their spectral reso-
lution is about 0.1 µm. There are generally only a few bands in the visible and
near-infrared ranges.
(4) Compared with multispectral cameras, hyperspectral cameras have a higher
spectral resolution. Their spectral resolution is generally about 0.01 µm, with
the number of bands ranging from 10 to 1000, much more than the multispectral
cameras. Hyperspectral cameras acquire two-dimensional spatial information
and continuous, narrow-band, and high spectral resolution images, forming a 3D
data structure termed a data cube. Hyperspectral cameras are characterized by
spatial identifiability, high spectral resolution, wide spectral range, and spectral
unity.

3) Depth camera

Depth cameras record the distance from each point in the image to the camera center.
Adding the distance information to the pixel coordinates of the plane image, the 3D
spatial coordinates of each point can be calculated, and the real scene can be restored,
realizing scene modeling, behavior recognition, and object recognition. According
to the imaging principle, depth cameras can be categorized into structured light depth
cameras, binocular vision depth cameras, and time of flight depth cameras.
(1) A structured light depth camera is an active detection device that projects a linear
structured light pattern onto the target to highlight its surface contour.
CMOS/CCD cameras acquire images containing distorted light strips, through
which the 3D geometric features of the target can be restored. It can only obtain
the profiling image of the measured target. Generally, the closer the camera is
to the target, the higher the measurement accuracy.
(2) Binocular vision depth cameras mainly rely on binocular stereo vision matching
technology. The object is imaged by two ordinary cameras fixed at two different
positions. The distance between the target and the camera can be calculated
using position offsets between two sets of matched points. Although the depth
calculation algorithm is complex, it has the advantage of low-cost and high-
precision stereo observation at close range (a numerical sketch of the underlying relation follows this list).
(3) The principle of the time of flight (ToF) depth camera is similar to laser ranging.
The camera transmits the laser pulse to the target and detects the returned signal.
The distance is calculated using the time interval of one transmitted and returned
laser pulse. ToF depth cameras can quickly obtain object information without
ancillary equipment or sensors. It has the advantage of being used for 3D imaging
of low-texture targets at medium and long distances.
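Returning to the binocular case in item (2), a commonly used rectified-stereo relation links depth to the pixel disparity between the two matched image points; the sketch below uses invented rig parameters.

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from the standard rectified-stereo relation: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

# Assumed rig: 1200-pixel focal length, 12 cm baseline, 30-pixel disparity.
print(stereo_depth(1200.0, 0.12, 30.0))  # -> 4.8 m
```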

4) Other sensors for surveying

(1) The robotic total station is an automatic total station that can automatically
search, track, identify, and precisely sight targets and perform angle and distance
measurements to complete the 3D coordinate calculation. The measuring robot
adopts automatic target recognition (ATR) technology to achieve automatic
identification, alignment, and measurement of the target prism. The ATR
components include an infrared laser emitter and a CCD array detector. An
infrared laser beam is emitted and reflected by the target prism. The CCD
detector detects the reflected laser, and the center position of the target prism is
calculated with an image processing algorithm to guide the servo system in
precisely sighting the prism. Driven by motors and using ATR technology, the
measuring robot can automatically identify and measure targets. On this basis,
fully automatic measurement tasks can be accomplished through user-developed
programs tailored to different scenarios and measurement needs, which control
the measuring robot to realize unattended automatic observation.
(2) The laser tracker is a high-precision measuring instrument that can track and
measure a moving target in real time. The laser tracker mainly consists of a
laser interferometer, a photoelectric position detector, a tracking servo system,
and a target mirror (retroreflector), which returns the incident laser to the tracker
as parallel light along its original path. When the target moves, the laser incident
on the target mirror deviates from the mirror center, producing a translational
offset of the reflected beam that is detected by the photoelectric position detector. After the
photoelectric position detector detects the target movement, it sends data to the
control system, which controls the servo system to track the target. Therefore, the
laser is always incident on the center of the target mirror to track the target. The
laser interferometer measurement accuracy can reach a submicrometer level.
The displacement of the target can be calculated in real-time through fusion
processing with the angle measurement provided by the servo system.

(3) Millimeter-wave radar works in a millimeter-wave band. Its wavelength range


is 1–10 mm, and the corresponding frequency range is 30–300 GHz. The wave-
band is between the centimeter waveband and infrared waveband, combining
the advantages of both the centimeter and infrared wavebands. It can work in
smoke, dust, rain, fog, and all weather conditions and has an anti-jamming
ability. Therefore, it has wide applications in obstacle avoidance and detection.
Millimeter-wave radar generally consists of a signal transmitter, a receiver, a
signal processor, and an antenna. The transmitter sends an RF signal, which is
amplified by a power amplifier to a certain power level. The amplified signal is
transmitted by the antenna and coupled to the atmosphere. When the transmitted
signal encounters a target in the propagation, it is reflected and received by the
antenna. The receiver amplifies, denoises, and detects the signal to complete the
detection of the target.
(4) Sonar is the equipment for underwater target detection, localization, and iden-
tification using sound waves. The ultrasonic transducer is used to transmit and
receive sound waves. The transducer can convert the electrical signal into an
ultrasonic signal transmitted in the underwater environment. The ultrasonic
signal will return to the transducer when it encounters obstacles. The transducer
then converts the returned ultrasonic waves into electrical signals, which are
input to the signal processing system to detect underwater targets.

1.4.2 Multi-sensor Synchronization

The platform is in motion during the dynamic surveying task, and its reference
frame is constantly changing. It is necessary to convert measurements referenced to different space and time datums into a unified datum to realize high-precision alignment and fusion of
multi-sensor data. The core idea of multi-sensor synchronization is to transfer a
high-precision time reference to each sensor, leading to a high-precision time tag
embedded in all acquired data.
1. Time synchronization
Typical sensors used in dynamic and precise engineering surveying have no inde-
pendent timing device and no interface for it, leading to their inability to synchronize
data. These sensors can be divided into two categories. One category receives external
control signals that trigger data sampling, such as a variety of line array and plane
array cameras, whose input signal contains the trigger pin. The other includes sensors
that directly output continuous signals (generally analog signals) through the A/D
data acquisition card, such as fiber optic gyroscopes and accelerometers.
According to the characteristics of the sensors, their synchronization can be
divided into three types: active synchronization, passive synchronization, and timing
synchronization.

1) Active synchronization
Active synchronization means that the time synchronization control circuit actively
sends synchronization control signals to the sensors, and the control signal consists of
pulse trigger signals (level trigger, rising edge trigger, falling edge trigger, etc.), and the high-precision time information of each control signal is recorded. The sensors start acquiring data after receiving the synchronization control signal. The collected multi-sensor data are aligned with the synchronous time signals and then sent to the data acquisition computer, thus realizing synchronous data acquisition of multiple sensors. Therefore,
the significant feature of active synchronization is that the sensors can receive the
control signal from the synchronization controller with corresponding hardware and
software interfaces. The sensors with this function mainly include various types of
progressive scan plane array or line array CCD cameras or hyperspectral cameras,
laser rangefinders with external trigger functions, etc.
2) Passive synchronization

Passive synchronization means that the synchronization controller passively receives


the synchronization signals sent back from the sensors, records the high-precision
time information of the signal through internal hardware interruption, and sends the
time information to the data acquisition computer. The computer fuses and aligns the
multi-sensor data with the time information sent by the synchronization controller
through software, thus realizing synchronous data acquisition of multiple sensors.
Therefore, one primary feature of passive synchronization is that the sensors can
output pulse signals at the beginning or end of the data sampling. There are corre-
sponding hardware interfaces for this function. Sensors with this function mainly include CCD cameras with standard video signal output, DMIs, some IMUs with synchronous output functions, etc.
3) Timing synchronization
Timing synchronization means that the time synchronization controller sends only
the time data signal and PPS signal to the sensors rather than the synchronization
control pulse signal. The sensors do not send the synchronization pulse signal to the
synchronization controller but receive the time data signal and PPS signal internally.
The sensors directly align the acquired data with the high-precision time informa-
tion at the sampling moment and send it to the data acquisition computer. In other
words, the data output from the sensors contains high-precision synchronized time
information. These sensors are intelligent, as subsequent data fusion and alignment
are relatively simple but expensive due to their complex design and circuit. They
have GNSS timing functions and output data with UTC timestamps.
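As a simple illustration of timing synchronization (a minimal sketch rather than any specific controller's firmware; the names clock_hz, pps_tick, and pps_utc are assumptions of this example), a UTC timestamp can be attached to a sensor sample by combining the GNSS time message with the latest PPS edge:

```python
def timestamp_sample(sample_tick, pps_tick, pps_utc, clock_hz):
    """Convert a free-running counter value into a UTC timestamp.

    sample_tick: counter value latched when the sensor sampled
    pps_tick:    counter value latched at the last PPS rising edge
    pps_utc:     UTC time (s) of that PPS edge, taken from the GNSS time message
    clock_hz:    nominal frequency of the free-running counter
    """
    return pps_utc + (sample_tick - pps_tick) / clock_hz


# Example: a 10 MHz counter and a PPS edge at UTC 315964800.0 s
t = timestamp_sample(sample_tick=12_345_678, pps_tick=12_000_000,
                     pps_utc=315_964_800.0, clock_hz=10e6)
print(t)  # 315964800.0345678
```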
The multi-sensor synchronous control method is shown in Fig. 1.12.

Fig. 1.12 Multi-sensor synchronous control method

2. Synchronization controllers

To realize the fusion and alignment of multi-source data in a dynamic and precise engineering surveying system, data collection from multiple sensors must be established on the same time axis. This is realized by a synchronization controller, which performs synchronization control and time recording for each sensor's data. The
synchronization controller contains a specific logic control circuit to ensure func-
tional synergy and time synchronization between the sensors, as well as between the
sensors and the positioning system. Synchronization methods can be divided into
distance-triggered synchronization and time-triggered synchronization according to
the driving source that triggers multi-sensor data acquisition.
1) Distance-triggered synchronization
In distance-triggered synchronization, synchronized data collection of multiple
sensors is triggered by the set moving distance interval of the system carrier. In
this case, a high-precision distance sensor is needed to convert the moving distance
into an electrical pulse signal that can be sent to the synchronization control circuit for
counting. Data collection is triggered by certain counts of pulse signals according to
the set distance interval. The trigger signal is also transmitted to the central processor
of the synchronization control circuit. The high-precision time, position, and data
sequence number of the trigger signal or instruction are recorded through the inter-
ruption service program and then uploaded to the data acquisition computer through a
Gigabit network or serial interface. The data acquisition computer aligns the collected
data from sensors with the sampling moments and positions. Vehicle- or railway-
based dynamic and precise engineering surveying systems generally adopt distance
synchronization. The distance pulse signal is output from the optical encoder. The
sensors are driven by the distance pulse signal to realize data sampling at an equal
distance interval.
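For illustration, the pulse-counting rule behind distance triggering can be sketched as follows (a simplified Python example with assumed wheel circumference, encoder resolution, and trigger interval, not the actual synchronization circuit):

```python
def pulses_per_trigger(wheel_circumference_m, pulses_per_rev, interval_m):
    """Number of encoder pulses corresponding to one trigger interval."""
    distance_per_pulse = wheel_circumference_m / pulses_per_rev
    return round(interval_m / distance_per_pulse)


def distance_trigger(pulse_stream, wheel_circumference_m=2.0,
                     pulses_per_rev=2000, interval_m=0.1):
    """Yield a trigger index each time the carrier has moved one set interval."""
    n = pulses_per_trigger(wheel_circumference_m, pulses_per_rev, interval_m)
    count = 0
    for i, _ in enumerate(pulse_stream):
        count += 1
        if count >= n:          # set distance interval reached
            count = 0
            yield i             # fire the synchronized sampling trigger


# Example: 1000 pulses at 1 mm per pulse -> a trigger every 100 pulses (0.1 m)
triggers = list(distance_trigger(range(1000)))
```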

2) Time-triggered synchronization
Time-triggered synchronization uses high-precision time as the driving source to
control the synchronized data acquisition of multiple sensors according to the set
time interval. The high-precision time reference clock obtains the time interval,
generally using the synchronization circuit to interrupt the complex programmable
logic device (CPLD) or field programmable gate array (FPGA) to count the time
reference pulses. When the set time interval is reached, the trigger command is
sent to the sensors or data acquisition devices to start data collection. The trigger
command is also transmitted to the central processor of the synchronization control
circuit (Fig. 1.13).
The trigger command’s synchronization data information, such as time, position,
and serial number, is also recorded in the data acquisition computer by interrupting
the service program. The data acquisition computer realizes the alignment of the
sensor acquisition data with the signals, such as sampling moments and positions.
Time-triggered synchronization is generally used in airborne or shipboard dynamic
surveying systems.
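As a rough numerical illustration (a sketch with assumed values, not the controller firmware), the number of reference-clock pulses that the CPLD/FPGA counts between two trigger commands follows directly from the clock rate and the set time interval:

```python
def time_trigger_ticks(ref_clock_hz, interval_s):
    """Reference-clock pulses to count before issuing the next trigger command."""
    return int(round(ref_clock_hz * interval_s))


# Example: a 1 MHz reference clock and a 20 ms sampling interval
ticks = time_trigger_ticks(ref_clock_hz=1_000_000, interval_s=0.02)
print(ticks)  # 20000 pulses per trigger
```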

Fig. 1.13 Synchronization methods in the dynamic surveying system



1.4.3 Space and Time Association Between Multi-source Surveying Data

1. Calibration of multiple sensors

Generally, measurement output from the surveying system is biased from the truth
value caused by systematic and random errors. Model parameter errors or improper
models cause systematic errors. Eliminating or compensating for systematic error
is critical to improve measurement accuracy, so it is necessary to accurately model
the sensor and determine the parameters of the measurement model, i.e., sensor cali-
bration. Sensor calibration can be divided into field calibration and self-calibration
according to whether a calibration field is needed. Field calibration uses the surveying
system to observe known control features in the calibration field, and an accurate
estimation of the calibration parameters is solved by establishing an error equation.
Self-calibration uses the a priori constraint relationship between features to construct
an error equation and then to estimate the calibration parameters. Field calibration
is characterized by high accuracy, while self-calibration has the advantages of low
cost and convenience.
The geometric data acquired by the dynamic surveying system eventually needs
to be converted into a unified reference frame to form an overall data result. This
conversion is generally divided into two steps. First, according to the installation
parameters of all sensors, the original data are compensated for systematic errors
and converted from the individual sensor reference frame into a unified carrier fixed
frame. Then, the results are converted into a unified mapping frame with the position
and attitude of the carrier. Therefore, there are two levels of calibration. The first
level is the calibration of individual sensors, i.e., the systematic error of all sensors
is calibrated one by one. The second level is the calibration of the whole system, i.e.,
the installation error parameters between multiple sensors are calibrated.
2. Association of spatiotemporal data
The raw data collected by the scene measuring sensors on the carrier are all in the
sensor frame. At a certain moment, the data from multiple sensors can be converted
into a unified body frame. Since the carrier is constantly moving, it is necessary to
further obtain the position and attitude of the carrier in the mapping coordinate system
at each moment so that the sensor collected data can be converted from the body frame
to the unified mapping frame. In this process, it is necessary to correlate multi-source
data with different time frequencies, reference frames, and spatial resolutions to the
carrier attitude and position at the measuring moment provided by the positioning and attitude system.
For instance, for the dynamic 3D laser scanning system, there are three kinds of
data in the laser scanning data fusion: the laser scanner mounting parameters, the laser scanning data, and the position and attitude trajectory data.

$$x^m = R_b^m(t)\left(R_1^b x^1 + l^b\right) + r^m(t) \tag{1.5}$$

where $R_1^b$ and $l^b$ are the rotation matrix and the lever-arm vector between the laser scanner and the positioning and attitude system obtained by calibration, respectively, and $R_b^m(t)$ and $r^m(t)$ are the attitude matrix and the position vector of the carrier at moment t, respectively.
Since the sampling frequency of the scene measuring sensor generally differs
from that of the positioning sensor, the position of the sampled moments needs to be
obtained by interpolation of the trajectory points. Generally, linear interpolation can
be used, but it should be noted that since the attitude angle is a parameter distributed
on the manifold space, it can be more accurately interpolated in the tangent space.
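The following sketch illustrates how Eq. (1.5) and the interpolation step could be combined to georeference a single laser point. It is an illustrative example rather than the system's actual code: the calibration values are placeholders, and SciPy's Rotation and Slerp utilities are used here as one possible way to interpolate attitude on the rotation manifold.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

# Placeholder calibration: scanner-to-body rotation R_1^b and lever arm l^b
R_1b = Rotation.from_euler("zyx", [0.5, -0.2, 0.1], degrees=True)
l_b = np.array([0.15, 0.02, -0.30])          # metres, assumed values

# Two trajectory samples bracketing the laser measurement time t
t0, t1, t = 100.00, 100.01, 100.004
r_m = {t0: np.array([10.0, 20.0, 5.0]), t1: np.array([10.2, 20.1, 5.0])}
att = Rotation.from_euler("zyx", [[30.0, 1.0, 0.5], [30.2, 1.1, 0.5]], degrees=True)

# Interpolate position linearly and attitude on the rotation manifold (slerp)
w = (t - t0) / (t1 - t0)
r_t = (1 - w) * r_m[t0] + w * r_m[t1]
R_bm_t = Slerp([t0, t1], att)(t)

# Eq. (1.5): x^m = R_b^m(t) (R_1^b x^1 + l^b) + r^m(t)
x_1 = np.array([2.0, 0.0, -1.5])             # point in the scanner frame
x_m = R_bm_t.apply(R_1b.apply(x_1) + l_b) + r_t
print(x_m)
```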

1.5 Multi-source Surveying Data Processing

Surveying data are sourced from many scenarios and are featured as multi-source and
multi-scale big data. The core task of surveying data processing is to extract attributes
and parameters about the surveyed target. Data type, format, and errors are obviously
different from traditional data acquisitions. Therefore, apart from the classic least
square estimation, multiple methods, including machine learning and deep learning,
are researched and integrated into the dynamic and precise engineering surveying
data processing framework.

1.5.1 Surveying Data Type

With the rapid development of modern sensing technology, there are various
sensors. Meanwhile, the surveyed objects involve more complex structures, and their surveying requirements are constantly increasing. Therefore, surveying usually involves
various and diverse data that can be categorized into three main types, i.e., positions
and attitudes, point clouds, and visual images, as demonstrated in Fig. 1.14.
According to the surveyed object, positions and attitudes are attributes of the
platform, and point clouds and images contain information on both the object and
environment. The position, attitude, and point clouds reflect geometric properties,
while images reflect electromagnetic properties. The relationship between different
types of data is illustrated in Fig. 1.15.

1. Positions and attitudes

Position- and attitude-related measurements describe the spatial location of the


platform, including the position, attitude, velocity, angular velocity, etc. In this
context, the positions and attitudes illustrating the platform motion are derived by
integrating spatiotemporal correlated measurements of multiple sensors, including
GNSS, gyroscope, accelerometer, DMI, etc. The integration of spatiotemporal corre-
lated measurements is realized by compensating for the measurement errors of each

Fig. 1.14 Three types of surveying data in real scenarios

Fig. 1.15 Different types of data acquired in dynamic and precise engineering surveying

sensor and calculating the final optimal and accurate position and attitude data. The
general position and attitude data after the integration solution are shown in Table 1.1.
2. Point cloud
Point cloud data illustrate the geometry of the target, containing the 3D spatial coor-
dinates of the target. There are two general approaches to acquiring point clouds.

Table 1.1 Data types of positions and attitudes

Data name          Units
Time               s
X coordinate       m or (°)
Y coordinate       m or (°)
Height             m
X velocity         m/s
Y velocity         m/s
Z velocity         m/s
Roll               rad
Pitch              rad
Yaw                rad
X acceleration     m/s²
Y acceleration     m/s²
Z acceleration     m/s²
X angular speed    rad/s
Y angular speed    rad/s
Z angular speed    rad/s

One is the active mode in which the target is surveyed by the ranging instruments
set up on the moving platform. The 3D point cloud of the target can be calculated
by using the positions and attitudes of the platform and the range of the target. The
other approach is based on optical photogrammetry, and the visual 3D point cloud of
the object is obtained by using stereo image pairs. The former approach is of higher
precision and is commonly used in dynamic and precise engineering surveying to
obtain 3D information about the object. The data attributes of the laser point cloud
are listed in Table 1.2.

Table 1.2 Data attributes of the laser point cloud

Data name               Units
Time                    s
X coordinate            m or (°)
Y coordinate            m or (°)
Height                  m
Return echo intensity   —
Times of echoes         —
Echo numbers            —
Scanning angle          rad

Table 1.3 Application scenarios of different types of visual cameras


Camera types Application scenarios
Grayscale camera Road crack detection
Near-infrared camera Night vision observation, water seepage detection
Infrared camera Fire detection, human temperature
Ultra-violet camera High-voltage live detection
Visible light camera Land cover classification, texture feature recognition
Hyperspectral camera Environmental monitoring, precise agriculture

3. Visual images

Visual images are obtained by 2D imaging devices on the moving platform.


According to the spectrum used, they are divided into grayscale, infrared, ultravi-
olet, visible, and hyperspectral images. The imaging sensors are selected according to
different scenarios. Typical application scenarios for various types of visual images
are shown in Table 1.3.

1.5.2 Framework of Surveying Data Processing

Multi-source surveying data processing is a procedure that transforms the original


data collected by the sensors into the required features of the object. This processing
faces the problems of multi-sensor data fusion, data denoising and enhancement,
data alignment and splicing, automatic identification of large-scene data features,
etc. Identifying and extracting target features oriented to engineering applications
needs to integrate intelligent analysis algorithms and adjust them for various dynamic
surveying data. The specific steps are shown in Fig. 1.16. First, the observation data
collected by multiple sensors are filtered and fused. The observation error can be
reduced by using redundant observations. Next, information enhancement removes
or suppresses uninterested targets, such as non-target points in point clouds and tree
shadows in images. Afterward, the data obtained after information enhancement are
matched and stitched, including matching and stitching data from different stations
and models. Finally, features or attributes of the target are identified and extracted,
such as road cracks and track defects.

Fig. 1.16 The basic data processing flow of dynamic and precise engineering surveying

1.5.3 Methods of Surveying Data Processing

The objective of surveying data processing is to achieve an optimal solution of the


parameters in the model by reducing the observation error. Linear parameter esti-
mation is the classic data processing method. In mathematical statistics, an optimal
estimator has the characteristics of unbiasedness, consistency, and validity.
The least-squares criterion takes the minimum weighted sum of squares of the
estimated error vectors as the optimal estimation criterion. The parameters estimated
based on this criterion satisfy the optimal statistical properties and are thus widely
used in measurement data processing.
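As a minimal numerical illustration of this criterion (a generic sketch with made-up numbers, not tied to any particular instrument), the weighted least-squares estimate x̂ = (AᵀPA)⁻¹AᵀPl can be computed directly:

```python
import numpy as np

# Observation equations l = A x + v with weight matrix P (example numbers)
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # design matrix
l = np.array([1.02, 2.01, 2.98])                      # observations
P = np.diag([1.0, 1.0, 0.5])                          # weights

# Weighted least-squares solution minimising v' P v
x_hat = np.linalg.solve(A.T @ P @ A, A.T @ P @ l)
v = A @ x_hat - l                                      # residuals
print(x_hat, v)
```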
The least-squares criterion can only address random errors without considering
gross and systematic errors. The systematic error is limited by strict operating proce-
dures, and the gross error is reduced by geometric constraints and manual elimination
in classic surveying and data processing. To achieve digital surveying in different
scenarios, more efficient methods are needed to solve the systematic and gross errors
and to address insufficient observations and a priori information [13, 14]. Based on
Gauss-Markov models and least-squares estimation methods, a series of new theories
and methods have been developed, generally summarized in Fig. 1.17.
With the rapid development and invention of sensors, equipment, and platforms,
such as laser scanners, high-precision INS, and autonomous platforms, the amount
of surveying data grows rapidly and gradually forms a trend of contemporary big
surveying data. Compared with traditional surveying data, contemporary surveying
data are more dynamic, multi-sourced, and versatile, putting forward new requirements for data processing methods. These requirements can be mainly summarized in the following three points.
(1) In terms of data type, it has extended from point data, which is the main topic
in traditional surveying, to line (structure line), surface (image), and volume
(point cloud) data.
(2) In terms of temporal characteristics, it has evolved from a small number of
static observations and static parameter estimation to a large number of dynamic
observations and dynamic parameter estimation.

Fig. 1.17 Data processing methods for modern surveying data

(3) In terms of surveying elements, it has progressed from traditional geometric


elements such as edges, angles, and heights to more specific geometric elements
and attributes that describe the object’s state (e.g., smoothness, deformation,
cracks, etc.) [15].

1.5.4 Generalized Surveying Data Processing

The theory and methods of traditional algebraic-based error adjustment cannot be


applied to dynamic data processing in complex scenarios. The research hotspot of
dynamic big surveying data processing is to adopt probabilistic statistics, optimiza-
tion theory, and numerous methods and techniques in computer vision and machine
learning in surveying data processing [16] to achieve intelligent analysis of surveying
data and cope with the requirements.
For processing geometric elements, probabilistic models in a Bayesian framework
are commonly used to fuse multi-source surveying data and to obtain geometric
parameters by maximizing the posterior probability of the parameters under the
observed conditions. In contrast to the classic least squares estimation model,
Bayesian theory provides a more precise language for describing probabilities and
uncertainties. The probability density distribution is not limited to a Gaussian distri-
bution; rather, it covers the classic least squares estimation. Therefore, Bayesian
theory is a better and more universal method. Filtering methods such as the extended
Kalman filter and particle filter are usually used to solve such models under dynamic
observations. If dynamic solutions are not required or if the problem itself is static,

optimization methods such as graph optimization and Monte Carlo Markov sampling
can also be used to solve the problem.
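For illustration, the sketch below implements one predict-update cycle of a generic linear Kalman filter (a textbook form with example values, not the specific estimator of any system described in this book):

```python
import numpy as np

def kalman_step(x, P, z, F, Q, H, R):
    """One predict-update cycle of a linear Kalman filter."""
    # Prediction with the motion model F and process noise Q
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the observation z, measurement model H and noise R
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)            # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Example: constant-velocity state [position, velocity], position observed
F = np.array([[1.0, 0.1], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
x, P = np.zeros(2), np.eye(2)
x, P = kalman_step(x, P, z=np.array([0.45]),
                   F=F, Q=0.01 * np.eye(2), H=H, R=np.array([[0.1]]))
```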
For processing attributes, machine learning methods can be used, including super-
vised learning methods such as support vector machines, random forests, K-nearest
neighbor algorithms, and unsupervised learning methods such as K-means clustering.
These classic machine-learning methods generally use manually designed features.
They are used for point clouds, image semantic information understanding, and target
attribute extraction. In the last ten years, deep learning methods that can learn features
automatically have been rapidly developed [17]. Deep learning methods need many
samples to train models (Fig. 1.18), and feature representations are automatically
learned. The learned features consist of many parameters that can describe attributes
well. Therefore, deep learning has been gradually used in processing the attributes
of surveying data.
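A minimal example of the supervised route, with hypothetical hand-crafted point features and labels and scikit-learn used as one possible toolkit, could look like this:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical hand-crafted features for laser points:
# [height above ground, intensity, local planarity]; labels 0 = road, 1 = vegetation
X = np.array([[0.02, 0.80, 0.95],
              [2.10, 0.30, 0.20],
              [0.05, 0.75, 0.90],
              [1.80, 0.25, 0.15]])
y = np.array([0, 1, 0, 1])

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([[0.03, 0.78, 0.92]]))   # expected: road (0)
```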
The current solution to obtain geometric parameters is using statistical infer-
ence methods based on rigorous mathematical models, and the solution to attribute
element extraction is utilizing machine learning methods based on sample data.
However, featuring variable scales, complex structures, and various elements, engi-
neering surveying faces challenges in data interpretation. For instance, the surveyed
objects can be kilometer-scale tunnels, decimeter-scale track fasteners, centimeter-
scale grooves, millimeter-scale cracks, etc. There is hardly a unified mathematical
model that can be applied to these various objects and obtain accurate measurements,
and the inaccuracy of the model will inevitably lead to deviations in the measure-
ments. In addition, the long-tail effect exists in the attribute extraction algorithm
that relies entirely on sample data, and the incompleteness of the sample has a great
impact on the final extraction results.

Fig. 1.18 DeepCrack convolutional neural network and its application to the detection of road
fissures

Fig. 1.19 Intelligent interpretation of multiple indices combining hierarchical models and empir-
ical knowledge

To adaptively fit the variable scale of the surveyed objects and address the long-tail
effect of training sample data, a data-model driven feature interpretation method for
generalized engineering surveying data processing is proposed. This method solves
the problem of inaccurate attribute extraction caused by incomplete sampling by inte-
grating empirical knowledge and establishing a knowledge-based attribute database.
On the other hand, it addresses the problem of inaccurate single-scale modeling
by introducing hierarchical modeling, which is adaptively established according to
the scale of the surveyed object. With the integrated assistance of empirical knowl-
edge and hierarchical modeling, the data-model-driven intelligent interpretation of
multiple indicators in engineering measurement is finally realized.
Driven by the increasing demand for high intelligence and efficiency, dynamic
and precise engineering surveying in both theory and methods will be further devel-
oped with the development of artificial intelligence. On the one hand, more accurate
and fine measurement of the object is needed, calling for innovation in intelligent
interpretation of the indices. On the other hand, the surveying data will be contin-
uously expanded from the single index to the overall safety state of the surveyed
object. This calls for techniques for intelligent identification and evaluation of the
target state based on automatic measurement. For example, the safety state of roads,
bridges, and other infrastructure needs to be evaluated based on the identification
or extraction of multiple related indices that are calculated based on measuring data
using automatic sensors and platforms (Fig. 1.19).

1.6 Application

Engineering structures are becoming large-scale, complex, and diverse with China’s
rapid economic and social developments, which brings about more requirements for
surveying work to ensure the construction and maintenance of engineering struc-
tures. High-precision surveying instruments and professional surveying equipment
are used to quickly and efficiently obtain the geometric shape and change of the
surveyed objects. Dynamic and precise engineering surveying methods are featured

Fig. 1.20 Typical applications of dynamic and precise engineering surveying

as dynamic, efficient, high-precision, and collaborative, providing services in the


whole life cycle of engineering structures. It has broader applications compared to
traditional surveying methods. Its application fields and scope are expanding contin-
uously. It has been widely applied in safety state surveying of transportation infras-
tructure, dynamic surveying of automated driving, indoor and underground space
surveying, UAV 3D surveying, coastal zone surveying, etc (Fig. 1.20).

1. Transport infrastructure safety surveying

Transport infrastructure is essential for the stability and development of the national
economy, as transportation is essential to the mobility of humans and goods. The
transport infrastructure includes roads, railways, airports, subways, bridges, tunnels,
bays, and ancillary facilities. The efficient and stable operation of transport infrastruc-
tures is directly related to social stability, people’s lives, and property safety. Dynamic
and precise engineering surveying assists in detecting and monitoring of the transport
infrastructure during its operation and maintenance stages, providing support for safe
service and enhancing service performance. It has been applied in transport applications, such as the measurement of road surface deflection, rutting, flatness, breakage, and cracks; the detection of track stiffness, damage, abrasion, and fasteners; the measurement of airport pavements; the measurement of deformation, water seepage, cracks, and hollowing of tunnel linings; and the health monitoring of bridges.
Surveying work can be carried out using professional surveying equipment consisting
of a mobile platform and various sensors. According to different applications, sensors
can be GNSS receivers, IMUs, structural light measurement sensors, Doppler laser
velocimeters, laser scanners, high-resolution grayscale cameras, infrared cameras,

accelerometers, DMIs, etc. Among the equipment, dynamic surveying systems for road surface deflection, inspection vehicles for highway tunnels, and track surveying
trolleys are typical representatives.
2. Dynamic surveying for automated driving
Automated driving is facilitated by developing intelligent vehicles, telematics, and
sensor technology. In automated driving, various sensors, such as IMU, LiDAR, and
HD cameras, are used to efficiently sense the surrounding environment. Machine
learning or deep learning is used to quickly identify surrounding objects and obsta-
cles, such as pedestrians, vehicles, road bumps, and sign markers. Vehicle path plan-
ning and control can be conducted based on the obtained data for obstacle avoidance
and stable driving. Currently, logistics robots can sense the environment around
ports and terminals based on various sensors on mobile platforms. Accurate posi-
tioning and high-resolution mapping are achieved using high-resolution cameras or
low-cost radars. The robots achieve cargo selection, delivery, and stacking at desig-
nated locations based on the maps and navigation. Automated excavation, intelli-
gent driving, and autonomous operation have been realized in intelligent mines by
installing sensing, positioning, and control systems on various vehicles. Dynamic
monitoring of high-rise buildings, large landslides, urban subsidence, and subway
superstructure subsidence has been achieved using autonomous intelligent systems
that integrate LiDAR, high-resolution cameras, ground-penetrating radar, and other
sensors, guaranteeing public safety.
3. Indoor and underground space surveying
Urban complexes, pipe networks, and underground construction are necessary infras-
tructures for urban development and operation. Collaborative surveying and 3D
modeling of large indoor public buildings are realized using multiple indoor 3D
measurement robots with integrated UWB, LiDAR, and cameras, providing primary
data for smart city location information services. Rapid localization and efficient detection of defects in urban drainage networks has been achieved through big data analysis of images of pipe interiors and motion trajectory data collected by capsule-type inspection devices traveling through the pipes. This technology provides a basis for pipe network repair and improves the efficiency of urban operation and management.
Measuring the internal deformation of rockfill dams has been a complex problem
for a long time. Line-shaped flexible pipes are pre-buried in rockfill dams to solve
this problem. A robot with high-precision IMU/DMI is employed to measure the
3D curve of pre-buried pipes, based on which the internal horizontal and vertical
displacements of the rockfill dams can be calculated. Through this processing, precise
internal deformation measurements of large rockfill dams can be realized.
4. 3D surveying with UAV
With the acceleration of urbanization, numerous high-rise buildings and municipal
facilities have been rapidly built. Dynamic and precise engineering surveying is
widely used in all stages of urban construction projects. Rapidly acquiring and

updating high-precision 3D urban models are realized by using UAVs and airships
as platforms, integrating sensors such as GNSS receivers and IMUs, high-resolution
hyperspectral cameras, tilt photography cameras, and LiDAR. Fine modeling of
urban structures, such as buildings, bridges, and venues, can be achieved by designing
the best flight route for UAVs, thus realizing rapid change monitoring of these
structures.
5. Coastal surveying
Coastal zones, including nearshore islands, are affected by sea-land interactions and
human activities and have exceptional environmental characteristics and dynamic
changes. An integrated water and shore 3D surveying system was constructed inte-
grating a GNSS receiver, IMU, LiDAR, multibeam bathymetry, and underwater laser
surveying equipment. This system can conduct 3D observation of coastal zones,
slope monitoring of dams and reservoirs, 3D surveying of channels and tunnels, and
damage and crack detection of underwater engineering. Airborne laser bathymetry
(ALB) is a new type of marine exploration equipment that consists of a laser emis-
sion unit, full waveform echo detection and recording unit, optical scanning device,
GNSS/IMU combined positioning, and related auxiliary devices. It can be effectively
used to obtain 3D underwater terrain and detect underwater targets in nearshore
shallow water areas. The ALB is efficient, operational, and flexible in acquiring
topographic and geomorphological information with high accuracy and coverage.
Therefore, it has been widely used in coastal zones, beaches, islands, reefs, and
shallow coastal waters.

1.7 Summary

Dynamic and precise engineering surveying represents the interdisciplinary frontier


of engineering, sensors, and computer sciences. There are difficulties in establishing
and maintaining space and time datums, integrating and controlling multiple sensors
and instruments, fusing and analyzing multi-source data, etc. Surveying work faces
the challenges of high-precision requirements, complex surveyed objects, profes-
sional equipment development, mastery of multidisciplinary knowledge, and coor-
dination of multiple platforms. By overcoming these difficulties and challenges,
dynamic and precise engineering surveying has been widely applied in safety state
surveying of transportation infrastructure, dynamic surveying of automated driving,
indoor and underground space surveying, UAV 3D surveying, and coastal zone
surveying.

References

1. Song C, Chen H, Wen Z (2019) Technology innovation and development of engineering
measurement in China. Beijing: China Construction Industry Press.
2. He H (2010) Innovation and development of high-speed railroad in China. China railway
(12):5–8.
3. Li G, Fan B (2017) Precision engineering measurement technology and its development. Acta
Geod et Cartogr Sin 46(10):1742–1751.
4. Li Q, Mao Q (2017) Advances in dynamic precision measurement of roads/track. Acta Geod
et Cartogr Sin 46(10):1734–1741.
5. Zhang D, Li Q (2015) A review on the development of rapid inspection technology for highway
pavement. Journal of Geomatics 40(1):1–8.
6. Li Q, Chen L, Li M, et al (2013) A sensor-fusion drivable-region and lane-detection system
for autonomous vehicle navigation in challenging road scenarios. IEEE Trans Veh Technol
63(2):540–555.
7. Li Q, Zhang D, Wang C, et al (2021) Dynamic precision engineering measurement techniques
and applications. Acta Geod et Cartogr Sin 50(9):1147–1158.
8. Kong X, Guo J, Liu Z (2010) Fundamentals of geodesy. Wuhan: Wuhan University Press.
9. State Bureau of Surveying and Mapping (1994) Specifications for precise engineering survey
(GB/T15314-94). Beijing: Standards Press of China.
10. Pomerleau F, Colas F, Siegwart R (2015) A review of point cloud registration algorithms for
mobile robotics. Found Trends Robot 4(1):1–104.
11. Groves P D (2015) Principles of GNSS, inertial, and multisensor integrated navigation systems.
IEEE Aerosp Electron Syst Mag 39.
12. Shin E-H (2005) Estimation techniques for low-cost inertial navigation. UCGE report.
Dissertation, Calgary: Calgary University.
13. Kalman R E (1960) A new approach to linear filtering and prediction problems. Trans ASME
- J Basic Eng 82:35–45.
14. Li D (1984) Coarse difference localization using option iteration method. Geomatics and
Information Science of Wuhan University 9(1):51–73.
15. Li Q, Zou Q, Zhang D, et al (2011) Fosa: F* seed-growing approach for crack-line detection
from pavement images. Image Vision Comput 29(12):861–872.
16. Wang C, Shu Q, Wang X et al (2019) A random forest classifier based on pixel comparison
features for urban LiDAR data. ISPRS J Photogramm Remote Sens 148:75–86.
17. Zou Q, Zhang Z, Li Q, et al (2018) Deepcrack: Learning hierarchical convolutional features
for crack detection. IEEE Trans Image Process 28(3):1498–1512.
Chapter 2
Structural State Surveying
for Transportation Infrastructure

2.1 Overview

Transportation infrastructure is the cornerstone of economic development for every


country in the world, and these infrastructures need to be appropriately designed
and constructed according to the country’s development level. China is a developing
country with a large population. In recent decades, nearly 5,200,000 km of roads and
146,000 km of railways have been built in China. It needs to be noted that China has
built 38,000 km of high-speed railways, ranking as the longest in the world. In the
United States, 225,000 km of railways rank first in the world, but high-speed railways have not been developed, as the country relies instead on a well-developed aviation industry.
To date, transportation infrastructure in every country has reached a certain scale,
especially in developed regions. The safe operation of infrastructure has become the
focus of transportation management departments. Hazards or dangers can occur due
to material aging, improper construction, and climate degradation of the infrastruc-
ture. For instance, asphalt pavement may suffer distresses such as cracks, rutting, potholes, subsidence, raveling, and bumps, leading to structural strength changes in
the pavement. Highway and railway bridges may suffer cracks and large dynamic
deflection. Structural deformation, lining cracks, water leakage, and freezing may
occur on the tunnel cross-section and wall surface. Rail track fatigue, such as cross-
section changes, rail abrasion, tread cracks, and loose fasteners, needs to be moni-
tored for railway safety. All these fatigues, defects, and even damages threaten the
safe operation of the infrastructure. Timely detection and repair of these problems is
significant to ensure the safety of various facilities in service. In research institutions
around the world, scientists and engineers have conducted in-depth research on the
state surveying of infrastructure, including roads, bridges, tunnels, and railways. For
example, Fugro (Canada), ARRB (The Australian Road Research Board), WayLink
Systems Corporation (USA), Wuhan Optics Valley ZOYON Science and Technology
Co., Ltd. (China), Wuhan HiRai Transportation Technology Co., Ltd, and Shenzhen
University have developed their own professional surveying equipment, which are
widely used in the road transportation and rail transportation industries.


2.2 Road Transportation Infrastructure Surveying

2.2.1 Pavement Deflection Surveying

Pavement deflection is an important factor to be considered when determining road


maintenance strategy. It is the vertical deflection or rebound value at the wheel loading
position under fully deformed conditions, which is usually measured after removing
the load, as shown in Fig. 2.1. Deflection is a critical mechanical parameter for
characterizing the overall strength of pavement, the overall strength and stiffness of
structural layers, and the operational performance of pavement. It is also an important
indicator for the design, acceptance, and maintenance of the road.
The Benkelman beam deflection (BBD) method has been adopted for measuring
pavement deflection since the 1960s. Since then, non-destructive testing (NDT)
instruments have been widely used to measure the structural strength or loading
capacity of existing and new roads. Meanwhile, a variety of similar instruments have
been developed and used in measuring pavement deflection. All of these instruments
are based on the same principle, which is measuring the pavement deflection after
applying a certain load to the pavement. The BBD method is based on the principle of leverage: after a certain load is applied to the pavement and then removed, the static deflection is calculated by measuring the rebound of the fully deformed pavement. According to this principle, only the maximum static deflection at a single point can be obtained with the BBD method, and the measurement is easily affected by the monitoring scenario. It has been gradually replaced
by falling weight deflectometers (FWD) in many countries. The pavement deflection
is measured in the FWD using the instantaneous load generated by a free-falling
heavy hammer. FWD measures the deflection value at multiple points located on the
deflection bowl, and the measurement is repeated several times at each sampling point. It
can measure the maximum deflection at the individual points and the entire deflection
bowl. The work requires the equipment to stay on the measurement point for 2–4 min,
with an interval of 20–50 m and a measurement speed of 3–5 km/h. The measurement
is the dynamic response of the pavement. Obviously, the above-mentioned traditional
equipment can only obtain measurements at discrete points with a large sampling
interval. The workload is heavy, and traffic control is required during the measure-
ment. The results, therefore, cannot reflect the dynamic characteristics of the actual

Fig. 2.1 Definition of pavement deflection



Fig. 2.2 Simplified model of pavement forces

traffic load on the road and cannot meet the requirement of the general investigation
of roads.
In the 1990s, the United States, Sweden, the United Kingdom, Denmark, and other
countries developed prototype systems for monitoring dynamic pavement deflection.
There are two technical routes. The first is a direct load-to-displacement method, which measures the displacement of the pavement under loading and then calculates the deflection of the pavement, including the ARA rolling wheel deflectometer (RWD), the Quest/Dynatest airfield rolling weight deflectometer (ARWD), the road deflection tester (RDT), etc. The second is an indirect method of
load-velocity-displacement, and the pavement deflection is inverted by calculating
the speed of the deformed pavement under dynamic loading, including the traffic
speed deflectometer (TSD) from Greenwood Engineering A/S in Denmark and the
laser dynamic deflectometer (LSD) from Wuhan Optics Valley ZOYON in China.
The following text mainly focuses on the second method.
1. Measurement model

Deflection is calculated by measuring the amount of deformation of the pavement after it is fully deformed under the load. Linear structures, such as roads and railways,
can be abstracted as infinite-length beam structures on elastic foundations in civil
engineering. The deflection equation of different beam structures can be derived.
Then, the measurement can be modeled as the deflection of the beam under the
action of force. Let us assume that the pavement meets the elastic beam condition
[1].
The mechanical force diagram is illustrated in Fig. 2.2. When the elastic foundation beam is subjected to a load, the foundation reaction force at the bottom of the beam is p(x). The vertical displacement of the beam and foundation is y(x), and a segment dx is taken in the distributed load section, as shown in Fig. 2.3.
Considering the balance of the segment unit:

$$V - (V - dV) + p(x)\,dx - q(x)\,dx = 0, \qquad \frac{dV}{dx} = p(x) - q(x) \tag{2.1}$$

Based on V = dM/dx, the above equation can be written as



Fig. 2.3 Deflection curve model of the elastic foundation beam

$$\frac{d^2 M}{dx^2} = p(x) - q(x) \tag{2.2}$$

$$EI\frac{d^2 y}{dx^2} = -M \tag{2.3}$$
Taking the second derivative of Eq. (2.3) and substituting Eq. (2.2) into it, we have

$$EI\frac{d^4 y}{dx^4} = -p(x) + q(x) \tag{2.4}$$
The deflection curve differential equation [2] of the deformed pavement curve
can be derived as:

$$EI\frac{d^4 w(x)}{dx^4} + kw(x) = -F\delta(x) \tag{2.5}$$
where F is the load, k is the coefficient of the subgrade bed (reaction modulus), E is
the stiffness modulus of the pavement, x is the horizontal position of the deflection
bowl, I is the moment of inertia of the pavement, and w is the deflection.
The general equation of the deflection bowl curve of the pavement can be derived from the deflection differential equation. Equation (2.5) is a fourth-order linear differential equation in w(x) with constant coefficients. To solve it, consider the unloaded part first, as shown in Eq. (2.6).

$$EI\frac{d^4 w(x)}{dx^4} = -kw(x) \tag{2.6}$$

Let $\beta = \left(\frac{k}{4EI}\right)^{1/4}$. With the Winkler formula, the general solution can be expressed as Eq. (2.7).

$$w(x) = e^{\beta x}[C_1\cos(\beta x) + C_2\sin(\beta x)] + e^{-\beta x}[C_3\cos(\beta x) + C_4\sin(\beta x)] \tag{2.7}$$

The above equation expresses the general solution of an infinite beam on an elastic foundation subjected to a concentrated lateral load, where C1, C2, C3, and C4 are integration constants determined by the load and boundary conditions. For an infinite beam on the elastic foundation, the deflection approaches zero [3] as x approaches infinity. Therefore, in Eq. (2.7), C1 and C2 are equal to 0. Then, we have

$$w(x) = e^{-\beta x}[C_3\cos(\beta x) + C_4\sin(\beta x)] \tag{2.8}$$

Considering an infinite beam on an elastic foundation, if a load F is applied on a


certain point, the deflection slope is 0 at the load point due to the symmetry of the
deflection; then, Eq. (2.9) is obtained.
$$\begin{cases} \dfrac{dw}{dx} = 0, & x = 0 \\[2mm] \displaystyle\int_0^{\infty} kw(x)\,dx = \dfrac{F}{2}, & x \neq 0 \end{cases} \tag{2.9}$$

where $C_3 = C_4 = \dfrac{F\beta}{2k}$.
The conventional method for solving deflection curves is

$$w(x) = \frac{F\beta}{2k}e^{-\beta x}[\cos(\beta x) + \sin(\beta x)] \tag{2.10}$$

where $\beta = \left(\dfrac{k}{4EI}\right)^{1/4}$. Let $A = \dfrac{F}{\sqrt{4EIk}}$ and $B = \left(\dfrac{k}{4EI}\right)^{1/4}$.
The solution w(x) of the deflection bowl curve can thus be obtained. Its first derivative, i.e., the slope of the curve at a given point, and the maximum deflection are denoted d'(x) and d(0), respectively. Since the slope is the ratio of the pavement deflection velocity Vdi to the driving speed Vds, the values of A and B can be obtained from the measurements of two velocimeters. From the values of A and B, the maximum deflection can be calculated. The deflection values at different positions can then be obtained with multiple velocimeters, and the deflection bowl curve can be drawn using the deflections measured at multiple points.

$$\begin{cases} w(x) = -\dfrac{A}{2B}e^{-Bx}[\cos(Bx) + \sin(Bx)] \\[1mm] d'(x) = A\sin(Bx)e^{-Bx} \\[1mm] d(0) = -\dfrac{A}{2B} \end{cases} \tag{2.11}$$

The complete deflection-related equations are shown in Table 2.1.


In this table, x > 0, A > 0, B > 0, and the model is the deformation of the entire
deflection bowl. From this table, parameters A and B can be solved by establishing
the derivative equations of the deflection bowl curve, and then the deflection bowl
curve can be solved.

Table 2.1 Relevant equations for dynamic deflection measurement

Name                        Equation
Deflection bowl curve       $w(x) = -\frac{A}{2B}e^{-Bx}[\cos(Bx) + \sin(Bx)]$
Deflection slope            $d'(x) = A\sin(Bx)e^{-Bx}$
Maximum deflection          $d(0) = -\frac{A}{2B}$
Curvature                   $\ddot{d}(x) = AB[\cos(Bx) - \sin(Bx)]e^{-Bx}$
Flexibility                 $k = 4B^4 EI = \frac{FB^2}{A}$
Stiffness                   $E = \frac{F^2}{4A^2 kI} = \frac{F}{4IAB^2}$
Maximum deflection slope    $d'\!\left(\frac{\pi}{4B}\right) = \frac{A}{\sqrt{2}}e^{-\pi/4}$

$$\begin{cases} d'(x_1) = \dfrac{V_{di}(x_1)}{V_{ds}(x_1)} = A\sin(Bx_1)e^{-Bx_1} \\[2mm] d'(x_2) = \dfrac{V_{di}(x_2)}{V_{ds}(x_2)} = A\sin(Bx_2)e^{-Bx_2} \\[2mm] \qquad\vdots \\[2mm] d'(x_n) = \dfrac{V_{di}(x_n)}{V_{ds}(x_n)} = A\sin(Bx_n)e^{-Bx_n} \end{cases} \tag{2.12}$$
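To show how Eqs. (2.11) and (2.12) and Table 2.1 could be used numerically, the following sketch fits A and B to slope observations d'(x_i) = V_di(x_i)/V_ds(x_i) with a nonlinear least-squares routine and then evaluates the maximum deflection d(0) = −A/(2B). The measured values are synthetic, and SciPy's curve_fit is only one possible solver.

```python
import numpy as np
from scipy.optimize import curve_fit

def slope_model(x, A, B):
    """Deflection slope d'(x) = A sin(Bx) exp(-Bx) from Table 2.1."""
    return A * np.sin(B * x) * np.exp(-B * x)

# Synthetic example: sensor offsets x_i (m) and measured slopes V_di/V_ds
x_i = np.array([0.1, 0.2, 0.3, 0.6])
slopes = slope_model(x_i, A=2.0e-3, B=2.5) + np.array([1e-6, -1e-6, 2e-6, -2e-6])

(A_hat, B_hat), _ = curve_fit(slope_model, x_i, slopes, p0=(1e-3, 1.0))
d0 = -A_hat / (2.0 * B_hat)        # maximum deflection at the load point
print(A_hat, B_hat, d0)
```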

The pavement will deform under the action of positive pressure, and the tangent
line at any point of the deflection curve is shown in Fig. 2.4.
The physical expression of the pavement deflection slope can be expressed as

$$k = \frac{\Delta y}{\Delta x} \tag{2.13}$$
where k is the slope, Δy is the change in pavement deflection, and Δx is the displacement in the horizontal direction. From Eq. (2.13), Eq. (2.14) can be deduced.

$$k = \frac{\Delta y / \Delta t}{\Delta x / \Delta t} \tag{2.14}$$

Fig. 2.4 Definition of the deflection slope

The horizontal speed Vds and the pavement deflection velocity Vdeflection can be
expressed as Eq. (2.15).
$$V_{\text{deflection}} = \frac{\Delta y}{\Delta t}, \qquad V_{ds} = \frac{\Delta x}{\Delta t} \tag{2.15}$$

Let the inclination angle be θ ; then, Eq. (2.16) can be written as follows.

$$\tan\theta = k = \frac{\Delta y}{\Delta x} = \frac{V_{\text{deflection}}}{V_{ds}} \tag{2.16}$$

From Eq. (2.16), we have Eq. (2.17)

$$\text{Slope} = \frac{\Delta y}{\Delta x} = \frac{V_{\text{deflection}}}{V_{ds}} \tag{2.17}$$

According to Eq. (2.17), the deflection slope can be calculated by measuring the speed in the driving direction and the deflection velocity of the loaded pavement. In the slope equation, the driving speed Vds can be measured with high accuracy. If the deflection velocity of the pavement Vdeflection can also be measured, equations can be established to solve the deflection slope, and the deflection can then be calculated with the deflection bowl equation.
2. Deflection velocity measurement

1) Surveying method
The deflection velocity of pavement under force can be measured using a Doppler
velocimeter, which emits a laser line projected onto the measured ground and
measures the combined speed along the excitation line. The pavement will be
deformed by the load wheel on the moving platform. The induced deflection velocity
can be measured by the sensor. The velocity measured by the Doppler sensor is along
the direction of the excitation line, which includes the deflection velocity of the pave-
ment, sensor motion, etc. Essentially, it is a summation of the velocity vector between
the pavement and the Doppler sensor, as shown in Fig. 2.5 [4].
The Doppler velocimeter measures the velocity of pavement deflection when
perpendicular to an ideal horizontal plane of pavement. However, the reality is that
there is always a certain angle between the velocimeter and the pavement, as shown
in Fig. 2.5b. Therefore, extra velocity components will be introduced. Suppose two
sensors are mounted on a rigid beam, and the position of the sensor along and
perpendicular to the beam directions can be precisely obtained. In that case, it can be
maintained that there is no relative displacement between the two sensors. If there is
no deflection velocity at one point and a deflection velocity at another point, we can
mitigate the induced velocity components. Therefore, it is possible to determine the
pavement deflection velocity directly using the differences in the velocity between
two velocimeters, as shown in Fig. 2.6.

Fig. 2.5 Schematic diagram of the deflection velocity measurement

Fig. 2.6 Schematic diagram of the deflection velocity measurement with the measuring beam

The surveying model for pavement deflection velocity is sketched in Fig. 2.7.
Multiple Doppler velocimetry sensors are installed on a rigid beam. The deflection
velocity at the measuring point inside the deflection bowl can be calculated by the
difference between the deflection velocities at a point inside and a point outside the
basin.
In Fig. 2.7, M denotes the sensor. O is the center of the deflection bowl. Di is the
measuring point of the sensor on the pavement. B is the vertical projection of the
sensor’s center on the pavement. h is the height of the beam above the pavement.
The measuring points of three sensors are located inside the deflection bowl, and one is outside the bowl. The installation of the four sensors can be adjusted as needed. The incident angle of the Doppler velocimeter should be within the angle between MO and MB; thus, the laser line points between MO and MB inside the deflection bowl. A larger incident angle would cause the laser line to be blocked by the vehicle's wheel.
$$\theta_{\max} = \arctan\frac{h_m}{h} \tag{2.18}$$

Fig. 2.7 Continuous dynamic deflection measurement method

Each sensor maintains a certain angle with the beam. It measures the velocity
difference between the sensor and the pavement. The velocity is the projection of
beam vibration, beam rotation Vw , beam horizontal movement Vh , and pavement
deflection velocity Vr , in the direction of the laser line. The beam rotation velocity is
converted from the beam rotation angular velocity, which is measured by a gyroscope.
To accurately obtain the deflection velocity of the pavement and eliminate the noise
generated by the sensor movement, it is required that the installation angle αi of each
sensor with the beam be consistent. Assuming that the beam is an ideal rigid body,
there is no relative displacement between the sensors. The beam has up-and-down vibration as well as pitch, roll, and heading velocities. The up-and-down vibration of the
beam can be calculated and eliminated in the case of an ideal rigid beam. Experiments
show that the influence of the beam roll and heading velocity on the deflection is
negligible. Therefore, the velocity measured by the Doppler sensor is the sum of
the horizontal velocity of the beam, the linear velocity of the beam rotation, and
the pavement deflection velocity in the direction of the laser beam.
2) Calculation method

As mentioned above, the velocity measured by the Doppler sensor is the sum of
the horizontal velocity of the beam, the linear velocity of the beam rotation, and
the pavement deflection velocity in the direction of the laser beam. It is worth noting
that the reference sensor is outside the deflection bowl, and the pavement deflection
velocity measured by it should be zero. Therefore, two feasible methods can be
used to measure and calculate the deflection velocity in the deflection bowl. The
first is the dynamic attitude calculation, which decomposes the measured velocity
of the Doppler sensor and then calculates the component related to the attitude. The
calculation accuracy depends on the measurement accuracy of the beam attitude. The
second method is through static calibration. The main factors affecting the calculation

are the horizontal speed, beam rotation velocity, and other noise. If these factors are
calibrated, deflection velocities corresponding to different sensors can be calculated.
(1) Dynamic attitude calculation mode

When modeling the motion process, the static parameters of the Doppler sensor can
be obtained by calibration experiments, including the angle between the sensor and
the beam and the angle difference between the sensors. During dynamic surveying,
the beam vibrates, forming a certain angle with the ground, and the movement of
the beam will generate an angular velocity and a linear velocity, which are used in
the following calculation. All the related parameters used in the calculation of the
pavement deflection velocity are listed in Table 2.2.
The velocity measured by any sensor is the sum of the horizontal velocity of the
beam, the linear velocity of the beam rotation, and the pavement deflection velocity
in the direction of the laser beam. The reference sensor is outside the deflection bowl,
and the pavement deflection velocity measured by it should be zero.

Vdr = Vh sin(αr + θ ) + Vv cos(αr + θ ) + Vwr cos αr (2.19)

Vdi = Vh sin(αi + θ ) + Vv cos(αi + θ ) + Vwi cos αi + Vri cos(αi + θ ) (2.20)

Vwr = lr Gx π/180
Vwi = li Gx π/180        (2.21)

Table 2.2 Parameters related to the calculation of the pavement deflection velocity
Parameters | Description | Unit
Gx | Rotational angular velocity of the beam measured by the gyroscope | °/s
li | Distance of sensor i to the center of the spinning axis, i = 1, 2, …, n | mm
lr | Distance of the reference sensor (at position 3600 mm) to the center of the spinning axis | mm
n | Number of sensors | –
Vdi | Velocity measured by sensor i, i = 1, 2, …, n | mm/s
Vdr | Velocity measured by the reference sensor at position 3600 mm | mm/s
Vh | Horizontal velocity | mm/s
Vri | Sinking velocity of the pavement at sensor i, i = 1, 2, …, n | mm/s
Vv | Vertical velocity | mm/s
Vwi | Rotational linear velocity of sensor i, i = 1, 2, …, n | mm/s
Vwr | Rotational linear velocity of the sensor at position 3600 mm | mm/s
αi | Installation angle of sensor i, i = 1, 2, …, n | (°)
αr | Installation angle of the sensor at position 3600 mm | (°)
θ | Angle between the beam and the ground | (°)

A large number of experiments show that the rotation angle of the beam during
surveying is small. All αi, αr, αi + θ, αr + θ are less than 5°, and the corresponding
cosine values are approximately 1, so Eqs. (2.19) and (2.20) can be simplified as Eqs.
(2.22) and (2.23), respectively.

Vdr = Vh sin(αr + θ ) + Vv + Vwr (2.22)

Vdi = Vh sin(αi + θ ) + Vv + Vwi + Vri (2.23)

The real-time angle between the beam and the ground can be calculated by
substituting Eq. (2.21) into Eq. (2.22).

θ = arcsin[(Vdr − Vv − Vwr)/Vh] − αr        (2.24)

The pavement deflection velocity can be obtained by substituting Eq. (2.24) into
Eq. (2.23) to derive Eq. (2.25).

Vri = Vdi − Vv − Vh sin[αi − αr + arcsin((Vdr − Vv − Vwr)/Vh)] − Vwi        (2.25)

It can be seen from Eq. (2.25) that this method relies on knowing the up-and-down
vibration velocity Vv of the beam. However, the angle of the beam relative to the pavement
changes all the time, which prevents a direct high-precision measurement of this vibration.
Therefore, the influence of ignoring this vibration should be evaluated.
One part of Eq. (2.25) can be transformed into the following equation according to the trigonometric identity for the sine of a sum.

sin[(αi − αr) + arcsin((Vdr − Vv − Vwr)/Vh)] = sin(αi − αr) cos[arcsin((Vdr − Vv − Vwr)/Vh)] + cos(αi − αr) (Vdr − Vv − Vwr)/Vh        (2.26)

Substituting Eq. (2.26) into Eq. (2.25), we can obtain Eq. (2.27).

Vri = Vdi − Vv[1 − cos(αi − αr)] − (Vdr − Vwr) cos(αi − αr) − Vwi − Vh sin(αi − αr) cos[arcsin((Vdr − Vv − Vwr)/Vh)]        (2.27)

that is,

Vri = Vdi − (Vdr − Vwr) cos(αi − αr) − Vwi − Vh sin(αi − αr) cos[arcsin((Vdr − Vwr)/Vh)] + ε1        (2.28)

Let ε1 be the calculation error.

ε1 = −Vv[1 − cos(αi − αr)] + Vh sin(αi − αr){cos[arcsin((Vdr − Vwr)/Vh)] − cos[arcsin((Vdr − Vv − Vwr)/Vh)]}        (2.29)

According to the sensor installation and calibration requirements, αi − αr ≤ 0.15°.


Experiments show that Vdr is approximately 700 mm/s when the horizontal speed
Vh = 72 km/h. Generally, Vwr ranges from −100 to 100 mm/s, and Vv ranges from −450 to 450 mm/s. In
this case, the maximum absolute value of ε1 is 0.06 mm/s, which is negligible in the
calculation of the pavement deflection velocity. Therefore, Eq. (2.28) is simplified
as Eq. (2.30). In the same way, the maximum absolute value of ε2 is calculated to
be 0.04 mm/s, so the influence of the rotation center of the beam on the calculation
of the deflection velocity can also be ignored (li − lr depends only on the relative
positions of the sensors and is not related to the rotation center of the beam).

Vri = Vdi − (Vdr − Vwr) cos(αi − αr) − Vwi − Vh sin(αi − αr) cos[arcsin((Vdr − Vwr)/Vh)]        (2.30)

The following equations are derived by substituting Eq. (2.21) into Eq. (2.30).

Vri = Vdi − Vdr cos(αi − αr) − (li − lr) Gx π/180 − Vh sin(αi − αr) + ε2        (2.31)

ε2 = Vwr[cos(αi − αr) − 1] + Vh sin(αi − αr){1 − cos[arcsin((Vdr − Vwr)/Vh)]}        (2.32)
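As a minimal numerical illustration of the simplified dynamic attitude calculation, the following Python sketch evaluates Eq. (2.31) with the small error term ε2 neglected. The variable names mirror Table 2.2, but all sensor readings, angles, and distances below are hypothetical values chosen only to show the computation, not measured data.

import math

def deflection_velocity(V_di, V_dr, alpha_i, alpha_r, l_i, l_r, G_x, V_h):
    # Pavement deflection velocity at sensor i, Eq. (2.31), ignoring the error term.
    # V_di, V_dr: velocities measured by sensor i and the reference sensor (mm/s)
    # alpha_i, alpha_r: installation angles of sensor i and the reference sensor (rad)
    # l_i, l_r: distances of the sensors to the spinning axis (mm)
    # G_x: beam rotational angular velocity from the gyroscope (deg/s)
    # V_h: horizontal (vehicle) speed (mm/s)
    d_alpha = alpha_i - alpha_r
    rotation_term = (l_i - l_r) * G_x * math.pi / 180.0   # from Eq. (2.21)
    return V_di - V_dr * math.cos(d_alpha) - rotation_term - V_h * math.sin(d_alpha)

# Hypothetical readings: one reference sensor at 3600 mm and three measuring sensors.
V_h = 20000.0                                   # 72 km/h expressed in mm/s
G_x = 0.5                                       # deg/s
V_dr, alpha_r, l_r = 700.0, math.radians(0.00), 3600.0
sensors = [(712.0, math.radians(0.02), 100.0),  # (V_di, alpha_i, l_i)
           (705.0, math.radians(0.05), 300.0),
           (701.0, math.radians(0.08), 750.0)]
for V_di, alpha_i, l_i in sensors:
    print(deflection_velocity(V_di, V_dr, alpha_i, alpha_r, l_i, l_r, G_x, V_h))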

(2) Calculation based on parameter calibration

The pavement deflection velocities at the measuring points are Vr1 , Vr2 , and Vr3 , the
speeds of the sensors are Vd1 , Vd2 , Vd3 , and Vdr . The horizontal speed of the measured
beam is Vh , and the rotational angular speed measured by the gyroscope is G x . All
the parameters are illustrated in Fig. 2.8.
The measuring beam is moving during the surveying, and its movement can be
decomposed into rotational and translational movements. Then, the measurement
results are related as Eq. (2.33).

Vr1 = Vd1 − Vdr + k11 Gx + k12 Vh + b1
Vr2 = Vd2 − Vdr + k21 Gx + k22 Vh + b2        (2.33)
Vr3 = Vd3 − Vdr + k31 Gx + k32 Vh + b3

Fig. 2.8 Deflection velocity calculation based on calibration parameters

where the parameters k11 , k21 , and k31 are obtained through gyroscope calibra-
tion using the compensation experiment. k12 , k22 , and k32 are obtained through
velocimeter calibration using the angle difference experiment. b1 , b2 , and b3 are
obtained through systematic calibration.
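A minimal sketch of the calibration-based mode: once the coefficients kij and bi of Eq. (2.33) have been determined by the compensation, angle-difference, and systematic calibration experiments, each deflection velocity is a simple linear combination of the raw measurements. The coefficient values below are placeholders rather than real calibration results.

import numpy as np

# Placeholder calibration coefficients [k_i1, k_i2, b_i] for the three measuring sensors.
K = np.array([[0.8, -0.002, 0.1],
              [2.4, -0.004, 0.2],
              [6.0, -0.006, 0.3]])

def deflection_velocities(V_d, V_dr, G_x, V_h):
    # Eq. (2.33): V_ri = V_di - V_dr + k_i1 * G_x + k_i2 * V_h + b_i
    V_d = np.asarray(V_d, dtype=float)          # velocities measured by sensors 1-3 (mm/s)
    return V_d - V_dr + K[:, 0] * G_x + K[:, 1] * V_h + K[:, 2]

print(deflection_velocities([712.0, 705.0, 701.0], V_dr=700.0, G_x=0.5, V_h=20000.0))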

3. Deflection calculation

With the pavement deflection velocity Vri measured by the sensor and the vehicle
speed Vh, Eq. (2.12) can be reformulated as Eq. (2.34), where x1, x2, and x3 are the
horizontal distances between the measuring points of the velocimeters and the load
center, which are 100 mm, 300 mm, and 750 mm, respectively. Vr1, Vr2, and Vr3 are the
pavement deflection velocities at x1, x2, and x3, respectively.


e^(−Bx1) A sin(Bx1) − Vr1/Vh1 = 0
e^(−Bx2) A sin(Bx2) − Vr2/Vh2 = 0
…
e^(−Bxn) A sin(Bxn) − Vrn/Vhn = 0        (2.34)

The pavement deflection velocity Vri in Eq. (2.34) can be calculated by using
Eq. (2.31). The speed of the beam Vh, which is also the vehicle speed, can be measured
by the encoder. The position xi is influenced by the pitching motion of the beam,
so it should be corrected using the angle θ between the beam and the pavement.
When the position xi is known, the unknown parameters in Eq. (2.34) are A and B,
which can be solved using the measurements at two measuring points. Then, the
maximum deflection d(0) = −A/(2B) at the load center is obtained. The dynamic
angle θ between the beam and the road can be obtained by integrating the gyroscope
measurements. Assuming that the horizontal distance between the ith sensor and the
load center is xi0 and the installation height of the sensor is h, the horizontal
distance between the measuring point and the load center can be calculated as xi =
xi0 − h tan(αi + θ), where αi can be obtained by calibration.
There are only two unknowns in the equation, and two measuring points are
selected. Their deflection velocities are used to calculate the deflection. In this case,
the slope of the deflection bowl can be described by Eq. (2.35).

e^(−Bxi) A sin(Bxi) − Vri/Vh = 0
e^(−Bxj) A sin(Bxj) − Vrj/Vh = 0,    i ≠ j        (2.35)

Assume that (A' , B ' ) is a point near the solution of Eq. (2.35), which is denoted
as (A∗, B∗). Taylor expansion can be applied to the above equation near (A' , B ' )
with regard to the parameters A and B. The expanded equations are Eq. (2.36).

A' sin(B'xi) e^(−B'xi) − Vri/Vh + ∆A sin(B'xi) e^(−B'xi) + ∆B A' xi e^(−B'xi)[cos(B'xi) − sin(B'xi)] = 0
A' sin(B'xj) e^(−B'xj) − Vrj/Vh + ∆A sin(B'xj) e^(−B'xj) + ∆B A' xj e^(−B'xj)[cos(B'xj) − sin(B'xj)] = 0        (2.36)

where ∆A = A − A' and ∆B = B − B'. Given (A', B'), Eq. (2.36) is a system
of linear equations in the two unknowns ∆A and ∆B, which can be solved directly.
Then, the parameters (A, B) can be obtained. If the number of iterations exceeds a
threshold or the values of ∆A and ∆B are both less than a threshold, the iteration is
terminated, and the A and B of the last iteration are used as the approximate solution
of Eq. (2.35). Otherwise, (A', B') is set to (A, B), and the iteration continues until the
termination condition is satisfied. With the parameters A and B finally output from
the iteration, the maximum deflection and the deflection bowl curve can be directly
derived.
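The iteration described above can be written compactly. The sketch below solves Eq. (2.35) for two measuring points by repeatedly solving the 2 × 2 linear system of Eq. (2.36); the starting values, distances, and velocity ratios are hypothetical.

import numpy as np

def solve_deflection_basin(x, v_ratio, A0, B0, tol=1e-8, max_iter=50):
    # Solve e^(-B*x_i) * A * sin(B*x_i) - Vr_i / Vh = 0 for (A, B) by linearization.
    A, B = A0, B0
    for _ in range(max_iter):
        e = np.exp(-B * x)
        residual = A * np.sin(B * x) * e - v_ratio
        # Partial derivatives with respect to dA and dB, as in Eq. (2.36)
        J = np.column_stack([np.sin(B * x) * e,
                             A * x * e * (np.cos(B * x) - np.sin(B * x))])
        dA, dB = np.linalg.solve(J, -residual)
        A, B = A + dA, B + dB
        if abs(dA) < tol and abs(dB) < tol:
            break
    return A, B

x = np.array([100.0, 300.0])          # measuring point distances (mm), hypothetical
v_ratio = np.array([0.0015, 0.0008])  # Vr_i / Vh at the two points, hypothetical
A, B = solve_deflection_basin(x, v_ratio, A0=0.01, B0=0.005)
print(A, B, -A / (2 * B))             # maximum deflection d(0) = -A/(2B)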
Deflection measurement is affected by many factors, such as road material, struc-
ture, age, and temperature. The measured deflection needs to be corrected with regard
to these factors.
Dynamic deflection measures the vertical deformation of the pavement under the
load force. The magnitude of the dynamic load directly affects the results of the
dynamic deflection measurement. As shown in Fig. 2.9a, the greater the dynamic
load is, the greater the deflection velocity obtained by the measurement system and
the greater the calculated deflection. In practice, pavement is not a smooth surface.
Pavement texture and elevation fluctuation prevent the surveying equipment from
moving at a constant speed, especially for road segments with poor flatness. The
roughness affects the force transmission from the dynamic load to the pavement. As
shown in Fig. 2.9b, the dynamic surveying equipment suffers from low-frequency
vibration and high-frequency vibration. The dynamic load varies all the time, and
the fluctuation scope can reach ± 20% of the static weight, which is 100 kN.
The pavement deflection velocity accounts for a small amount of the total velocity
measured by the Doppler velocimeter. Experiments show that the velocity measured
by the Doppler velocimeter is approximately 300–1300 mm/s, and the vehicle speed
is 200–1100 mm/s. The pavement deflection velocity is less than 40 mm/s. Therefore,
the influence of the vehicle speed on the deflection speed cannot be ignored.
The pavement temperature is the main factor affecting its bearing capacity, espe-
cially for asphalt surfaces. The temperature of pavement may vary significantly
during the day and may approach 70 °C in summer. Pavement humidity affects
the cohesion of the roadbed, and the deflection measured at high temperatures is usually

Fig. 2.9 Dynamic wheel loading of a continuous dynamic deflectometer

greater than at low temperatures [5]. Temperature correction for pavement deflection
is regarded as a challenging task. Usually, the influence of pavement temperature is
corrected by a temperature correction coefficient, as shown in Eq. (2.37).

DTref = DT K T (2.37)

Subgrade humidity, pavement thickness, and road structure type also affect the
propagation of deflection waves. Limited by current measurement technology, these
factors are temporarily not considered in the correction model. In the following
context, three factors, dynamic load, vehicle speed, and pavement temperature, are
considered in the deflection correction model.
A straightforward model for deflection correction is established as Eq. (2.38).

Yd = Ym F (2.38)

where Ym is the deflection calculated by the pavement deflection velocity, and Yd is the
corrected deflection. F is the integrated correction coefficient considering multiple
influencing factors.
Regression analysis is used to establish the relationship between the correction
coefficient F, and the influencing factors, including pavement temperature, vehicle
speed, and dynamic load. Then, the ternary quadratic regression model is established
as Eqs. (2.39) and (2.40).

F = b0 + Σ(k=1 to 3) bk Xk + Σ(k=1 to 3) Σ(p=k to 3) bkp Xk Xp + ε        (2.39)

min ε² = min Σ(i=1 to n) [Fi − (b0 + Σ(k=1 to 3) bk xki + Σ(k=1 to 3) Σ(p=k to 3) bkp xki xpi)]²        (2.40)

where bk and bkp are regression coefficients, X1 is the vehicle speed influence factor,
X2 is the dynamic load influence factor, X3 is the pavement temperature influence
factor, ε is the error, i is the sample index, n is the total number of samples, and
x1i, x2i, and x3i are the ith sample values of the vehicle speed, dynamic load, and
pavement temperature influence factors, respectively.
Deflection measured by using the Benkelman beam or FWD, Yr, can be regarded
as the reference data. In other words, these measurements can be regarded as the
deflection after correction, Yd. Then, regression analysis on F = Yr/Ym can be used
to derive the correction coefficient F, which reflects the influencing factors. With three
influencing factors, there are nine regression terms, as listed in Table 2.3.
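The fit itself reduces to ordinary least squares on the nine terms of Table 2.3 plus an intercept. The sketch below illustrates Eqs. (2.39) and (2.40) with synthetic placeholder samples; in practice, the columns of X and the deflections Ym and Yr come from the survey and the reference measurements.

import numpy as np

def design_matrix(X):
    # Ternary quadratic terms of Eq. (2.39): 1, X1..X3, X1X2, X1X3, X2X3, X1^2, X2^2, X3^2
    X1, X2, X3 = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([np.ones(len(X)), X1, X2, X3,
                            X1 * X2, X1 * X3, X2 * X3,
                            X1 ** 2, X2 ** 2, X3 ** 2])

rng = np.random.default_rng(0)
# Placeholder samples: vehicle speed (km/h), dynamic load (kN), pavement temperature (deg C)
X = rng.uniform([40.0, 80.0, 10.0], [100.0, 120.0, 60.0], size=(50, 3))
Y_m = rng.uniform(20.0, 60.0, size=50)                                    # measured deflection (synthetic)
Y_r = Y_m * (2.0 + 0.006 * X[:, 0] - 0.02 * X[:, 1] - 0.003 * X[:, 2])    # synthetic reference deflection

F = Y_r / Y_m                                                # correction coefficient samples
coeffs, *_ = np.linalg.lstsq(design_matrix(X), F, rcond=None)
print(coeffs)                                                # estimates of b0, bk, bkp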
The best regression model (with the largest correlation coefficient) and its param-
eters under the condition of using different numbers of influencing factors were
counted. The statistical results are shown in Table 2.4.
When three influencing factors are used in the model, the regression results are
closest to the optimal regression, indicated by the maximum correlation coefficient,
the minimum root mean square error (RMSE), and the mean absolute error. The
corresponding optimal F value can be calculated using Eq. (2.41).

F = 2.1813 + 0.00597X 1 − 0.02431X 2 − 0.0032X 3 (2.41)

It can be seen from the model that the vehicle speed factor X1 is positively correlated
with the correction coefficient: the higher the vehicle speed is, the larger the
correction coefficient. In contrast, the dynamic load has a negative relationship with

Table 2.3 Influencing factors of the regression model


No | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Influencing factors | X1 | X2 | X3 | X1X2 | X1X3 | X2X3 | X1X1 | X2X2 | X3X3

Table 2.4 Statistical parameters of the best regression model


Number of factors | Correlation coefficient | Statistic F | Variance of estimated error | Mean absolute error | RMSE
1 | 0.62285 | 568.1 | 0.0026494 | 0.04112 | 0.051323
2 | 0.91416 | 1826.4 | 0.00060476 | 0.018735 | 0.024485
3 | 0.94541 | 1974.1 | 0.00038575 | 0.015015 | 0.019527
4 | 0.94678 | 1516.7 | 0.00037712 | 0.014672 | 0.019279
5 | 0.94744 | 1225.7 | 0.00037357 | 0.01455 | 0.01916
6 | 0.94781 | 1026.1 | 0.00037203 | 0.014471 | 0.019092
7 | 0.9482 | 883.8 | 0.00037037 | 0.014392 | 0.019021
8 | 0.94824 | 771.69 | 0.00037117 | 0.014358 | 0.019014
9 | 0.94835 | 685.54 | 0.00037144 | 0.014369 | 0.018992

Fig. 2.10 Correction function of the deflection value

the correction coefficient. The larger the dynamic load is, the smaller the correction
coefficient. Pavement temperature is also negatively correlated with the correction
coefficient.
Figure 2.10 demonstrates the performance of the correction. The X-axis is the
regression result, Z, and the Y-axis is the expected value, Y, of the regression analysis.
It can be seen from the figure that, after correction, the difference between the measured
deflection and the reference deflection is smaller than before. The average difference
is 0.52, and the maximum difference is 2.89, whereas before correction the average and
maximum differences are 2.87 and 20.20, respectively.
There are many factors that affect pavement deflection measurement, such as
temperature, paving material, and service life. The correction method is only given
based on the measured data. The adaptability of the method needs to be verified and
improved by users. With the development of deep learning and increasing surveying
data, generalized dynamic deflection correction methods are needed [6].

2.2.2 Pavement Distress Detection

Distortions in asphalt pavement are caused by instability of an asphalt mix or weakness
of the base or subgrade layers. These may include rutting, shoving, depressions,
swelling, and patch failures. Pavement distress can be caused by a single factor or a
combination of factors, including poor material characteristics, poor mix design,
poor construction design, traffic loading, and the heat and moisture environment.

1. Damage detection

From the analysis of data features, pavement distress can result in visual changes
and deflection, which can both be detected. At present, visual technology is widely
used to detect visual changes, such as pavement cracks and patches. Traditionally,

an imaging system is used to obtain pavement images, and then 2D grayscale infor-
mation processing is used to analyze pavement cracks and quantitatively evaluate
pavement distress.
Visual technology has been used for pavement distress detection in equipment such
as the Australian ARRB, Canadian Roadware, Waylink, and Wuhan Optics
Valley ZOYON RTM series. They all use a passenger car or truck as the surveying
platform, with the detection sensors installed at the rear of the vehicle. The image
acquisition unit consists of a CCD line scan camera or an area-array industrial camera
with auxiliary lighting. During data acquisition, the camera is activated when it receives
an external trigger signal from the DMI, and the acquired images are stored in the
on-board server. These surveying cars are shown in Fig. 2.11.
1) 2D imaging
The acquisition of pavement images mainly depends on industrial cameras with
auxiliary light illumination. The auxiliary light source needs to be selected according
to the camera, whose working mode, frequency, resolution, focal length, and field
angle need to be carefully chosen according to the application. The camera is usually
triggered in either internal or external modes. In the internal trigger mode, the camera
collects data at a given frequency, while in the external trigger mode, the camera
works only when it receives an external trigger signal. In the practical application
scene, it is difficult to maintain a uniform speed along the surveying route. Repeated
collection or missing collection may occur in the internal trigger mode. Usually, a

Fig. 2.11 Typical rapid surveying equipment used for transportation infrastructure from different
countries in the world

third mode called the equidistant trigger mode is adopted to ensure data integrity and
consistency. According to the specification, the accuracy of pavement image data is
1 mm, the measurement width covers the 3750 mm lane width, and the maximum
surveying speed is above 100 km/h.
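A minimal sketch of the equidistant trigger logic, assuming the DMI provides a cumulative encoder pulse count and a known number of pulses per metre; both parameters and the sample counts below are hypothetical.

PULSES_PER_METRE = 1000          # hypothetical DMI resolution
TRIGGER_SPACING_M = 2.0          # one exposure every 2 m of travel

def equidistant_triggers(cumulative_counts):
    # Yield True whenever the travelled distance crosses the next trigger position.
    next_trigger = TRIGGER_SPACING_M
    for count in cumulative_counts:
        distance_m = count / PULSES_PER_METRE
        fire = distance_m >= next_trigger
        if fire:
            next_trigger += TRIGGER_SPACING_M
        yield fire

print(list(equidistant_triggers([500, 1500, 2100, 3900, 4050, 6200])))
# -> [False, False, True, False, True, True]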
The pavement image is formed in the camera according to the reflected light
intensity it receives from the ground. The more light the ground reflects to the
camera, the larger the image gray value, and vice versa. Making the crack contrast
with the background in the image is necessary regardless of whether the image
is processed manually or automatically. A pavement crack is intrinsically a surface
change with varying crack depth caused by rupture of the continuous surface texture.
When the laser line illuminates the pavement from directly above the crack, the light
is approximately parallel, as shown in the left panel of Fig. 2.12. Such orthotopic
illumination leads to the same light reflection to the camera from crack and non-crack
surfaces; therefore, it is difficult to enhance the contrast. Illuminating the laser line
onto the ground at an incident angle solves this problem. In theory, a shadow area will
be formed at the crack, resulting in a more pronounced contrast between the crack
and the background, as shown in the right panel of Fig. 2.12, which is conducive to data
processing.
Figure 2.13 shows a line scan camera whose measuring unit is composed of two
cameras that face vertically downward and two lasers that serve as light sources for
the camera on the other side: the left laser illuminates the focal field of the right camera,
and the right laser illuminates that of the left camera.
Pavement images are acquired by using the above-mentioned device. These data
can be processed by digital image processing or human-computer interaction to
recognize pavement distress. Two examples of the acquired pavement images are
shown in Fig. 2.14.
Alternatively, the array scan camera can also be used to acquire pavement images.
The array scan camera is more efficient than the line scan camera because it is
equivalent to the collection of multiple acquisitions by the line scan camera. Another
difference is that the laser lighting cannot assist the array scan camera. Instead, LEDs,
halogen lamps, or natural light are generally used. If auxiliary lighting such as LED
is used, the stroboscopic flash would affect the normal driving of the following car,
which is a potential hazard. The array scan camera cannot be used at night without
auxiliary lighting, so it is rarely used in real applications.

Fig. 2.12 Comparison between orthotopic and incident illumination



Fig. 2.13 Imaging geometry of the line scan camera

Fig. 2.14 Typical 2D pavement images

(1) Description of the image characteristics


Asphalt concrete pavement is made of aggregates, mineral powder, and asphalt
at appropriate proportions, which are heated to a certain temperature, mixed, and
compacted by paving. Aggregate is a general term for the mineral aggregates in
asphalt pavement materials, which act as the skeleton and filler of the pavement.
The aggregates come in various particle sizes and are classified into coarse aggregates
(19–26.5 mm), medium aggregates (13.2–19 mm), and fine aggregates (4.75–13.2 mm);
those of 4.75 mm and below are called sand aggregates.
The aggregate forms the grain texture of the pavement,
which introduces considerable spot noise in pavement crack detection, as shown in
Fig. 2.15.
Due to the long-term rolling of the wheels and exposure to the erosion of wind, frost,
rain, snow, and weathering, the cracks in the pavement are prone to degradation, and
they become filled with sand and dust [7]. The level of degradation varies with the
location of the crack in the pavement. For example, cracks in the wheel paths, where
the wheels roll most often, degrade rapidly and more severely, while cracks in the
middle of the pavement degrade relatively slowly.
Crack degradation reduces the contrast between the crack and the pavement

Fig. 2.15 Spot noise is produced by the granular properties of the pavement material

background, and causes the cracks to have poor brightness continuity. For example,
the pavement image in Fig. 2.16a, b contains degraded cracks. At the position indi-
cated by the arrow, the degradation is severe, and the continuity of the cracks is
damaged.
Cracks are irregular line targets consisting of shorter fracture segments, as shown
in Fig. 2.17a.

Fig. 2.16 Examples of pavement crack degradation



Fig. 2.17 Schematic diagram of the crack image

For a crack segment with a certain depth, shadows will be formed when shooting
from certain directions, leading to an obviously exposed segment that is darker than
the background. A hidden segment is caused when light is incident along the crack
orientation. Due to the irregularity of the distribution of fracture segments (the cracks
are usually tortuous), the resulting exposed and hidden fracture segments are usually
alternated when shooting from a single angle. As shown in Fig. 2.17b, the exposed
fracture segments generally form a linear target with a darker background.
Currently, there are many methods for crack enhancement and detection based on
threshold segmentation, edge detection, and machine learning. However, these algo-
rithms assume balanced image brightness, high contrast, and continuity of cracks,
and they fail when these assumptions do not hold, which is the case in many real
application scenarios. When the crack signal is of low intensity, resulting in a very
low signal-to-noise ratio relative to the pavement background, automatic identification
of pavement cracks is difficult to achieve with traditional edge detection-based methods,
morphological methods, threshold segmentation methods, etc. With complex backgrounds,
only part of the cracks can be identified, and the result contains many noisy surface
elements, which cannot meet the requirements of high-precision pavement crack
detection [8]. Therefore, it is urgent to conduct research on pavement cracks with a
low signal-to-noise ratio and propose reliable crack enhancement and extraction
methods to improve the automatic identification rate of pavement cracks.
(2) Pavement crack enhancement

Pavement cracks are often degraded due to vehicle load rolling and natural weath-
ering, resulting in extremely low contrast between the cracks and the pavement back-
ground and even causing discontinuous cracks. These degradations bring great difficulties
to the automatic identification of cracks based on visual detection. Enhancing the
signal strength of pavement cracks, such as their contrast and continuity, can increase
the identifiability of cracks.
In the past two decades, researchers in the field of computer vision have tried to
introduce the principles of biological visual perception into image processing. Tensor
voting is one of the representative results. The tensor voting algorithm uses second-
order tensors to represent data primitives, embeds perceptual organization rules into
the voting process, realizes information transfer between data primitives through the
voting field, and finally exploits the isomorphism between second-order tensors and
matrices, using matrix eigen-analysis to extract spatial geometric structure features.
This method has been widely studied and applied in the computer vision field.
In binary crack images, the pixels corresponding to cracks often have linear aggre-
gation characteristics, and human vision identifies cracks through the proximity and
continuity of pixels. With the help of tensor voting, perceptually oriented proximity
and continuity rules can be used to enhance cracks in binary images, improving signal
strength and preparing for subsequent crack extraction. To achieve signal enhancement
for points on cracks and signal suppression for non-crack points, two voting steps,
ball voting and stick voting, are adopted [9].
First, the target pixels in the thresholded binary image are used as input data.
However, these pixels have no direction information. Therefore, this step uses ball
voting to estimate the orientation of each crack pixel (including noise pixels). More
specifically, each target pixel (token) is represented by a spherical tensor, so that it has
an equal saliency in all directions. Then, the information transfer between tensors
is realized through the ball voting field. In the specific operation, the ball voting
field is obtained by rotating the stick voting field around the center point in small
angular steps and superimposing the rotated fields. In the process
of accumulating the ball voting field strength of each token in its neighborhood,
information such as the proximity and continuity between it and the neighborhood
token is also added to its corresponding tensor. The purpose of ball voting is to add
directional information to the token, so only the target pixel needs to be voted, that
is, sparse voting. Simulated crack pixels are shown in Fig. 2.18a. After ball voting,
each crack pixel has a direction, as shown in Fig. 2.18b, and the direction of the
target pixel is consistent with the extension direction of the simulated crack.
In the sense of isomorphism, the second-order tensor corresponds to a matrix,
and the line structure is expressed in the tangential direction instead of the normal
direction. Therefore, after the ball voting is completed, the smaller eigenvalue of
the matrix can be set to zero, transforming the spherical tensor into a stick
tensor. After obtaining the stick tensor, stick voting is used to cover the voting field

Fig. 2.18 Crack point simulation and crack enhancement test based on tensor voting

of the stick tensor to all neighborhood areas, which is tensor voting. Tensor voting
can softly fill the interval between adjacent crack points, and the filling strength is
determined according to the superposition of the field strength of the voting field.
For the tensor after stick voting, the following equation can be obtained through
eigen-decomposition.
T = (λ1 − λ2) e1 e1ᵀ + λ2 (e1 e1ᵀ + e2 e2ᵀ)        (2.42)

According to the eigen-decomposition, λ1 − λ2 derived using two-dimensional tensor
voting can better reflect the saliency of the linear structure: a token with the linear
structure feature has a larger value of λ1 − λ2, and a noise point has a smaller value
of λ1 − λ2. Therefore, λ1 − λ2 is used as the confidence for judging whether the target
point is a crack point, and a crack probability map can be generated. Enhancement
of the crack target and suppression of noise are also realized in the generation
of the crack probability map. Figure 2.18c is the crack probability map obtained by
applying stick voting and line feature extraction to Fig. 2.18b.
The crack probability map, which is based on the λ1 − λ2 values, describes the
linear significance of each target point. In the process of stick voting, the voting
field strength (the number of votes) is delivered to the adjacent tokens in the stick
direction. However, when voting, the linear significance of all tokens is the same.
Considering that the crack probability map can better reflect the linear saliency of
each point in the binary image, this section proposes a tensor voting embedded with
linear saliency for crack enhancement of binary pavement images. The algorithm
consists of the following steps.
➀ Perform ball voting and stick voting on the binary pavement image B0 , decompose
the voting results according to Eq. (2.42), and use the value of λ1 − λ2 to
construct the crack probability map Bp .
➁ Perform ball voting on image B0 , estimate the direction of each token, remove
the tensors with λ1 − λ2 < 0.3, and set λ2 to 0 to change all other tensors into stick
tensors. The result is recorded as Bs .
➂ Perform stick voting on Bs . When voting, assign each Bp value as a multiplicative
parameter to the voting field of the corresponding stick tensor in Bs , and then obtain
the stick voting result Bp,s .
The above algorithm embeds the crack probability of each point during voting,
that is, the linear saliency of each point, which is called tensor voting with embedded
linear saliency.
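The sketch below is not the full ball-and-stick voting algorithm described above; it is a heavily simplified NumPy illustration of the λ1 − λ2 saliency idea. Each foreground pixel accumulates a second-order tensor from the directions to its nearby foreground pixels, weighted by a proximity decay, so pixels whose neighbours lie roughly on a line receive a large λ1 − λ2 value, while isolated noise pixels do not. The decay parameter, neighbourhood radius, and toy image are arbitrary assumptions.

import numpy as np

def linear_saliency(binary, sigma=5.0, radius=10):
    # Accumulate second-order tensors from neighbouring foreground pixels and
    # return the (lambda1 - lambda2) saliency for every foreground pixel.
    pts = np.column_stack(np.nonzero(binary)).astype(float)
    saliency = np.zeros(binary.shape, dtype=float)
    for y, x in pts:
        d = pts - np.array([y, x])                     # offsets to all other tokens
        dist = np.hypot(d[:, 0], d[:, 1])
        mask = (dist > 0) & (dist <= radius)
        if not np.any(mask):
            continue                                   # isolated point keeps zero saliency
        v = d[mask] / dist[mask, None]                 # unit directions to neighbours
        w = np.exp(-(dist[mask] ** 2) / (sigma ** 2))  # proximity decay
        T = (w[:, None, None] * v[:, :, None] * v[:, None, :]).sum(axis=0)
        eigvals = np.linalg.eigvalsh(T)                # ascending eigenvalues
        saliency[int(y), int(x)] = eigvals[1] - eigvals[0]
    return saliency

img = np.zeros((64, 64), dtype=np.uint8)
for i in range(10, 50):
    img[i, i] = 1                                      # a diagonal "crack"
img[3, 60] = img[55, 5] = 1                            # isolated noise pixels
s = linear_saliency(img)
print(s[12, 12], s[3, 60])                             # crack pixel scores high, noise stays 0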
Figure 2.19a is an image of a highway surface in which the crack has very low
contrast and poor continuity, and there is a large area of black dirt on the right side
of the image. After pre-processing, a binarized pavement image can be obtained,
as shown in Fig. 2.19b. The binary image contains a large number of noisy areas,
and a larger circular noisy area is formed in the dirty area. Figure 2.19c is the result
of tensor voting on Fig. 2.19b, namely, the crack probability map. Comparing (a)
and (c) in Fig. 2.19, it can be found that the pixels at the crack line have a large

Fig. 2.19 Experiment of crack enhancement in binary image based on tensor voting

crack probability and are continuously distributed in space. The crack probability
of the pixel at the blocky dirt surface is not much different from the pavement
background, which means the signal of the crack pixel is enhanced, and the signal of
the background noise is suppressed. This shows that the crack enhancement based on
tensor voting designed in the previous section is effective. Experiments are conducted
on the proposed tensor voting algorithm for embedding linear saliency. Take the
crack probability map shown in Fig. 2.19c as linear saliency and embed it as the
multiplicative parameter into the field strength generating function for stick voting,
which results in Fig. 2.19d. A comparison between Fig. 2.19c, d shows that the
embedded linear saliency tensor voting derives a further enhanced crack, especially
when the crack line is relatively flat. The weakness of the proposed algorithm is
that it also enhances the linearly distributed noise area. For example, there is a strip-
shaped pollutant in the lower right corner of Fig. 2.19a, whose intensity is increased

after the crack enhancement algorithm based on tensor voting. Generally, pollutants
appear to be areas with wider widths or shorter lengths than those of cracks. These
characteristics of pollutants can be used to differentiate them from pavement cracks.

(3) Pavement crack analysis

Cracks are the most common pavement distress and are typical line structures. There-
fore, crack detection can be regarded as line detection in computer vision. Deep-
Crack is an end-to-end trainable deep convolutional neural network for automatic
crack detection that learns high-level features for crack representation. This network
combines depthwise convolutional features learned in the convolution layers, where
the detailed features are captured in shallow layers while the comprehensive repre-
sentation is learned in the deep layers. DeepCrack uses an encoder-decoder structure
to pairwisely fuse the convolutional features in the encoder and the decoder networks.
➀ Multi-scale convolutional feature fusion network DeepCrack

DeepCrack is a fully convolutional network, and its encoder-decoder structure refers


to SegNet, which consists of an encoder network and a corresponding decoder
network. The encoder uses the convolutional layers in VGG16, which consists of 13
convolutional layers and 5 downsampling pooling layers [10]. The decoder contains
13 convolutional layers, each with a corresponding layer in the encoder network.
Therefore, the encoder network is almost symmetrical to the decoder network. The
only difference is that the first encoder layer (the first convolution operation) produces
multichannel feature maps. In contrast, the corresponding last decoder layer (the last
convolution operation) generates c-channel feature maps, where c is the number of
classes in the image segmentation task.
After each convolution layer, batch normalization is applied to the feature maps.
A max-pooling operation with a stride greater than one can reduce the scale of feature
maps without causing translation differences in small spatial shifts, but downsam-
pling results in a loss of spatial resolution, which may lead to boundary bias. To
avoid loss of details, max-pooling indices are used to capture boundary information
and record it in the encoder feature map during downsampling. Then, in the decoder
network, the corresponding decoder layer performs nonlinear upsampling by using
the max-pooling index. This upsampling will generate sparse feature maps. However,
compared with continuous and dense feature maps, sparse feature maps record more
precise boundaries.
The receptive fields increase with the increasing depth of the convolution layers,
resulting in multi-scale convolutional features. DeepCrack is constructed on the
encoder-decoder architecture of SegNet, in which there are five different scales
corresponding to the five downsampling pooling layers. To use both sparse and
continuous feature maps at each scale, DeepCrack performs skip-layer fusion to
connect the encoder and decoder networks. As shown in Fig. 2.20, the convolutional
layers before the pooling layer at each scale in the encoder are connected to the
last convolutional layer at the corresponding scale in the decoder. Skip layer fusion
concatenates convolutional features through a series of operations.

Fig. 2.20 Network structure of DeepCrack

Figure 2.21 illustrates the skip layer fusion in detail. First, the feature maps from
the encoder and decoder are concatenated, followed by a 1 × 1 convolutional layer to
reduce the multichannel to one channel. Then, to compute the pixel-wise prediction
loss at each scale, a deconv layer is added to upsample the feature maps, and a cropped
layer is used to crop the upsampled feature map to the size of the input image. After
these operations, we can obtain prediction maps at all scales of the same size as the
input image. The five-scale prediction maps are further concatenated, and a 1 × 1
convolution operation is then used to fuse the outputs of all scales. Finally, we can
obtain prediction maps at each skip layer fusion and at the final fusion.
An effective loss function is then used to train the deep network for crack segmen-
tation. Let us say that the training dataset, S, contains N images, as expressed in
Eq. (2.43).

Fig. 2.21 The structure of the skip connection layer

S = {(Xn, Yn), n = 1, 2, …, N}        (2.43)

where Xn = {xi(n), i = 1, 2, …, I} is the original image and Yn = {yi(n), i = 1, 2, …, I}, yi(n) ∈ {0, 1}, is the ground truth.
In the encoder-decoder architecture, let K be the number of convolutional stages;
then, at stage k, the feature map generated by skip layer fusion can be expressed as
Eq. (2.44).
F(k) = {fi(k), i = 1, …, I}        (2.44)

where k = 1, 2, … , K. Furthermore, the multi-scale fused feature map can be defined


as Eq. (2.45).
Ffuse = {fifuse, i = 1, 2, …, I}        (2.45)

Different from the semantic segmentation of Pascal VOC, crack detection is a


binary classification problem with only two classes. A cross-entropy loss is used
to measure the prediction error. Usually, the ground-truth crack pixels belong to
the minority class in the crack image, which makes it an imbalanced classification
or segmentation. Some works solve this problem by adding larger weights to the
minority class. However, in crack detection, we found that increasing the weight
of cracks leads to more false positives. Therefore, the pixel-wise prediction loss is
defined as Eq. (2.46).

l(Fi, W) = log[1 − P(Fi, W)],  yi = 0
l(Fi, W) = log[P(Fi, W)],      yi = 1        (2.46)

where Fi is the pixel value of the feature map from the network, W is the weight
of the parameters of the deep network, and P(F) is the standard sigmoid function,
which converts the feature map into a crack probability map. Then, the total loss can
be expressed as Eq. (2.47).

L(W) = Σ(i=1 to I) [Σ(k=1 to K) l(Fi(k), W) + l(Fifuse, W)]        (2.47)
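As a plain illustration of Eqs. (2.46) and (2.47), the sketch below computes the pixel-wise term for a list of per-scale prediction maps plus a fused map using NumPy, following the sign convention printed in Eq. (2.46). It assumes the maps are raw network outputs passed through a sigmoid; it is not the authors' training code.

import numpy as np

def sigmoid(F):
    return 1.0 / (1.0 + np.exp(-F))

def pixel_term(F, y, eps=1e-7):
    # Eq. (2.46): log P(F) on crack pixels, log(1 - P(F)) on background pixels
    P = np.clip(sigmoid(F), eps, 1.0 - eps)
    return np.where(y == 1, np.log(P), np.log(1.0 - P))

def total_term(scale_maps, fused_map, y):
    # Eq. (2.47): per-scale terms plus the fused term, summed over all pixels
    total = pixel_term(fused_map, y)
    for F_k in scale_maps:
        total = total + pixel_term(F_k, y)
    return total.sum()

rng = np.random.default_rng(0)
y = (rng.random((4, 4)) > 0.8).astype(int)          # toy ground truth
maps = [rng.normal(size=(4, 4)) for _ in range(5)]  # K = 5 per-scale prediction maps
print(total_term(maps, rng.normal(size=(4, 4)), y))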

The CrackTree dataset (260 images) is used to train the DeepCrack model. It is
expanded to 35,100 images of 512 × 512 size after operations such as cropping and
rotating. To test the crack detection, two test sets are constructed: CRKWH100 and
CrackLS315. The former contains 100 pavement images of 512 × 512 captured by
line scan cameras under visible light illumination, and the ground sampling distance
(GSD) is 1 mm. The latter includes 315 pavement images of 512 × 512 acquired
with laser illumination.
Figure 2.22 compares pavement crack detection using DeepCrack and other
methods. Figures 2.23 and 2.24 demonstrate the performance of DeepCrack on
CRKWH100 and CrackLS315, respectively. DeepCrack can completely detect crack
lines even with the appearance of pavement shadows, oil pollution, and road
markings.

➁ LinkCrack, neighborhood connection constrained deep convolutional network

DeepCrack demonstrates the effectiveness of the encoder-decoder structure in


extracting linear features, enabling pixel-wise crack recognition. However, the

Fig. 2.22 Comparison of crack extraction results using DeepCrack and other methods

Fig. 2.23 Results trained on CRKWH100

Fig. 2.24 Results trained on CrackLS315

existing encoder-decoder network structure still has two problems. First, the complex
deep network results in low computational efficiency. The encoder is built on
the VGG network, which contains many deep convolutional layers. Although this
network has good representation ability, there are a large number of weight param-
eters to be trained, which requires a large volume of training data and a great amount
of calculation. Second, the detected crack lines have poor continuity when the
crack is very thin and the background is noisy or contains dirty areas. Therefore, some
constraints need to be introduced to guide crack detection and extraction with
higher-level context to form continuous linear cracks.
In DeepCrack, semantic segmentation is carried out pixel-wise, and connectivity
is not taken into consideration. Considering that pavement cracks are linear targets

in images, their spatial continuity can be used as a constraint in network training


[11] to improve the performance of crack detection. To this end, LinkCrack was
proposed; its structure is shown in Fig. 2.25. It is a deep convolutional network and a
typical encoder-decoder architecture. LinkCrack consists of an encoder network and
a decoder network, both of which contain four stages, and it fuses the convolutional
features at the corresponding stages through skip-layer connections.
The input image size of LinkCrack is 512 × 512. The encoder network uses a
34-layer residual network, ResNet-34 [12], as the backbone network. Each scale is
connected to several residual blocks. The stride of the first residual block of each layer
is s = 2. Through downsampling, feature maps at four different scales can be obtained,
and the number of feature channels is doubled in each downsampling step. At each

Fig. 2.25 Architecture of the proposed LinkCrack network



scale before downsampling, the feature maps are concatenated with the convolutional
features at the corresponding scale of the decoder through a skip layer connection.
In this way, the feature information can be propagated to higher-resolution layers.
The decoder network is similar to the U-Net [13] network. It starts with a decon-
volution upsampling operation, which doubles the dimension of the feature map.
Then, it is fused with the feature map of the encoder from the corresponding scale,
connecting two convolutional layers. This operation is repeated at each scale to learn
multi-scale features and to improve the expression accuracy. To reduce the number
of parameters, the number of channels in the upper convolutional layers is reduced.
In the decoder network, an upsampling method with nearest neighbor interpolation
is used to increase the size of the feature maps, and the corresponding encoder layer
features are merged using point multiplication to reduce the number of parameters
in the decoder part. In the final output layer, a 1 × 1 convolution is employed to map
the feature vector to the crack mask and eight neighborhood connection maps.
The more parameters there are in the deep convolutional network, the more
complex the network structure is, and the generated high-resolution feature map can
help detect crack pixels. However, it also increases computational cost and slows
down the detection speed. On the other hand, convolution kernels with larger recep-
tive fields are more advantageous in detecting cracks with poor continuity. Therefore,
to increase the receptive field of the encoder convolutional layer, LinkCrack makes
the following changes to the encoder backbone network ResNet-34.
The kernel size of the first-layer convolution is changed from 7 × 7 to 3 × 3, and
the max pooling layer is removed. ResNet was originally designed for classification
tasks, where the model does not need to focus on subtle texture features, whereas such
features are crucial for crack recognition.
Atrous convolution [14] is used in layers 4 and 5 to expand the receptive field
of the network to encode multi-scale objects. Atrous convolution is widely used in
image segmentation tasks and can perform convolution with fewer parameters while
maintaining a large receptive field.
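A minimal PyTorch sketch of the two encoder modifications just described: a 3 × 3 first convolution without max pooling, and 3 × 3 residual blocks whose dilation can be set to 2 in the deeper levels. Channel counts follow Table 2.5, but the single-channel grayscale input and the use of batch normalization are assumptions, and this fragment is only illustrative, not the published LinkCrack implementation.

import torch
import torch.nn as nn

class DilatedResBlock(nn.Module):
    # Residual block with an optional dilation rate, as in levels 4-5 of the encoder.
    def __init__(self, channels, dilation=1):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)

# First encoder level: 3 x 3 kernels instead of ResNet's 7 x 7 stem, and no max pooling.
stem = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
                     nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True))

feat = stem(torch.randn(1, 1, 512, 512))
block = DilatedResBlock(128, dilation=2)
print(feat.shape, block(torch.randn(1, 128, 64, 64)).shape)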
The number of parameters in LinkCrack is approximately one-fifth of that of U-
Net, and the model parameters of each layer are shown in Table 2.5. It is expected
that this simplified network structure can be used to train on relatively small training
datasets and avoid overfitting.
A spatial continuity constraint is introduced to model the crack continuity during
training. LinkCrack builds an 8-neighbor connection graph to represent image conti-
nuity, as shown in Fig. 2.26. The ground truth is a binary image in which white pixels
are cracks, and black pixels are the background. Given a crack pixel, for each of its
eight neighbors, the connection can be positive or negative depending on whether the
neighboring pixel is a crack pixel or a background pixel. Therefore, eight positive/
negative values are determined according to its eight neighboring pixels, and the
continuity of cracks in all directions can be quantitatively predicted.
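A minimal sketch of how the eight neighbouring connection maps can be built from a binary ground-truth mask: for every crack pixel and each of the eight directions, the connection is positive only when the neighbour in that direction is also a crack pixel. The direction ordering below is an assumption, since the text does not fix one.

import numpy as np

OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]

def connection_maps(mask):
    # Return an (8, H, W) array of positive/negative connections for crack pixels.
    mask = np.asarray(mask, dtype=np.uint8)
    padded = np.pad(mask, 1)
    H, W = mask.shape
    maps = np.zeros((8, H, W), dtype=np.uint8)
    for k, (dy, dx) in enumerate(OFFSETS):
        shifted = padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]   # neighbour in direction k
        maps[k] = mask & shifted                                 # positive only for crack-crack pairs
    return maps

gt = np.zeros((5, 5), dtype=np.uint8)
gt[2, 1:4] = 1                                # a short horizontal crack
print(connection_maps(gt)[4])                 # connections towards the right-hand neighbour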
The performance of the network is evaluated by the similarity between the
predicted label and the true label. Generally, the closer the prediction is to the label,

Table 2.5 Detailed information on the LinkCrack network


Layers | Name | Kernel size/channel | Stride s | Padding p | Dilation rate r | Activation function | Output
Level 1 | Conv1_1 | 3 × 3/32 | 1 | 1 | 1 | ReLU | 512 × 512
Level 1 | Conv1_2 | 3 × 3/64 | 1 | 1 | 1 | ReLU | 512 × 512
Level 2 | ResBlk2_1 | 3 × 3/64 | 2 | 1 | 1 | ReLU | 256 × 256
Level 2 | ResBlk2_2 | 3 × 3/64 | 1 | 1 | 1 | ReLU | 256 × 256
Level 2 | ResBlk2_3 | 3 × 3/64 | 1 | 1 | 1 | ReLU | 256 × 256
Level 3 | ResBlk3_1 | 3 × 3/64 | 2 | 1 | 1 | ReLU | 128 × 128
Level 3 | ResBlk3_2 | 3 × 3/64 | 1 | 1 | 1 | ReLU | 128 × 128
Level 3 | ResBlk3_3 | 3 × 3/64 | 1 | 1 | 1 | ReLU | 128 × 128
Level 3 | ResBlk3_4 | 3 × 3/64 | 1 | 1 | 1 | ReLU | 128 × 128
Level 4 | ResBlk4_1 | 3 × 3/128 | 2 | 1 | 1 | ReLU | 64 × 64
Level 4 | ResBlk4_2 | 3 × 3/128 | 1 | 1 | 1 | ReLU | 64 × 64
Level 4 | ResBlk4_3 | 3 × 3/128 | 1 | 2 | 2 | ReLU | 64 × 64
Level 4 | ResBlk4_4 | 3 × 3/128 | 1 | 1 | 1 | ReLU | 64 × 64
Level 4 | ResBlk4_5 | 3 × 3/128 | 1 | 2 | 2 | ReLU | 64 × 64
Level 4 | ResBlk4_6 | 3 × 3/128 | 1 | 1 | 1 | ReLU | 64 × 64
Level 5 | ResBlk5_1 | 3 × 3/256 | 2 | 1 | 1 | ReLU | 32 × 32
Level 5 | ResBlk5_2 | 3 × 3/256 | 1 | 2 | 2 | ReLU | 32 × 32
Level 5 | ResBlk5_3 | 3 × 3/256 | 1 | 1 | 1 | ReLU | 32 × 32
Level 6 | Deconv6_1 | 3 × 3/128 | 2 | 1 | 1 | ReLU | 64 × 64
Level 6 | Conv6_2 | 3 × 3/128 | 1 | 1 | 1 | ReLU | 64 × 64
Level 6 | Conv6_3 | 3 × 3/128 | 1 | 1 | 1 | ReLU | 64 × 64
Level 7 | Deconv7_1 | 3 × 3/64 | 2 | 1 | 1 | ReLU | 128 × 128
Level 7 | Conv7_2 | 3 × 3/64 | 1 | 1 | 1 | ReLU | 128 × 128
Level 7 | Conv7_3 | 3 × 3/64 | 1 | 1 | 1 | ReLU | 128 × 128
Level 8 | Deconv8_1 | 3 × 3/64 | 2 | 1 | 1 | ReLU | 256 × 256
Level 8 | Conv8_2 | 3 × 3/64 | 1 | 1 | 1 | ReLU | 256 × 256
Level 8 | Conv8_3 | 3 × 3/64 | 1 | 1 | 1 | ReLU | 256 × 256
Level 9 | Deconv9_1 | 3 × 3/64 | 2 | 1 | 1 | ReLU | 512 × 512
Level 9 | Conv9_2 (mask) | 3 × 3/32 | 1 | 1 | 1 | ReLU | 512 × 512
Level 9 | Conv9_3 (mask) | 1 × 1/1 | 1 | 1 | 1 | None | 512 × 512
Level 9 | Conv9_2 (link) | 3 × 3/32 | 1 | 1 | 1 | ReLU | 512 × 512
Level 9 | Conv9_3 (link) | 1 × 1/8 | 1 | 1 | 1 | None | 512 × 512

Fig. 2.26 Illustration of crack pixel’s 8-neighbor domains

the smaller the numerical result calculated according to the loss function. For regres-
sion problems, the commonly used loss function is the mean square error. For clas-
sification tasks, such as semantic segmentation, the commonly used loss function is
cross-entropy loss. The cross-entropy loss describes the difference between the true
distribution and the predicted probability distribution of the labels. The smaller the
loss value is, the more accurately the model can predict the labels.
With the connectivity constraint and the cross-entropy, the network can be trained
to generate a prediction close to the true value. The loss function in LinkCrack is
defined as the weighted sum of the pixel loss and the neighboring connection loss,
which can be calculated using Eq. (2.48).

L = L pixel + λL c (2.48)

where L pixel is the pixel loss, L c is the neighboring pixel loss, and λ is the
corresponding weight parameter.

Generally, the cracked pixels occupy only a small part of the whole image, and
most remaining pixels are the background area. The ratio of crack pixels to back-
ground pixels is much less than one. If the same weight is set for all pixels, the loss
function will be overwhelmed by the background loss, which leads to poor results.
To solve this problem, different weights are added to the loss of crack pixels and
background pixels. With the calculated proportion of crack and background pixels
in the label, the pixel loss is defined as Eq. (2.49).

Lpixel = a log(1 − P(Fi, θ)),  yi = 0
Lpixel = log(P(Fi, θ)),        yi = 1        (2.49)

where Fi is the output feature of pixel i in the network, θ is the LinkCrack model
parameter, P(·) is the standard sigmoid function, and a is the class balance weight
coefficient, whose value is calculated using Eq. (2.50).

a = pixel_numcrack / pixel_numimage        (2.50)

where pixel_ numcrack is the number of crack pixels and pixel_ numimage is the number
of background pixels.
In terms of spatial constraints, the neighboring connection loss Lc is defined as

Lc = Σ(k=1 to 8) Lk        (2.51)

where L k is the kth neighboring connection graph loss.


The cross-entropy loss is used to compute the eight neighboring connection losses
of crack pixels. Figure 2.27 shows the crack detection effect of LinkCrack under
different λ values. The larger the value is, the better the connectivity of the extracted
cracks, showing the effectiveness of the linear space constraint in the training.
However, a larger λ (>10) results in false positive detection. The loss of neigh-
boring connections needs to be balanced with the cross-entropy loss of foreground
pixels.

2) 3D point cloud processing

Image-based pavement distress detection has difficulty detecting subsidence,
lumps, and other distresses without typical visual characteristics. At the same time,
water, oil, and shadows on the pavement are difficult to distinguish from cracks after
imaging, which affects the automatic processing of the distress data. In fact, compared
with normal pavement, pavement distresses have a certain degree of deflection. If a
3D model of the pavement is established, all distresses will be represented in the 3D
model. Based on the 3D model and the characteristics of deflection, it is theoretically
possible to achieve detection of all types of distress, especially pavement water stains, oil

Fig. 2.27 Crack detection results of LinkCrack using different neighboring connection loss weights

stains, and shadows, which have no change in elevation and thus no impact on the
establishment of the 3D pavement model [7].
(1) Point cloud measurement
The line-structured light method can be used to restore the 3D section of pavement.
One camera and one line laser are fixed in a shield to ensure no relative displacement
under dynamic conditions. The line laser is projected vertically to the pavement. The
camera and the laser form a certain angle to collect the laser line image. The built-in
algorithm extracts the laser line and outputs the position of the laser line in the image.
According to the internal calibration parameters of the camera, surface elevation can
be calculated with the measured position in the image, and the elevation measuring
result is demonstrated in Fig. 2.28.
Adjusting the imaging angle between the camera and the line laser results in
different measuring accuracies, as shown in Table 2.6. The measuring range of
300 mm exceeds the required 200 mm, and the measuring accuracy of 0.15 mm is
better than the required 1 mm. This accuracy fully satisfies the detection of pavement
cracks wider than 1 mm.
The 3D line scanner is installed at the rear of the vehicle. Two sensors and one
controller are installed at a height of 2150 mm above ground. The view of the two
scanners can fully cover the pavement with a 4000 mm width, as shown in Fig. 2.29.
Geometrically, pavement cracks can be viewed as a large downward deviation
from the fitting curve (reference contour), and the pavement deflection is an upward
or downward deviation from the original pavement (reference contour), as seen from
the section of the pavement in Fig. 2.30. These two pavement distresses can be
differentiated using the 3D pavement section acquired by the 3D line scanner.
Line scanner 3D pavement measurement is a newly developed road surveying
technology. The high-precision 3D model of the pavement can support multipurpose
road surveying. Previous studies have demonstrated that 3D pavement data can be

Fig. 2.28 Principle and sensor of line scanning 3D measurement

Table 2.6 Line scan sensor parameters


No | Height/mm | Angle | X-distance/mm | Z-distance/mm | X-accuracy/mm | Z-accuracy/mm
1 | 2150 | 6° | 2018.13 | 300.02 | 0.99 | 0.15
2 | 2150 | 8° | 2018.13 | 224.37 | 0.99 | 0.11
3 | 2150 | 10° | 2018.13 | 178.83 | 0.99 | 0.09

Fig. 2.29 3D measurement scheme and typical data of line scan

used to detect surface distresses, such as rupture, rutting, structural depth, worn-
out, and bumps. 3D measuring technology significantly improves the efficiency of
pavement surveying and the ability to detect more types of pavement distress. It has
gradually become an irreplaceable method in many application scenarios, especially
those requiring large fields. However, 3D pavement profiling mainly reflects the
elevation change, and it cannot be used to effectively extract small cracks, shallow
cracks, etc. Currently, 3D cameras are not well developed; thus, there are few options
for use in surveying cars. In addition, the expense for the 3D camera needs to be

Fig. 2.30 Illustration of pavement profile, control profile, standard contour, support points

lowered to be widely applied in practice. Therefore, further development of 3D


cameras and more research on related data processing are needed.

(2) Point cloud preprocessing

Position information is important for restoring a continuous 3D pavement. Generally,
the positions of the sensors are measured at each imaging position, and a consistent
pavement surface is formed by connecting each frame of 3D data after position
calibration. The sensor is installed at a height of H above the ground, and the light of
the sensor is incident on the ground at an angle of α in the plane formed by the
cross-section direction and the elevation direction. During the surveying, the incident
angle is α' and the height of the sensor is H'. Then, the elevation of the pavement at
this given time can be calibrated as Eq. (2.52) with the pavement elevation zi at the
origin point.

z i' = z i + ki tan(α ' − α) + (H − H ' ) (2.52)

where ki is the distance between adjacent sampling points along the cross-section
direction, which is converted according to the calibrated relationship between the
image and object.
Abnormal signals can be triggered when the laser strikes water or oil stains on an
uneven pavement. The abnormal signal appears as a high-frequency pulse with a large
amplitude. Median filtering can be used to obtain the reference curve; then the
deviation from the reference curve is calculated, and points with large deviations are
identified. These abnormal points can be replaced by neighboring normal points,
removing their influence on crack detection.
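A minimal sketch of this pulse-noise suppression, assuming SciPy is available: a median filter provides the reference curve, points deviating from it by more than a threshold are treated as abnormal, and they are replaced by the reference values. The kernel size, threshold, and synthetic profile are assumptions.

import numpy as np
from scipy.signal import medfilt

def remove_pulse_noise(profile, kernel=21, threshold=5.0):
    # Replace high-amplitude pulse outliers in one cross-section elevation profile (mm).
    profile = np.asarray(profile, dtype=float)
    reference = medfilt(profile, kernel_size=kernel)     # robust local reference curve
    outliers = np.abs(profile - reference) > threshold
    cleaned = profile.copy()
    cleaned[outliers] = reference[outliers]
    return cleaned

# Synthetic cross-section: gentle slope plus texture and two spurious pulses.
x = np.linspace(0.0, 3750.0, 1000)
clean = 0.002 * x + 0.3 * np.sin(x / 15.0)
profile = clean.copy()
profile[200] += 40.0
profile[640] -= 35.0
print(np.abs(remove_pulse_noise(profile) - clean).max())   # residual error is small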

(3) Pavement crack analysis


Road cross-section scanning data contain pavement reference contour, texture,
distress, attitude, and noise. Signal analysis in the frequency domain is used to extract
the pavement section, which is used as a reference to detect suspected pavement
cracks. The detection process is illustrated in Fig. 2.31.
In the space-time domain, pavement texture, cracks, and reference contours are
aliased, leading to difficulty in extracting and locating pavement cracks. In the
frequency domain, they can be easily separated, as the texture and crack correspond
to the high-frequency part of the spectrum, while the reference contour corresponds
to the low-frequency part. Figure 2.32 shows the amplitude spectrum and power
spectrum of the signal after the fast Fourier transform (FFT).
The road reference contour corresponds to the low-frequency signal after the FFT. It
can be restored by applying a low-pass filter to the spectrum, keeping the band 0–fs1,
followed by the inverse fast Fourier transform (IFFT). The extracted reference contour
is shown in Fig. 2.33.
Figure 2.33 shows several examples of road reference contour restoration using
different cutoff frequencies. As seen in Fig. 2.33a, when the cutoff frequency fs1
= 0.003 fH, the restored reference contour is very smooth, and at some positions,
the actual cross-section largely deviates from the restored signal. In the second case,
when fs1 = 0.015 fH, the restored reference contour is more in line with the pavement
section, as shown in Fig. 2.33b, c. The difference between these two cases is that
the signal of pavement cracks remains in the former case, while it is attributed to the
road reference contour in the latter case.
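The sketch below illustrates this low-pass restoration with a discrete Fourier transform: the spectrum of one cross-section is truncated at fs1 and inverse-transformed. The cutoff ratio of 0.015·fH follows the example above; the synthetic profile is an assumption for demonstration only.

```python
import numpy as np

def extract_reference_contour(section, cutoff_ratio=0.015):
    """Recover the road reference contour by keeping only the low-frequency
    part (0 .. fs1) of the cross-section spectrum and applying the IFFT.

    section      : 1-D elevation profile of one cross-section
    cutoff_ratio : fs1 expressed as a fraction of the highest frequency fH
    """
    n = section.size
    spectrum = np.fft.rfft(section)
    freqs = np.fft.rfftfreq(n)                     # normalised frequencies, 0 .. 0.5
    f_high = freqs[-1]
    lowpass = spectrum * (freqs <= cutoff_ratio * f_high)
    return np.fft.irfft(lowpass, n)

if __name__ == "__main__":
    x = np.linspace(0.0, 4.0, 4096)
    contour = 0.02 * np.sin(0.5 * np.pi * x)       # slowly varying reference shape
    texture = 0.001 * np.random.randn(x.size)      # high-frequency texture
    crack = np.zeros_like(x); crack[2000:2006] = -0.004
    measured = contour + texture + crack
    restored = extract_reference_contour(measured)
    print("max restoration error: %.4f m" % np.max(np.abs(restored - contour)))
```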
Generally, pavement texture can be deduced from the known filling material information. After the road reference contour is separated and removed, the measured road cross-section contains texture and cracks. With the prior known design texture, most of the texture information can be removed from the section. However, the actual pavement texture differs from the design texture because it changes with rolling and abrasion during service. The pavement texture directly reflects the local fluctuation of the road cross-section, and the local elevation fluctuation of the road section is mainly caused by it. The elevation difference, |PP − CP|, between the preprocessed cross-section, PP, and the reference contour, CP, reflects the texture distribution of the cross-section.

Fig. 2.31 The process of 3D pavement crack detection

Fig. 2.32 Road cross-section before and after FFT
The mean value AvgTex and mean square error MSETex of the texture fluctuation
can be calculated using Eqs. (2.53) and (2.54).
The larger the texture and the greater its fluctuation, the larger the segmentation threshold required to separate the cracks. Therefore, it is necessary to use a dynamic, adaptive segmentation threshold determined from the texture distribution of each section. The threshold, T, is calculated as Eq. (2.55) and is then used to obtain the suspected cracks.

$$\text{Avg}_{\text{Tex}} = \frac{1}{N}\sum_{i=1}^{N} \left| PP_i - CP_i \right| \qquad (2.53)$$

$$\text{MSE}_{\text{Tex}} = \frac{1}{N}\sum_{i=1}^{N} \left( \left| PP_i - CP_i \right| - \text{Avg}_{\text{Tex}} \right)^2 \qquad (2.54)$$

Fig. 2.33 Restoration of the road reference contour

$$T = \text{Avg}_{\text{Tex}} + k\,\text{MSE}_{\text{Tex}} \qquad (2.55)$$

$$\text{FlagC}_i = \begin{cases} 1, & CP_i - PP_i > T \\ 0, & CP_i - PP_i \le T \end{cases} \qquad (2.56)$$

where k(2 ≤ k ≤ 3) is the threshold coefficient.


For each road cross-section, the suspected crack points can thus be obtained.
With the movement of the platform and the sensor, suspected crack pixels from all
sections can be combined to derive a weighted map, in which each pixel value is the
probability of being a crack pixel. Then, the 2D image analysis introduced in the
previous section can be used for crack detection, and the result is shown in Fig. 2.34.
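A minimal sketch of the per-section thresholding described by Eqs. (2.53)–(2.56) is given below. The coefficient k = 2.5 and the synthetic profile are assumptions used only to demonstrate the flagging logic.

```python
import numpy as np

def flag_suspected_cracks(pp, cp, k=2.5):
    """Flag suspected crack points in one cross-section following Eqs. (2.53)-(2.56).

    pp : preprocessed cross-section elevations (PP)
    cp : reference contour elevations (CP)
    k  : threshold coefficient, 2 <= k <= 3
    """
    diff = np.abs(pp - cp)
    avg_tex = diff.mean()                        # Eq. (2.53)
    mse_tex = np.mean((diff - avg_tex) ** 2)     # Eq. (2.54)
    threshold = avg_tex + k * mse_tex            # Eq. (2.55)
    return (cp - pp) > threshold                 # Eq. (2.56): True = suspected crack point

if __name__ == "__main__":
    n = 1000
    cp = np.zeros(n)
    pp = cp + 0.0004 * np.random.randn(n)        # texture fluctuation
    pp[500:504] -= 0.003                         # a narrow crack below the contour
    flags = flag_suspected_cracks(pp, cp)
    print("suspected crack points:", np.flatnonzero(flags))
```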
According to the above-mentioned method, the 3D scanning data are converted
to 2D images for crack detection, which is currently the mainstream method. With
the development of advanced techniques and methods for processing 3D data, pure
3D data processing has the potential to be a popular or even mainstream method in
the future.

Fig. 2.34 The process of pavement crack detection

(4) Deflection detection

Pavement distresses, such as potholes, ruts, lumps, and subsidence, are sparsely
distributed and deviate upward or downward from the road reference surface. There-
fore, road reference contours should be obtained to extract suspected distresses,
forming binary images for the following processes, as seen in Fig. 2.35.
According to the deviation direction from the road reference contour, pavement
distress can be categorized into two types, i.e., upward and downward. Lumps are of the upward type, while subsidence, ruts, and potholes are of the downward type. It is assumed that the two types do not appear in a single section. There exists an enveloping curve that is similar to the current road reference contour. The directional consistency between points and the reference contour can be used to determine the distress type, and the steps are listed as follows.
➀ Key points with high curvature are extracted using the corner detection method.
➁ The upper and lower envelopes of the road cross-section data are calculated.

Fig. 2.35 Workflow of pavement deflection detection



➂ The key points are labeled according to their distance to the upper and lower
enveloping lines. The key points close to the upper enveloping line are labeled
“1”, and the key points close to the lower enveloping line are labeled “0”.
➃ Evaluate the directional consistency between the upper enveloping line and the downward key points (those labeled "0"), and between the lower enveloping line and the upward key points (those labeled "1").
➄ Evaluate the confidence that the upper and lower enveloping lines are similar
to the reference contour. If the upper enveloping line has higher confidence,
the damage type of the current cross-section is determined to be downward. In
contrast, if the lower enveloping line has a higher confidence, the damage type
is upward.
According to the labels of the key points, the contour can be divided into segments.
After segmenting, calculate the direction Oi of segment i. Then, the sum, Si , of the
directional difference between each selected segment and other segments can be
calculated. The reference contour can be obtained by iteratively removing segments
with a larger overall direction difference until the overall difference reaches the
set threshold. Figure 2.36 shows the process of distress type determination and
cross-section reference contour extraction. Figure 2.36a shows the upper and lower
enveloping lines of the section. Figure 2.36b shows the labeling of the key points of
the cross-section. Figure 2.36c shows the determination of the section damage type.
Figure 2.36d is the reference contour restoration of the cross-section.
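The sketch below illustrates one possible reading of the iterative removal step: segment directions are taken from least-squares line fits, and the segment with the largest summed direction difference is dropped until the remaining differences fall below a threshold. Both the direction definition and the stop threshold are assumptions, not values given in this book.

```python
import numpy as np

def segment_direction(seg):
    """Direction (radians) of a contour segment from a least-squares line fit.
    Each segment is an (n_i, 2) array with at least two points."""
    slope = np.polyfit(seg[:, 0], seg[:, 1], 1)[0]
    return np.arctan(slope)

def select_reference_segments(segments, max_diff=0.05):
    """Iteratively drop the segment whose mean direction difference S_i to the
    remaining segments is largest, until every S_i is below max_diff (radians)."""
    keep = list(range(len(segments)))
    dirs = np.array([segment_direction(segments[i]) for i in keep])
    while len(keep) > 2:
        s = np.array([np.sum(np.abs(dirs - d)) for d in dirs]) / (len(keep) - 1)
        worst = int(np.argmax(s))
        if s[worst] <= max_diff:
            break
        keep.pop(worst)                 # remove the most deviating segment
        dirs = np.delete(dirs, worst)
    return keep                          # indices of segments kept for the reference contour
```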
The depth difference between the reference contour and the original data can
be obtained after the cross-section processing is finished, and then a 3D pavement
distress map can be generated with continuous depth difference data in 3D. According
to a certain depth threshold, pavement distress feature points can be obtained, as
shown in Fig. 2.37b. Based on the regional aggregation and depth-change continuity of the deflection, the localization method detects adjacent damaged areas of the same type, merging neighboring damaged areas and eliminating discrete small-area ones. After the location, area, depth distribution, and edge of each pavement distress are obtained, they are used to determine the distress type.
2. Rutting measurement

Rutting is the permanent indentation left by vehicle wheels on the pavement and is the main form of asphalt pavement deterioration affecting safe driving. It is the manifestation of plastic deformation of the asphalt layer and subgrade. Rutting is measured from the road cross-section by comparing it with standard models, determining the maximum rutting depth in the wheel track zone, and evaluating it according to relevant regulations (Fig. 2.38).
Pavement cross-section measurement is mainly carried out by using a laser scan-
ning beam or line-structured light. By placing several lasers on a rigid beam, multiple
distances to the pavement are measured, and the cross-section is interpolated by curve
fitting. The deepest rutting point may fall between the lasers, which results in underestimation of the rutting. The method using line-structured light is based on
the principle of laser triangulation. The measuring system consists of a line laser

Fig. 2.36 Schematic diagram of the standard contour extraction process

Fig. 2.37 Detection and classification of pavement distresses



Fig. 2.38 Formation of rutting on pavement

and an arrayed camera. The camera is installed at a certain angle with the laser light
to obtain the ground image containing the laser line. The position at the laser line
footprint can be calculated from the image to obtain the road cross-section. The
actual elevation of the cross-section is obtained by using the relationship between
the image coordinate and the object coordinate. Then, the rutting depth is calculated
and evaluated according to regulations.
1) Rutting models
Rutting is one of the main types of pavement distress, which can be caused by heavy
vehicle load, channelized traffic, and inappropriate asphalt mixture material at the
design and construction stages. Rutting affects traffic safety, road service quality, and
service life and causes an increase in maintenance costs. Its detection is an essential
part of road inspection. Rutting can be divided into seven types according to different
pavement deformation shapes, as shown in Fig. 2.39.
In the figure, Rl and Rr are the maximum values of rutting in the left and right
wheel tracks. The rutting depth is the maximum of them, as in Eq. (2.57).

Rd = max{Rl , Rr } (2.57)

2) Model matching
The rutting model is divided into seven standard models, and the seven models can
be simplified into two types: W-shaped and U-shaped models. The W-shaped model
can be further divided into two categories: convex and concave. The division between
concave W-shaped and U-shaped models is through a parameter, the elevation differ-
ence between the highest point and the lowest point. The model is determined to be
W-shaped when the elevation difference is above a certain constant; otherwise, it is
U-shaped.
The model matching algorithm is based on the transformed road cross-section
(excluding the marking and out-of-lane data). It has been concluded that rutting is
distributed in the range of 500–3000 mm of the pavement. Accordingly, ab in Fig. 2.40
is the effective rutting range, and the highest point c is determined within the range
of ab. Then, the lowest points d and e are determined in the left and right directions.

Fig. 2.39 Typical rutting models

With d and e as the reference, the two highest points, f and g, to the left and the right
can be selected, respectively. These two highest points are connected as the reference
line fg. For all points pi in the section that are larger than the minimum value of the
endpoint of the reference line, the right-hand rule of vector multiplication is used
to calculate the relationship between fg and fpi . Therefore, we have the following
equation with f (x0 , y0 ), g(x1 , y1 ) , and pi (xi , yi ).

$$\begin{cases} d_1 = (x_1 - x_0)(y_i - y_0) \\ d_2 = (x_i - x_0)(y_1 - y_0) \end{cases} \qquad (2.58)$$

Fig. 2.40 Concave W-shaped rutting model matching

If d1 > d2, then fg is in the clockwise direction of fpi. If d2 > d1, then fg is in the counterclockwise direction of fpi. Otherwise, the two vectors overlap. If fg overlaps fpi or is in the counterclockwise direction of it, the model is judged as W-shaped; otherwise, the model type is judged as concave W-shaped (as for point c in Fig. 2.40).
3) Rutting depth calculation
After matching the model, the three highest feature points f , c and g can be found, as
shown in Fig. 2.40. When the model is W-shaped, the equations of two straight lines (fc and gc) are established using these three highest points, and the vertical distances from the rutting points to the straight lines fc and gc are calculated.

$$\begin{cases} R_{ap} = \max\{h_{a1}, h_{a2}, \ldots, h_{an}\} \\ R_{bp} = \max\{h_{b1}, h_{b2}, \ldots, h_{bn}\} \end{cases} \qquad (2.59)$$

Figure 2.39 shows concave W-shaped rutting. According to the rutting model shown in Fig. 2.39, the rutting depth is calculated as the distance from the two local minima to the straight line fg, as shown in Fig. 2.41. According to Eq. (2.59), the maximum rutting is R = max{R_ap, R_bp}. When the model is U-shaped, only one reference line is needed, and the calculation method is the same as that of the W-shaped model. Special attention should be given to the fact that the maximum rutting depth is not necessarily at the lowest point of the 3D cross-section.
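The sketch below computes the rutting depth of a W-shaped section as the maximum point-to-line distance to the reference lines fc and gc, in the spirit of Eq. (2.59). The synthetic profile and the crude choice of the feature points f, c, and g are assumptions for illustration; a full implementation would extract them with the model-matching procedure above.

```python
import numpy as np

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through points a and b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / np.hypot(bx - ax, by - ay)

def w_shaped_rut_depth(points, f_idx, c_idx, g_idx):
    """Maximum rutting depth of a W-shaped section: largest distance of the
    profile points to the reference lines fc (left side) and cg (right side).

    points : (n, 2) array of cross-section coordinates, ordered left to right
    f_idx, c_idx, g_idx : indices of the three highest feature points f, c, g
    """
    f, c, g = points[f_idx], points[c_idx], points[g_idx]
    left = [point_line_distance(p, f, c) for p in points[f_idx + 1:c_idx]]
    right = [point_line_distance(p, c, g) for p in points[c_idx + 1:g_idx]]
    r_ap = max(left) if left else 0.0
    r_bp = max(right) if right else 0.0
    return max(r_ap, r_bp)                      # R = max{R_ap, R_bp}

if __name__ == "__main__":
    x = np.linspace(0.5, 3.0, 251)                       # effective rutting range (m)
    z = -0.010 * np.exp(-((x - 1.2) ** 2) / 0.02) \
        - 0.012 * np.exp(-((x - 2.4) ** 2) / 0.02)       # two simulated wheel-path ruts
    pts = np.column_stack((x, z))
    c_i = int(np.argmax(z[50:200]) + 50)                 # crude middle high point
    print("rutting depth: %.1f mm" % (1000 * w_shaped_rut_depth(pts, 0, c_i, 250)))
```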
3. Roughness measurement
Pavement roughness refers to the vertical deviation of the pavement surface from the
designed surface. It reflects the flatness of the surface curve of the pavement longitu-
dinal profile. It is an essential measure in pavement evaluation and final acceptance. It
directly affects driving comfort, safety, and pavement lifespan. Currently, pavement
roughness is usually evaluated by using the international roughness index (IRI). The
measurement and evaluation of pavement roughness are introduced as follows.

Fig. 2.41 Results of maximum rutting depth

Pavement roughness is related to vehicle vibration, operating speed, tire friction,


and wear, which directly or indirectly affect the operating costs of the vehicle. It
also affects driving comfort; therefore, it is an indicator involving people, vehicles,
and roads. Pavement roughness is also an indicator affecting road service perfor-
mance. In 1960, a road test study carried out by the American Association of State
Highway and Transportation officials showed that approximately 95% of the road
service performance comes from the roughness of the pavement. Long-term pave-
ment performance (LTPP) studies have shown that pavement roughness, especially
initial surface roughness, will seriously affect road service life. There are various
definitions of pavement roughness from different perspectives, and thus far, there is
no uniform definition that is accepted worldwide.
Multi-laser profiler (MLP) measurement based on an inertial reference is the principal method for rapid roughness measurement. It calculates the IRI every 250 mm. This
equipment is developed and designed for highway inspection, in which measuring is
carried out at a high and constant speed. To adapt to actual traffic conditions in which
non-uniform speed occurs due to traffic jams, error sources, such as roughness wave-
length, measurement speed, and variable speed on the acceleration integral, need to
be further studied to improve various measurement methods and data processing
algorithms.

1) International roughness index

The international roughness index (IRI) was proposed by the World Bank in 1982 [15]. It is defined, using the quarter-vehicle model, as the cumulative displacement of the suspension system of a vehicle driving at 80 km/h over a certain driving distance on the measured section. The IRI is a standardized measure of pavement roughness that can be measured directly by using the elevations of the pavement longitudinal profile or indirectly by using the suspension system.
The quarter-vehicle model, as illustrated in Fig. 2.42, refers to a dynamic model
consisting of an overspring mass, underspring mass, and spring-damping system.
The vehicle body and the spring mass make up the overspring mass (M b ), and

Fig. 2.42 Quarter-vehicle model

the wheels, brakes, and suspension make up the underspring mass (M w ). The over-
spring and underspring masses vibrate in the vertical direction while the vehicle is
moving. Their movements are denoted as xb and xw , respectively. The stiffness of the overspring mass is denoted as K s , and the stiffness of the underspring mass (wheels) is K t . The damping of the overspring mass is C, and the elevation of the pavement
surface is xt .
To measure the IRI by using the suspension system, four variables need to be
measured with two sensors for recording the position changes and two sensors for
recording the accelerations of the overspring and underspring masses. These four
variables are demonstrated in Eq. (2.60).
$$\begin{cases} z_1' = x_b \\ z_2' = x_b' \\ z_3' = x_w \\ z_4' = x_w' \end{cases} \qquad (2.60)$$

The average slope over the first 11 m is used as the initial slope:

$$\begin{cases} z_1' = z_3' = \dfrac{y_a - y_1}{11} \\[2mm] z_2' = z_4' = 0 \\[2mm] a = \dfrac{11}{dx} + 1 \end{cases} \qquad (2.61)$$
where ya is the ath elevation point of the profile, y1 is the first, and dx is the sampling
interval.

Therefore, for a sampling interval of dx = 0.25 m, the difference between the


45th and the first elevation points is used as the initial slope for the IRI calculation as
Eq. (2.61). The following four recursive equations are then solved for each elevation
point.

$$\begin{cases} z_1 = a_{11} z_1' + a_{12} z_2' + a_{13} z_3' + a_{14} z_4' + b_1 Y' \\ z_2 = a_{21} z_1' + a_{22} z_2' + a_{23} z_3' + a_{24} z_4' + b_2 Y' \\ z_3 = a_{31} z_1' + a_{32} z_2' + a_{33} z_3' + a_{34} z_4' + b_3 Y' \\ z_4 = a_{41} z_1' + a_{42} z_2' + a_{43} z_3' + a_{44} z_4' + b_4 Y' \end{cases} \qquad (2.62)$$

where

$$Y' = \frac{Y_i - Y_{i-1}}{dx}, \quad i = 2, 3, \ldots, n \qquad (2.63)$$

$$z_j' = z_j, \quad j = 1, 2, 3, 4 \qquad (2.64)$$

where z_j is the value calculated at the previous sampling point. The matrix S = [a_ij] and the vector P = [b_i] are the model parameters, which are related to the profile sampling spacing and the simulation speed. When the sampling spacing is 0.25 m and the simulation speed is 80 km/h, the values of S and P are taken as follows.
$$S = \begin{bmatrix} 0.9966071 & 0.01091514 & -0.002083274 & 0.0003190145 \\ -0.5563044 & 0.9438768 & -0.8324718 & 0.05064701 \\ 0.02153176 & 0.00212673 & 0.7508714 & 0.008221888 \\ 3.335013 & 0.3376467 & -39.12762 & 0.4347564 \end{bmatrix} \qquad (2.65)$$

$$P = \begin{bmatrix} 0.005476107 & 1.388776 & 0.2275968 & 35.79262 \end{bmatrix} \qquad (2.66)$$

For each position, the corrected slope of the calculated profile is calculated using
Eq. (2.67).

RSi = |z 3 − z 1 | (2.67)

The IRI statistic is the average of the corrected slopes over the measured length.
Therefore, after solving the above equation for all contour points, the IRI is calculated
as Eq. (2.68).

$$\text{IRI} = \frac{1}{n-1}\sum_{i=2}^{n} RS_i \qquad (2.68)$$

where IRI is the international roughness index. RSi is the correction slope, and n
is the number of elevation measurements. Therefore, the IRI can be calculated by

applying the measured elevation of the road longitudinal section to the IRI calculation
procedure.
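A minimal sketch of this recursive calculation is given below, using the parameter values of Eqs. (2.65) and (2.66) for dx = 0.25 m at 80 km/h. The synthetic profile and the conversion of the average rectified slope to m/km are assumptions for illustration; this is not the implementation of any particular commercial profiler.

```python
import numpy as np

# State matrix S and input vector P of Eqs. (2.65)-(2.66), dx = 0.25 m, 80 km/h
S = np.array([
    [ 0.9966071,  0.01091514, -0.002083274, 0.0003190145],
    [-0.5563044,  0.9438768,  -0.8324718,   0.05064701  ],
    [ 0.02153176, 0.00212673,  0.7508714,   0.008221888 ],
    [ 3.335013,   0.3376467, -39.12762,     0.4347564   ],
])
P = np.array([0.005476107, 1.388776, 0.2275968, 35.79262])

def iri_quarter_car(y, dx=0.25):
    """Average rectified slope of a longitudinal elevation profile y sampled
    every dx metres, following the recursion of Eqs. (2.60)-(2.68)."""
    a = int(round(11.0 / dx)) + 1                          # Eq. (2.61): 45 for dx = 0.25 m
    slope0 = (y[a - 1] - y[0]) / 11.0
    z_prev = np.array([slope0, 0.0, slope0, 0.0])          # initial state z'
    rs_sum = 0.0
    n = len(y)
    for i in range(1, n):
        y_slope = (y[i] - y[i - 1]) / dx                   # Eq. (2.63)
        z = S @ z_prev + P * y_slope                       # Eq. (2.62)
        rs_sum += abs(z[2] - z[0])                         # Eq. (2.67)
        z_prev = z                                         # Eq. (2.64)
    return rs_sum / (n - 1)                                # Eq. (2.68), dimensionless slope

if __name__ == "__main__":
    x = np.arange(0.0, 100.0, 0.25)                        # one 100 m evaluation section
    profile = 0.002 * np.sin(2 * np.pi * x / 10) + 0.0005 * np.random.randn(x.size)
    print("IRI = %.2f m/km" % (1000.0 * iri_quarter_car(profile)))
```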
2) Longitudinal section measurement

Elevation along the longitudinal section of the pavement is needed in the calculation
of IRI. An inertial laser profiler combined with a rangefinder and accelerometer can
quickly measure the longitudinal section. The laser rangefinder and accelerometer
are usually mounted integrally on a rigid beam or on the chassis of the measurement
vehicle. If a rigid beam is used, it can be mounted on the front or rear bumper of the
vehicle. If the chassis of the vehicle is used, the profiler is usually mounted directly
in front of the rear wheels. The laser rangefinder measures the distance from the
sensor to the ground, and the accelerometer measures the vertical dynamic response
of the vehicle, whose travel distance and speed are calculated by using the DMI
measurements. The IRI value can be calculated once the pavement profile and the
corresponding traveled distance are obtained. Figure 2.43 shows the ARAN system and its front profiler.
The laser rangefinder measures the vertical distance between the vehicle body and
the ground at any point on the longitudinal section of the test road, and this measure-
ment is a combination of the longitudinal road elevation and the vehicle body bumps.
The measurement also requires the use of an accelerometer to measure the amount
of vehicle body deflection over bumps. In this way, the measured laser distance value
and the acceleration value are effectively fused to obtain the international roughness index, as shown in Fig. 2.44.
The roughness index of the pavement is calculated cumulatively from the elevations Rt at measuring points along the longitudinal section. The IRI is usually
measured every 0.25 m as a sampling point and every 100 m as a sampling section.
Assuming that the vehicle speed is V (m/s) and the laser sampling frequency is F,
the longitudinal elevation at each sampling point is calculated using Eq. (2.69).

Fig. 2.43 Surface roughness measuring apparatus



Fig. 2.44 Results of pavement roughness calculation

$$R_n = \frac{1}{m}\sum_{t=0}^{m} R_t, \quad n = 0, 1, \ldots, 399 \qquad (2.69)$$

where m is the number of signals collected in this sampling segment, m = 0.25F/V .


The international roughness index, IRI, is obtained by performing a difference
accumulation analysis on the obtained roughness array (R0 , R1 , . . . , R399 ).

3) IRI calculation

The IRI can be calculated directly with the longitudinal section elevation of the pave-
ment. Measurements of the inertial laser section measuring system are a combination
of the longitudinal section elevation and the vehicle bump. The longitudinal section eleva-
tion can be obtained by removing the vertical offset of the vehicle bump from the
measured value of the rangefinder.
The up-and-down acceleration of the vehicle body is measured by an accelerometer, and the offset is obtained by double integration of the acceleration. Assuming that the velocity of the bump at moment t_1 is v(t_1), the acceleration is a(t_1), and the offset is s(t_1), the velocity at moment t' is

$$v(t') = v(t_1) + \int_{t_1}^{t'} a(t)\,dt \qquad (2.70)$$

The vertical offset of the vehicle bump at moment t_2 is calculated using Eq. (2.71).

$$s(t_2) = s(t_1) + \int_{t_1}^{t_2} v(t)\,dt \qquad (2.71)$$

The accelerometer collects data at a fixed acquisition frequency; thus, its measurements are discrete. Equations (2.70) and (2.71) apply to continuous signals; for the actual discrete data, the following equations should be applied.

a(n) = A(n) − g (2.72)

v(n + 1) = v(n) + a(n)∆t (2.73)

s(n + 1) = s(n) + v(n)∆t (2.74)

where n is the sequence number of the samples, A(n) is the nth sample value acquired by the accelerometer, g is the local gravitational acceleration, and ∆t is the time
interval between two adjacent samples.
Given the initial speed v(0), offset s(0), and local gravitational acceleration g,
the vehicle bump offset at any moment can be calculated from the accelerometer
measurements.
As the time used in the cumulative calculation is extended, a large cumulative error
can be generated. Meanwhile, roughness measurement is segmented and dynamic.
Therefore, the initial velocity v(0), offset s(0), and measurement interval gravita-
tional acceleration g are determined by interval segments during the calculation. As
the roughness is a relative index, the initial offset can be assumed to be zero.
Assume that the accumulated offset of vehicle body bumps in the roughness measurement interval segment is B, as in Eq. (2.75). B is related to the initial velocity v(0) and the gravitational acceleration g in the measurement interval. If there is a difference between the set initial values (v(0) or g) and their true values, B will be overestimated. Therefore, these two parameters are taken as unknown parameters
and solved using the optimization algorithm, in which the objective is to minimize
the cumulative offset B, as in Eq. (2.76).

$$B = \sum_{n=0}^{N-1} |s(n+1) - s(n)| \qquad (2.75)$$

$$\min(B) = \min \sum_{n=0}^{N-1} |s(n+1) - s(n)| \qquad (2.76)$$

The longitudinal section elevation of the road can be obtained by subtracting the
vertical offset of the vehicle from the measurement of the rangefinder, and then the
IRI can be calculated.
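The sketch below illustrates the discrete double integration of Eqs. (2.72)–(2.74) and the estimation of v(0) and g by minimizing the accumulated offset B of Eq. (2.76). The Nelder–Mead optimizer, the 200 Hz sampling rate, and the simulated bounce signal are assumptions for demonstration only.

```python
import numpy as np
from scipy.optimize import minimize

def bump_offset(accel, dt, v0, g):
    """Vertical body offset s(n) from raw accelerometer samples A(n),
    via the discrete double integration of Eqs. (2.72)-(2.74), with s(0) = 0."""
    a = accel - g                                    # Eq. (2.72)
    v = v0 + np.cumsum(a) * dt                       # Eq. (2.73)
    return np.concatenate(([0.0], np.cumsum(v) * dt))  # Eq. (2.74)

def estimate_offset(accel, dt, g0=9.81):
    """Estimate v(0) and g for one measurement segment by minimizing the
    accumulated offset B of Eq. (2.76), then return the bump offset."""
    def B(params):
        v0, g = params
        s = bump_offset(accel, dt, v0, g)
        return np.sum(np.abs(np.diff(s)))            # Eq. (2.75)
    res = minimize(B, x0=[0.0, g0], method="Nelder-Mead")
    v0, g = res.x
    return bump_offset(accel, dt, v0, g), v0, g

if __name__ == "__main__":
    dt = 1.0 / 200.0                                 # 200 Hz accelerometer (assumed)
    t = np.arange(0.0, 5.0, dt)
    true_bounce = 0.002 * np.sin(2 * np.pi * 1.5 * t)
    accel = np.gradient(np.gradient(true_bounce, dt), dt) + 9.80665  # raw readings incl. gravity
    s, v0, g = estimate_offset(accel, dt)
    print("estimated g = %.4f m/s^2, v(0) = %.5f m/s" % (g, v0))
```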

4. Appearance Surveying of Highway Tunnels

Tunnel distress refers to the appearance and structural and functional changes in the
tunnel lining, pavement, tunnel entrance, and ancillary structures due to physical
or chemical reasons. Generally, tunnel distress includes functional degradation of
ventilation, drainage, and lighting, lining cracks, falling blocks, and deformation
of tunnel surrounding rock and lining structures. Distresses such as cracks, water leakage and freezing, and structural deformation, as demonstrated in Fig. 2.45, threaten tunnel safety.
Lining cracks are one of the phenomena that directly reflect the stress state of tunnel
lining structures, which can be divided into longitudinal, oblique and circumferential
cracks. In severely cold areas, the seepage water undergoes repeated freeze–thaw
cycles, causing frost heave between the lining and surrounding rock, leading to
further deformation and damage to the arch wall.
Traditional detection mainly depends on human visual inspection and manual
measurement with devices. It requires professional skills of the practitioners and traffic control during surveying and measurement, as shown in Fig. 2.46. Human subjectivity and safety risks are involved in the process, which limits the efficiency and accuracy of the measurement.

Fig. 2.45 Common defects in tunnels

1) Measuring method

The inner contour (section) of the highway tunnel is generally in the form of a
multicenter circle. The surveying equipment is set up on a vehicle driving in the
tunnel. Measuring is completed in two passes, with the vehicle driving in the two opposite directions; in each pass, the equipment measures the right or the left half of the tunnel. In this setting, the sensor must be fully rotated. Let us consider a vehicle driving in the second lane, with the tunnel height being H. The
sensor is installed at a distance of H 0 from the ceiling of the tunnel. The maximum
distance from the sensor to the wall lining is denoted as L, the camera focal length
as f , the camera field of view at this focal length as α, the angle between the laser
radar and the center point of the camera object as θ, and the distance between the
camera and the LiDAR as b, as shown in Fig. 2.47.
Figure 2.47 shows a schematic diagram of the field of view in the two cases of
camera No. 1 (with the smallest fixed angle of the camera) and camera No. n (the
largest fixed angle of the camera). For each camera, h is measured by LiDAR, and
b is precalibrated. Then, the object distance d corresponding to the camera can be
calculated using Eq. (2.77).

$$d = \sqrt{b^2 + h^2 - 2bh\cos\theta} \qquad (2.77)$$

Using imaging triangle AOB, the actual field of view AB covered by the camera
on the object side can be obtained as Eq. (2.78).

$$AB = \frac{2d}{\tan\alpha} \qquad (2.78)$$
Assuming that the camera resolution is r × r, the physical accuracy of the object
space R, can be calculated using Eq. (2.79).

Fig. 2.46 Artificial tunnel detection



Fig. 2.47 Schematic diagram of the imaging geometry of the multi-sensor surveying equipment
for tunnel surveying

$$R = \frac{AB}{r} \qquad (2.79)$$
For example, for a camera with 2048 × 2048 resolution, when the resolution at the
object side is 0.25 mm, the actual field of view AB is 0.5 m × 0.5 m.
In engineering applications, the object resolution R is determined by the tunnel
surveying specification [16]. Combined with the size of the tunnel section, the number
of cameras needed, the appropriate installation angle of the camera, and the sampling
interval of the camera in the driving direction can be calculated. For example, for a
tunnel with an arc length of 8 m and each camera’s field of view covering an area of
0.5 m × 0.5 m, at least 16 cameras are needed. In fact, more cameras are needed to
measure because the measurement needs to maintain a certain overlap of adjacent
camera images. When the camera measures at a sampling interval of 0.5 m, it is
necessary to generate a trigger signal at an interval of 0.5 m to drive the camera to
collect data. Then, the image obtained in each measurement is in the area of 0.5 m
× 0.5 m, which is the data of different positions on the same section. However,
at a measurement speed of 60 km/h, there will be relative displacement between
the sensors, and the measurement data cannot be guaranteed to be the same as the
above results. The effective data of each measurement are usually smaller than the
theoretical area (0.5 m × 0.5 m), resulting in a phenomenon of missing measurements.
Therefore, it is necessary to install all sensors on a rigid sensor platform to reduce
the relative displacement between sensors and to ensure a certain overlap between
adjacent measurement sections by adjusting the trigger distance.
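The sketch below illustrates the planning calculation outlined above: the object distance from Eq. (2.77), the object-side field from the sensor resolution and object resolution (Eq. (2.79)), and the number of cameras needed once an overlap between adjacent views is required. The geometry values (h = 5 m, b = 0.3 m, θ = 10°) and the 20% overlap ratio are assumptions for illustration.

```python
import math

def object_distance(h, b, theta_deg):
    """Object distance d from the LiDAR range h, the camera-LiDAR baseline b,
    and the angle theta between them (law of cosines, Eq. (2.77))."""
    theta = math.radians(theta_deg)
    return math.sqrt(b * b + h * h - 2.0 * b * h * math.cos(theta))

def cameras_needed(arc_length, field_of_view, overlap=0.2):
    """Number of cameras required to cover the half-section arc while keeping a
    given overlap ratio between adjacent views (overlap ratio is an assumption)."""
    return math.ceil(arc_length / (field_of_view * (1.0 - overlap)))

if __name__ == "__main__":
    d = object_distance(h=5.0, b=0.3, theta_deg=10.0)      # illustrative geometry
    ab = 2048 * 0.25e-3        # object-side field AB for 2048 px at 0.25 mm/px, Eq. (2.79)
    print("object distance d = %.3f m" % d)
    print("field of view AB = %.2f m, cameras needed: %d"
          % (ab, cameras_needed(8.0, ab)))
```

With an 8 m arc and 0.5 m coverage per camera, requiring 20% overlap raises the count from the theoretical 16 cameras to about 20, consistent with the remark above that more than 16 cameras are needed in practice.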
Sequential images are obtained for each measurement, which can be concatenated
to form a complete tunnel section along the driving direction. However, the imaging
area of each image is inconsistent, and there is a certain overlap between adjacent
images. The distance from the center of the LiDAR to the center point of the object view, (xi, yi), can be measured by LiDAR. With this measured distance and the internal parameters of the camera, the view area can be calculated. Since both the LiDAR and the camera are installed on a rigid platform, their relative relationship is calibrated during installation, as shown in Fig. 2.48.

Fig. 2.48 Relationship between image coordinates and tunnel coordinates
Figure 2.48 shows an example of a tunnel with an arc length of 8 m. The field of
view (FOV) covers an area of 0.5 m × 0.5 m, and at least 16 cameras are required
to capture the whole tunnel. The camera is fixed at an angle of βi to the horizontal
plane. The relationship between the imaging optical axis and the focus point (xi , yi ),
of the tunnel curve can be described by Eq. (2.80).

xi = di cos βi
(2.80)
yi = di sin βi + H

where di is the distance between the camera and the tunnel wall, and H is the height of
the central axis focus of the camera. The sectional coordinates of 16 camera centers
are determined, and the relationship between the optical axis of each camera and the
focal coordinates of the tunnel surface on the object side, as well as the fixed angle
and working distance, are determined. If the tunnel section is regarded as a plane
unfolded by a curved surface, the abscissa of the plane is the surface direction β of
the tunnel section, and the ordinate of the plane is the driving direction Z. The center
point of these 16 images is (βi , z 0 ), where βi depends on the angle between camera i
and the horizontal, and z 0 depends on the current coordinates of the driving vehicle.
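The mapping of Eq. (2.80) from the fixed camera angles to focus points on the tunnel surface can be written as a short routine, as sketched below. The 16 fixed angles, the working distances, and the installation height are assumed values used only to illustrate the computation.

```python
import numpy as np

def camera_focus_points(distances, angles_deg, H):
    """Section coordinates (x_i, y_i) of the optical-axis focus point of each
    camera on the tunnel surface, following Eq. (2.80).

    distances  : d_i, camera-to-lining working distance of each camera (m)
    angles_deg : beta_i, fixed angle of each camera to the horizontal plane (deg)
    H          : installation height of the camera axis (m)
    """
    d = np.asarray(distances, dtype=float)
    beta = np.radians(np.asarray(angles_deg, dtype=float))
    return np.column_stack((d * np.cos(beta), d * np.sin(beta) + H))

if __name__ == "__main__":
    betas = np.linspace(10.0, 170.0, 16)   # assumed fixed angles for 16 cameras
    dists = np.full(16, 5.0)               # assumed 5 m working distance
    print(camera_focus_points(dists, betas, H=1.8))
```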

To ensure the consistency of imaging during dynamic conditions, all sensors


are installed on a rigid platform, integrating multiple CCD cameras, multiple LED
auxiliary lights, LiDAR, multiple infrared cameras, and inertial units. It is considered
that all sensors have no relative movement to the platform. CCD cameras are used to
collect tunnel section image data for crack detection. An infrared camera is used to
acquire tunnel lining temperature data for identifying water seepage and freezing damage.
LiDAR is used to acquire a 3D point cloud of the tunnel, with which 3D modeling
can be realized. Meanwhile, it also provides object distance data. The inertial unit is
used to collect data for positioning in the tunnel and the correction of the attitude.
With the integration of multiple sensors on the moving platform, fast surveying data
containing tunnel lining cracks, water seepage, and freezing damage can be acquired and
then processed with related processing software to identify each type of distress.
2) Processing method
The tunnel surveying equipment collects an image sequence, each image of which
is independently shot by multiple cameras. To ensure the continuity and integrity of
the coverage, there are overlapping areas between adjacent images. Misalignment, as
observed from the dislocation of cracks, integrated structure, and markers in Fig. 2.49,
can occur between adjacent images due to the installation error of multiple cameras
and the possible relative displacement between cameras.
For misplaced sequence images with relative displacement, the objects in the
images may appear abnormal, such as repetition or incompleteness. Especially for
the detection of tunnel defects, the assessment conclusion is affected by the size
and number of crack defects. The independent evaluation of a long crack and the
evaluation of multiple short cracks after segmentation will produce different evalua-
tion conclusions. Therefore, when there is relative displacement between the lining
sequence images, image stitching processing is required to eliminate the adverse
effects on the tunnel safety assessment. In addition, during the measurement and
driving process of the equipment, it cannot be guaranteed that the trajectory of the
centerline of the lane is an absolute straight line. The distance from each camera to the
lining changes dynamically, resulting in a dynamic change in the object resolution
of the lining image data. Therefore, it is necessary to calculate the quality parameters
of the image according to the camera parameters and the effective imaging distance
of the camera for cropping and retaining the data with the best quality as the final
data.
(1) Geometric positioning of sequence images
During dynamic surveying, multiple cameras are used to acquire sequential images.
Usually, sequential images are shot at different locations, angles, and distances with
different lenses. As shown in Fig. 2.50, when the vehicle is located at a certain position
(x0 = 0), multiple cameras independently collect the images of the curved surface
where the lining is located. Different areas of the curved surface are of different
distances to the camera. For instance, the side area is nearer, while the ceiling area is
farther away when the camera is set to the same parameter. Therefore, with the same
cameras, a series of sequential images with different coverages are obtained.

Fig. 2.49 Misalignment of adjacent images showing dislocation of crack and objects

The imaging fields of two adjacent cameras, A and B, can be calculated using Eqs. (2.77)–(2.79), as annotated by the red and blue boxes, respectively, in Fig. 2.50. The red and blue cross marks represent the imaging centers of camera A and camera B, denoted as (xA, yA) and (xB, yB), respectively. These coordinates can be analytically calculated from the spatial coordinate of the starting position x0 as long as there is no large change in the tunnel section and no relative displacement between the sensors. This criterion can be easily met because the 16 cameras are all fixed on the same rigid beam.
As the platform moves forward, the arch of the tunnel is continuously imaged by
LiDAR, and the 3D point cloud of the tunnel arch can be obtained, as denoted by
the red dashed line in Fig. 2.51. The tunnel arch can be expanded as the Y-axis, with
the moving direction of the platform as the X-axis to form a reference coordinate
system, whose origin is the intersecting point, (x0 = 0, y0 = 0) of the cross-section
of the starting point and the floor line of the tunnel arch. For camera A, its imaging
center is located at (xA, yA), where xA is the distance between the camera center and
the origin point, and y A is the distance between the focus of the camera on the tunnel

Fig. 2.50 Principle of sequential image acquisition with a multi-camera system

Fig. 2.51 Alignment principle for multiple acquired images

arch surface and the origin point. Similarly, the center point of the 16 cameras in the
reference coordinate system can be calculated.
As image distortion can occur at the image edge, the image quality in the center
is usually higher than that at the edge. Therefore, it is necessary to avoid using the image edges. The ratio of the usable central image area can be improved by making the multiple cameras collinear and coplanar. In this case, the center points of the cameras are calibrated on a straight line, and Eq. (2.81) can then be derived. The acquired image sequence is demonstrated in Fig. 2.52.

Fig. 2.52 Geometric localization results of sequence images

x1 = x2 = x3 = · · · = x16 (2.81)

Geometric positioning provides fast and highly reliable sequence image stitching.
However, the calibration error of sensors can lead to relative displacement error. This
error accumulates during long-distance dynamic surveying, leading to misalignment
of long cracks crossing multiple images. Therefore, it is necessary to perform image
registration by using feature matching within the overlapping regions.

(2) Feature matching in overlapping regions

The accuracy of image stitching based on geometric positioning depends on factors


including multi-sensor calibration, the rigidity of the sensor platform, and the accu-
racy of the combined navigation based on MEMS inertial components. Usually,
there is a registration error of 1–10 mm, leading to a stitching line between images.
As shown in Fig. 2.53a, image dislocation occurs at the seam line, and different
grayscales occur in the overlapping area.
Figure 2.53a demonstrates an example of image dislocation, in which a long crack
running through multiple images is divided into short crack segments. Figure 2.53b
shows calibrated images where the crack manifests as a whole.

Fig. 2.53 Image dislocation caused by geometric parameter calculation error

The grayscale in the overlapping area is visually consistent without any seam line, which can be caused by image overlapping. Cracks are evaluated by length, width, and number according
to the technical specification for tunnel assessment. Therefore, the cases in (a) and
(b) will lead to different crack evaluation results. The former, which does not confirm
the real situation, should be calibrated using the correct image geometric positions.
The main manifestation of the image dislocation is the interruption of typical
targets, such as cracks within the overlapping area, resulting in different coordinates
of the same target in the two imaging coordinates, which correspond to p(x, y) and
p ' (x ' , y ' ) in image A and image B, respectively. This deviation originates from errors
in the geometric positioning, one of which is the field of view error. As shown in
Fig. 2.54a, cameras A and B face the tunnel lining at working distances of HA and HB, respectively. The field angles of A and B are 2θA and 2θB, and their field
centers are denoted as O A and O B , respectively.
The field of view range of the cameras satisfies the trigonometric function
relationship. Equation (2.82) describes the relationship between cameras A and B.

$$\tan\theta_A = \frac{|L_A - R_A|}{2H_A}, \qquad \tan\theta_B = \frac{|L_B - R_B|}{2H_B} \qquad (2.82)$$

Fig. 2.54 Image coregistration between adjacent images for tunnel lining

where (L A , R A ) and (L B , R B ) are the field of view ranges of cameras A and B,


respectively. If the working distance H of the camera is 5 m and its measurement error is δH = ±10 mm, the field of view range is 0.5 m and the error of the field of view angle θ is negligible; the resulting field of view range error is then δ|L − R| = ±1 mm.
Another error source is the positioning error of the view center. As shown in
Fig. 2.54a, there is an error δ O B between the actual center O B' and the theoretical
center O B of camera B when taking camera A as the reference.

δ O B = |O B − O B' | (2.83)

This center error causes a translational dislocation of the image shown by the
red dashed line in Fig. 2.54. This error needs to be dealt with in the solution of the
overlapping boundary and the target coordinates. The overlapping boundary can be
obtained by using the position of LB expressed in image A, LB|A, and the position of RA expressed in image B, RA|B, which satisfy the relationship described in Eq. (2.84).
$$\begin{cases} L_{B|A} = O_A + |O_A - O_B| - \left|\dfrac{R_B - L_B}{2}\right| \\[3mm] R_{A|B} = O_B + |O_B - O_A| - \left|\dfrac{R_A - L_A}{2}\right| \end{cases} \qquad (2.84)$$

To prevent loss of information, the maximum value of δ O B is used as a buffer


near the overlapping boundary, and the calculated overlapping area is presented in
Fig. 2.54b.

The positioning coordinates of the same target in image A, p(x, y), and in image
B, p ' (x ' , y ' ), should be equivalent as described in Eq. (2.85).

p(x, y) = p ' (x ' , y ' ) (2.85)

In other words, p1 and p1' , p2 and p2' , . . . , pn and pn' are the same point projected
in different images. Based on their equivalence, if p1 , p2 , . . . , pn are located in
image A and the corresponding p1' , p2' , . . . , pn' are located in image B, the transfor-
mation between the two images can be modeled based on these point pairs. This
image registration process is categorized as feature matching methods. Speeded up
robust features (SURF) is a typical algorithm in this category. SURF features are
scale-invariant, rotation-invariant and robust features. Image registration based on
the SURF feature is usually realized in the following four steps.

➀ Feature point extraction based on the Hessian matrix

A Hessian matrix can be obtained from each pixel as Eq. (2.86).


$$H[f(x, y)] = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\[3mm] \dfrac{\partial^2 f}{\partial x \partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} \qquad (2.86)$$

where the function value f (x, y) is the gray value of the pixel at (x, y) in the image.
The discriminant of the Hessian matrix can be calculated as Eq. (2.87).
$$\Delta H = \frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x \partial y}\right)^2 \qquad (2.87)$$

It can be seen from Fig. 2.55a that the value of the Hessian matrix discriminant
is larger near the crack in the lining image. A key point is selected at a local maximum of the Hessian discriminant, which manifests as a tone brighter or darker than the surrounding pixels, as highlighted by the red arrow in Fig. 2.55.

➁ Feature descriptors generated from a scale space

Multilayer filters are applied to derive feature maps at multiple scales. For each layer, a 4 × 4 rectangular block around the feature point is extracted. Based on this image block, the Haar wavelet responses are calculated to obtain the dominant orientation. For each image block of M × N pixels, the SURF descriptor is computed as

$$\left( \sum_{x=0}^{N-1}\sum_{y=0}^{M-1} d_x,\;\; \sum_{x=0}^{N-1}\sum_{y=0}^{M-1} d_y,\;\; \sum_{x=0}^{N-1}\sum_{y=0}^{M-1} |d_x|,\;\; \sum_{x=0}^{N-1}\sum_{y=0}^{M-1} |d_y| \right)$$

where d_x and d_y are the horizontal and vertical Haar wavelet responses; the results are reshaped into a vector of 4 × 4 × 4 = 64 dimensions.

Fig. 2.55 The process of image co-registration between adjacent images for the tunnel lining

➂ Feature point matching through random sampling

With two overlapping images, two sets of feature points can be extracted. The
Euclidean distance of the feature descriptor is used to evaluate the matching degree of
these two images. If the number of feature points is m, there are m×m possible feature
point pairs. The random sample consensus (RANSAC) is used to select the optimal
matching pairs by iteratively selecting a random subset of n pairing relationships
from it. The probability that a single sample belongs to this subset is w = n/m 2 , and
the probability that all k samples belong to this subset is calculated using Eq. (2.88).

p = 1 − (1 − wn )k (2.88)

where p is the probability that the sample randomly selected from the data set in
the iterative process is the subset. The model is used to test all other data, and its
rationality is evaluated using the standard deviation.

$$SD(k) = \frac{\sqrt{1 - w^n}}{w^n} \qquad (2.89)$$
This process is iterated within a certain time. The model is updated when the
standard deviation is smaller than that of the last calculation. After completing a
fixed number of iterations, output the current model as the matching result.

➃ Registration between images based on the transformation matrix

The transformation between two adjacent images can be described using a model
with eight parameters as Eq. (2.90).
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} m_0 & m_1 & m_2 \\ m_3 & m_4 & m_5 \\ m_6 & m_7 & m_8 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad (2.90)$$

where p(x, y) and p ' (x ' , y ' ) are coordinates of the same point in the two images.
After matching, the feature points theoretically satisfy the equation. With multiple
matched feature points, the parameter m 0 ∼ m 8 in the transformation matrix can be
solved.
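The sketch below outlines this four-step registration flow with OpenCV. It is an illustrative substitute rather than the exact algorithm described above: ORB features are used instead of SURF (SURF is patent-encumbered and only available in the opencv-contrib package), the RANSAC reprojection threshold is an assumed value, and the image file names are hypothetical.

```python
import cv2
import numpy as np

def register_adjacent_images(img_a, img_b, max_features=2000):
    """Estimate the 3x3 transformation of Eq. (2.90) between two overlapping
    lining images: keypoint detection, descriptor matching, and RANSAC fitting."""
    orb = cv2.ORB_create(nfeatures=max_features)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)

    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC rejects wrong pairs and returns the transformation matrix (m0 ... m8)
    M, inliers = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
    return M, (0 if inliers is None else int(inliers.sum()))

if __name__ == "__main__":
    a = cv2.imread("lining_A.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file names
    b = cv2.imread("lining_B.png", cv2.IMREAD_GRAYSCALE)
    M, n_inliers = register_adjacent_images(a, b)
    print("transformation matrix:\n", M, "\ninliers:", n_inliers)
```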
Using the SURF feature, the generated feature points appear near cracks and
obvious ancillary objects, but fewer feature points can be selected near the smooth
lining surface. It shows that the local maxima obtained by the discriminant of the
Hessian matrix are sensitive to the local extrema, corners, and edges, which frequently
appear near cracks and ancillary objects. The Haar wavelet is more applicable in these
areas to extract directional features, leading to the dominant principal direction and
reliable SURF feature descriptors.
Based on the above feature points, random sampling is used to obtain a consistent
solution and calculate the matching model. As shown in Fig. 2.57, after feature
matching (connected by straight lines in the figure), the feature points (red circles) in
Fig. 2.57a and the feature points in Fig. 2.57b (green crosses) correspond to the same
target, indicating correct image registration between these two images (Fig. 2.56).
The tunnel positioning error mainly originates from the field of view range error
δ|L − R| and the imaging error δ O of the view center. The resulting structural
dislocation mainly manifests as image translation, accompanied by a small scaling
ratio. The matching lines plotted in Fig. 2.57 demonstrate the translation between
the two images, which is as expected. However, the line connecting the view center
and the central axis of the view is not in the same plane, indicating a dislocation of
the images in the driving direction.

Fig. 2.56 Characteristic points and descriptions of fracture regions and appendage regions

Fig. 2.57 Feature point matching of cracked areas and ancillary areas

Fig. 2.58 Misalignment of the image in the vehicle running direction

(3) Extraction of valid data


The main reason for the dislocation in the driving direction is that the pivot line of the
camera is not collinear with the central axis due to the error in camera installation.
Figure 2.58 shows the misalignment of the cameras. The pivot line of the cameras,
denoted as red dashed lines, should have been aligned with the central axis of the
tunnel. However, there is usually a deviation of several to tens of millimeters between
these two lines due to inaccurate installation. For each camera, the errors are denoted
as δ O1 , δ O2 , · · ·, δ On .
Taking image i as an example, the width of the field of view along the driving
direction is E i , there is a positioning error δ Oi between the central axis of the image
and the central tunnel line, and the actual distance from the edges on both sides to
the central tunnel line is L i , Ri . These variables satisfy Eq. (2.91)
$$\begin{cases} L_i = \left| \delta O_i + \dfrac{E_i}{2} \right| \\[3mm] R_i = \left| \delta O_i - \dfrac{E_i}{2} \right| \end{cases} \qquad (2.91)$$
where the width E i of the field of view is solved by the measurement principle,
and the positioning error δ Oi is solved by image registration. Image sides along the
driving direction are not collinear; thus, they are cropped. The cropping range is the
minimum value of the distance from the image edge to the central tunnel line, as
expressed in Eq. (2.92).

$$\begin{cases} L_{cut} = \min(L_1, L_2, \ldots, L_n) \\ R_{cut} = \min(R_1, R_2, \ldots, R_n) \end{cases} \qquad (2.92)$$

Figure 2.59 shows the valid section images obtained based on the registered
sequence images on the surveying section.
Cropping is an effective means to address overlapping areas between images. Due
to different camera parameters and shooting environments, the imaging quality of
different images in the overlapping area is different. In principle, images of higher
quality should be the first choice as the data source for overlapping regions. Usually,

Fig. 2.59 Image stitching of the image sequence for the tunnel section

image quality Q i is used to quantify the sharpness of the image in the overlapping
area, which depends on two factors, namely, the theoretical value of the camera image
quality and the sharpness of the actual image.
In practice, images are affected by ambient light, illumination light, material
reflective properties, air impurities, etc., leading to unexpected image clarity, which
needs to be corrected. Image clarity is an essential index for evaluating image quality,
especially when there is no reference image. The acquired sections are ordered using
the time labels and combined with an interval of vt to form the complete tunnel
image data, as shown in Fig. 2.60. It should be noted that there may also be cracks
spanning in the same direction as the vehicle’s driving direction.

Fig. 2.60 Local data for tunnel measurements

After the tunnel sequence images are registered and stitched, a complete tunnel
image is formed, which is called a tunnel panorama image. The image can be gener-
ated as a physical image as needed, or it can be the result of logical stitching of
multiple images. Panoramic images of tunnels are widely used in the detection
of surface distresses of tunnel linings. Deficiency detection and identification is a
necessary step during project evaluation and acceptance, which can be realized by
interactive or automatic algorithms.

2.3 Railway Transportation Infrastructure Surveying

Railway transportation infrastructure mainly includes normal railways, high-speed


railways, urban subways, and maglev railways, and their auxiliary facilities. Track
safety detection has become the top priority during railway operation and mainte-
nance. Traditional manual measurement methods can no longer meet the demands of current complex measurement tasks such as line measurement, fastener detection,
and rail damage detection. It is necessary to develop new techniques and equipment
for the efficient measurement of railway tracks [17].

2.3.1 High-Speed Rail Track Surveying

The high-speed rail track structure mainly includes roadbeds, track slabs, fastener
systems, steel rails, and track auxiliary facilities. The content of high-speed railway
tracks mainly involves track irregularity detection, track fastener detection, and rail
surface fatigue detection.

1. Unevenness measurement

Evenness is an overall evaluator considering track geometry, size, and spatial posi-
tion. It is evaluated by the deviation of the actual coordinates of key points along the
track from their designed values. This deviation can be caused by errors in construc-
tion setting out and other external factors, resulting in defects in flatness, straightness,
smoothness, etc., also termed track irregularity. Many geometric parameters affect
the evenness of the track, such as gauge, level, superelevation, height, direction, and
triangular pits. Track evenness is very important to the safe operation of high-speed
railways, and smooth measurement technology and equipment need to be developed,
mainly including high-speed dynamic, absolute, and relative methods.
The high-speed dynamic surveying train integrates several technologies, including
positioning and attitude determination, visual measurement, and laser measurement.
Geometric parameters related to track evenness are measured and calculated by using
multiple sensors installed on the platform, axial direction, bogie, and wheels. The
platform is moving at a high speed on the track during measurement. Figure 2.61a

Fig. 2.61 Comprehensive surveying train for railways

shows the Japanese E926 surveying platform, which consists of six surveying cabins,
and its highest speed reaches 275 km/h. This surveying platform can not only obtain
geometric parameters of the track but also examine the communication signals,
train acceleration, wheel-rail force, and environmental noise. The German RAILAB,
as demonstrated in Fig. 2.61b, uses a non-contact laser, integrating a three-axis
gyro-stabilized platform with multiple sensors, including photoelectric sensors and
displacement meters. It inspects the track geometry, including the track height, level,
and direction. The maximum detection speed of the surveying train can reach 300 km/h.
Apart from high-speed comprehensive surveying trains, portable track inspectors
are the most commonly used technology for track alignment measurement, which
can be divided into absolute and relative methods. Figure 2.62 demonstrates two
portable track inspectors: Amberg GRP1000 and Trimble GEDO CE. The Amberg
GRP1000 is equipped with a core component, which encompasses a high-precision
automatic tracking total station, a track gauge sensor, a tilt sensor, and a DMI. High-
precision track inspectors are important equipment for lining surveying of tracks
and have been widely used during the construction, operation, and maintenance of
railways.

1) Measuring method

Track unevenness measurement was based mainly on string measurement but is


now primarily based on inertial measurement. Currently, multi-sensor equipment is
replacing single-sensor equipment. Measuring equipment can also be divided into
contact and non-contact equipment, the former of which directly touches the rail
to obtain measurements, while the latter scans or takes a photo of the rail. The
core technology is inertial integrated navigation technology to accurately obtain the
position and location of the measuring points. With the sampling measurement, the
rail can be reconstructed with high-precision spatial positions. Based on this 3D
model, the unevenness can be evaluated. Figure 2.63 demonstrates an example of a
3D point cloud of the rail track obtained using non-contact measurement.

Fig. 2.62 Portable track inspector

Fig. 2.63 High-precision 3D point cloud of the track

The high-precision point cloud contains absolute position information, which can
reflect the real situation of the scene. Based on the point cloud, the track geometry
parameters can be calculated by extracting or fitting key points.

2) Unevenness calculation

Fig. 2.64 Measurement of track irregularity

Along-track (alignment) unevenness refers to the lateral unevenness of the inner side of the rail. This unevenness results in lateral vibration and swaying of the train, accelerating rail abrasion and degrading the fasteners. Lateral unevenness is divided into left-rail unevenness and right-rail unevenness, which are usually inconsistent. It can be represented by the versine of different chord lengths and the spatial curve of different
wavelengths. The average value of the left and right rail unevenness indices is used
as the directional deviation from the central line of the track. The track unevenness
calculation is based on the horizontal coordinates at the sampling points of the track.
The unevenness at point B in Fig. 2.64 can be calculated using Eq. (2.93).

$$a = \frac{2S_{ABC}}{L_{AC}} = \frac{2\sqrt{M(M - L_{AB})(M - L_{BC})(M - L_{AC})}}{L_{AC}} \qquad (2.93)$$

where S ABC is the area of triangle ABC, and M = (L AB + L BC + L AC )/2.


Vertical unevenness refers to the fluctuation of the rail surface with regard to the
central line of the rail. This unevenness leads to excessive vertical force between the
wheel and rail, causing the train to fluctuate up and down. The vertical unevenness,
y, at the track mileage, x, can be calculated by the following equation.
$$y(x) = h(x) - \frac{1}{2}\left[ h\!\left(x - \frac{L}{2}\right) + h\!\left(x + \frac{L}{2}\right) \right] \qquad (2.94)$$

where h(x) denotes the height at the position and L is the chord length. A schematic
diagram of the vertical unevenness measurement is illustrated in Fig. 2.65.
The track level, also known as superelevation, refers to the elevation difference
between the left and right rail tops on the same section of the track relative to the
horizontal plane, as illustrated in Fig. 2.66. In some cases, when the train is on the
bend, the elevations of the left and right rail tops are designed to be different. In this
situation, superelevation is not a hazard factor.

Fig. 2.65 Schematic diagram showing the measurement of vertical unevenness



Fig. 2.66 Schematic diagram showing the definition of superelevation

Distortion unevenness, also known as a triangular pit, refers to the twist (pit) deformation of the track, as demonstrated in Fig. 2.67. Assume that there are four points, a, b, c, and d, on the rail surfaces. If point c is not on the same plane as a, b, and d, there is a vertical distance from point c to the plane formed by the other three points. The distortion is caused by the dislocation of point c. The train is then supported only at points a, b, and d, while it is suspended at point c. This suspension can easily cause the train to derail. The distortion unevenness can be calculated using Eq. (2.95).

t = (H_c - H_d) - (H_a - H_b) = c_2 - c_1    (2.95)

Fig. 2.67 Measurement image of twist unevenness



where H_a, H_b, H_c, and H_d are rail surface elevations, and c_1 and c_2 are superelevations.
The distortion is the superelevation difference between the two rail sections with base
distance L.
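As a hedged illustration of Eqs. (2.94) and (2.95), the sketch below computes the mid-chord vertical unevenness and the twist from uniformly sampled rail elevations; the sampling interval, chord length, and base distance arguments are assumptions for the example.

```python
import numpy as np

def vertical_unevenness(h, dx, chord):
    """Sketch of Eq. (2.94): y(x) = h(x) - [h(x - L/2) + h(x + L/2)] / 2,
    with elevations h sampled every dx metres and chord length L."""
    h = np.asarray(h, dtype=float)
    half = int(round(chord / (2.0 * dx)))            # samples in half a chord
    y = np.full(h.shape, np.nan)
    for i in range(half, len(h) - half):
        y[i] = h[i] - 0.5 * (h[i - half] + h[i + half])
    return y

def twist(left_h, right_h, dx, base):
    """Sketch of Eq. (2.95): twist t as the superelevation difference between
    two cross-sections separated by the base distance."""
    c = np.asarray(left_h, float) - np.asarray(right_h, float)  # superelevation per section
    k = int(round(base / dx))
    return c[k:] - c[:-k]
```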
2. Fastener surveying
Fasteners are essential components connecting the rail track and the rail bed, and they guarantee the safety of railway transportation. Wear of the fasteners reduces the rigidity and safety of the track, so periodic state surveying and maintenance is an important task for the railway management department. For fasteners in service, defects, dimensions, and the tightness between the fasteners and the rail must currently be inspected. Aiming at highly efficient surveying, multiple
sensors are used to obtain high-precision point clouds of the fasteners, based on
which their geometric parameters and tightness can be derived to evaluate their
safety state.
1) Point cloud measurement

An intelligent rail-fastener checker (iRC) uses a line-structured light laser to obtain a high-resolution point cloud of rail fasteners. With the point cloud, the defects, size
and tightness of the fasteners can be detected and measured [18]. The iRC system
is designed as an I-shaped structure, as shown in Fig. 2.68, with rail wheels on both
sides. The rectangular box in the middle between the wheels is designed for installing
the power supply, control circuit, and embedded computer. A push rod is connected
to the middle box to push the device forward on the track.
Four line-structured light lasers are installed in the middle box of the iRC system. In each frame, 600–800 points are acquired by the sensors, and the trigger frequency is 1000–8000 Hz. The minimum and maximum scanning distances of the laser are 155 mm and 445 mm, respectively, as demonstrated in Fig. 2.69a. These distances correspond to minimum and maximum scanning ranges of 110 mm and 240 mm, respectively. The repeatabilities in the Z-axis and Y-axis are 0.005 mm and 0.06 mm, respectively. In actual use, the spacing between adjacent frames of points is 0.3 mm.

Fig. 2.68 Overall structure of the iRC system



Fig. 2.69 3D point cloud acquisition of the iRC system

Figure 2.69b presents an example of a point cloud for the WJ-8 rail fastener
scanned using iRC. The fine 3D point cloud data of the inner and outer targets of
the left and right rails are obtained. The data are processed and analyzed in real-time
to detect the state of the fasteners, and defective and loose fasteners can trigger an
alarm that reminds the operators to carry out further checks and maintenance.

2) Installation examination

The installation examination of the fastener mainly detects abnormal installation, such as missing spring bars and bolts, and reversal, skew, and rupture of the spring bar. The abnormal-installation check is the basic examination for fasteners and should be carried out comprehensively and regularly. Figure 2.70 lists several abnormal cases of the fasteners.
By projecting the point cloud coordinates of the fastener to the horizontal plane,
the depth map of the fastener can be obtained without losing information on the
fastener point cloud. This data organization structure allows the points to be located and indexed quickly; Fig. 2.71 shows the projected point cloud of the WJ-7-type fastener.
The integrity of the spring bar is examined in predetermined areas, as annotated
by the dashed line boxes A, B, and C in Fig. 2.71. The check box is predetermined
according to different types of fasteners. The installation state of the fasteners is
determined based on the point cloud in these boxes.

Fig. 2.70 Abnormal installation state of common rail fasteners

Fig. 2.71 The projected plan coordinate for the WJ-7 fastener

To rule out points that fall in the detection area only because the spring bar is skewed, it is necessary to examine the straightness of the spring bar inside areas A and B based on the point cloud. Taking area A as an example, the center of the point cloud on the X-axis can be found from the points with the same Y value by least squares line fitting, as in Eq. (2.96).


\begin{cases}
y = a x + b \\[4pt]
a = \dfrac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \\[8pt]
b = \dfrac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}
\end{cases}    (2.96)

To evaluate the straightness of the spring bar, a straight-line fitting correlation coefficient r is introduced, as described in Eq. (2.97), where n is the number of center points on the horizontal plane of the spring bar. If the fastener spring bar is not skewed, the sum of the correlation coefficient r of the straight-line fit and the slope of the straight line fitted to the spring bars in the A–B region will remain within a certain threshold. For normal Vossloh-300 and WJ-2 fasteners, the spring bars are almost perpendicular to the X-axis, resulting in an excessively large slope of the fitted line. Therefore, the inverse of the slope of the fitted straight line is used when judging whether the fastener spring bar is skewed.
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i    (2.97)

The above-mentioned straightness examination cannot detect fasteners with reversed spring bars. To detect whether the spring bar is installed in reverse, the rectangular dashed box C is placed in the middle of the spring bar. If the fastener is installed normally, the average height of the point cloud within area C will be higher than the upper surface of the rail bottom by a certain threshold.
A top-down decision tree is constructed to classify different fastener states based
on the examination results on the bolts and spring bars. Figure 2.72 shows the work-
flow for fastener examination. The acquired point cloud of the fastener is checked
and classified to determine whether it is normal or not. If it fails at a certain stage,
the check steps in the child nodes will not be carried out, and the result of the current
node will be directly output as the final classification result.
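The following sketch mirrors the first stages of this classification. It fits the spring-bar center points in areas A and B (Eqs. 2.96 and 2.97) and then checks the mean height in area C; the use of separate thresholds on r and the slope, and all numeric values, are simplifying assumptions rather than the book's calibrated criteria.

```python
import numpy as np

def line_fit_stats(xs, ys):
    """Least-squares line fit (Eq. 2.96) and correlation coefficient r (Eq. 2.97)."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    slope, intercept = np.polyfit(xs, ys, 1)
    r = np.corrcoef(xs, ys)[0, 1]
    return slope, r

def classify_fastener(centers_a, centers_b, heights_c, rail_bottom_z,
                      r_min=0.9, slope_max=0.3, reverse_gap=0.004):
    """Illustrative top-down checks: spring-bar skew in areas A/B, then a
    reverse-installation check on the mean height in area C."""
    for xs, ys in (centers_a, centers_b):            # centers_a/b: (xs, ys) of fitted centers
        slope, r = line_fit_stats(xs, ys)
        if abs(r) < r_min or abs(slope) > slope_max:
            return "skewed spring bar"
    if np.mean(heights_c) < rail_bottom_z + reverse_gap:
        return "reversed spring bar"                 # bar sits too low above the rail bottom
    return "normal"
```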

Fig. 2.72 Work flow for classifying the fastener state

3) Geometric parameter measurement

Geometric parameters are essential features of fasteners, reflecting the errors that
may occur in production and installation, as well as the quality of the fastener itself [19]. These parameters are calculated from the extracted key components after point cloud modeling of the fastener.
Key components of fasteners, such as iron pads, anchor bolts, and spring bars, are
complex curved surfaces that are difficult to segment using model-based methods.
Segmentation methods based on feature clustering require a large amount of calculation, which hinders subsequent real-time processing. Usually,
the region-growing method is suitable for segmenting the fastener point cloud. The
steps of point cloud segmentation based on the region-growing method are listed in
Table 2.7.
By selecting the seed points of the corresponding key components of the fastener,
the point cloud of the key component of the fastener is extracted. The annotated area
in Fig. 2.73 shows the key component area of the WJ-7 fastener extracted by the
method listed in Table 2.7.
Indirect measurement is based on the segmented key components and their struc-
tural relationship with the fastener. Here, the WJ-7 fastener is taken as an example
to explain the measurement of key geometric parameters. The key components are
labeled in Fig. 2.73.

Table 2.7 Region growing segmentation for the fastener point cloud
Input: Seed (SD)
Output: Point Set (Pn )
(1) Locate the seed point SD
(2) Check the 8-neighbor points of the seed point, if the height difference between this point and
the seed point is less than a set threshold δ, this point is added to the output point set Pn and
the boundary point set B p
(3) Select a point Bi from the boundary point set B p , and set the 8-neighbor points of this point
as Ti
(4) For each point in Ti , if it does not belong to the output point set Pn , and the height difference
between it and the selected point Bi is less than the threshold δ, it is added to the boundary
point set B p and the output point set Pn
(5) Remove Bi from the boundary point set B p
(6) Perform steps 3, 4, 5 until the boundary point set is empty
(7) Output Pn
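A minimal Python sketch of the region-growing steps in Table 2.7 is given below, assuming the fastener point cloud has already been projected to a regular depth image (a 2D array of heights); the grid representation and the single height threshold are assumptions.

```python
import numpy as np
from collections import deque

def region_grow(depth, seed, delta):
    """Region growing as in Table 2.7: start from the seed pixel and absorb
    8-neighbours whose height difference to the current boundary pixel is
    below the threshold delta. Returns a boolean mask of the grown region."""
    rows, cols = depth.shape
    grown = np.zeros(depth.shape, dtype=bool)
    grown[seed] = True
    boundary = deque([seed])                          # boundary point set B_p
    while boundary:                                   # step (6): repeat until B_p is empty
        r, c = boundary.popleft()                     # steps (3)/(5): take and remove B_i
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols \
                        and not grown[nr, nc] \
                        and abs(depth[nr, nc] - depth[r, c]) < delta:
                    grown[nr, nc] = True              # step (4): add to P_n and B_p
                    boundary.append((nr, nc))
    return grown
```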

Fig. 2.73 Key measuring area of the WJ-7 fastener

Taking the thickness of the iron pad as an example, it can be calculated by using Eq. (2.98); other geometric parameters can be calculated similarly.

T_P = h_3 - h_5 - T_{pillar} - T_{buffer}    (2.98)



where T_P is the thickness of the iron pad; h_5 is the height of the rail pad, which is annotated as ➄ in Fig. 2.73; T_{pillar} is the height difference between the surface of the two limit iron pad shoulders and the bottom surface of the iron pad, which is 35.1 mm for the WJ-7 fastener; and T_{buffer} is the thickness of the insulated bumper plate, which is 6 mm for the WJ-7 fastener.
4) Tightness measurement

Tightness measurement refers to measuring the gap between the middle spring bar
and the upper surface of the rail bottom. It is an important indicator to show whether
the spring bar is loose, which can lead to insufficient fastening pressure and further
lead to rail displacement, skew, unevenness, and other defects of the rail. Ignoring loose fasteners can lead to unstable train running and endanger the passengers. We proposed a method for extracting the skeleton, namely, the axial line of the
spring bar from the point cloud of the fastener. Based on this skeleton, its tightness
can be measured. The essence of point cloud skeleton extraction is to convert the
3D discrete point cloud of a single object into a 1D continuous representation of the
object. Figure 2.74 shows the WJ-7 rail fastener spring bar, and the other four types
of rail fastener spring bars are similar in shape.
The rail fastener spring bar is formed by bending a cylindrical steel rod. Therefore,
the diameter of the fastener spring bar at different positions is the same. In this case,
the center of the circular section is the axial line of the spring bar. The point cloud
of the spring bar can be extracted by the region-growing method. This is the same as
the annotation process of the track fastener point cloud segmentation data set. It can
also be directly segmented through the trained deep learning point cloud semantic
segmentation network.
The optimal cross-section of the spring bar at one point is the cross-section perpen-
dicular to the tangent line at this point. The normal vector of the optimal cross-section
at one point can be calculated by the normal vector of the neighboring points as
Eq. (2.99).

Fig. 2.74 Spring bar of the WJ-7 type fastener


X = \mathop{\arg\min}_{\|X\| = 1} \left| \sum_{j=1}^{m} n(p_j) \cdot X \right|, \qquad p_j \in N_i    (2.99)

where X is the normal vector of the optimal cross-section at point i, N_i is the set of neighboring points of point i in the point cloud, p_j is any point in N_i, and n(p_j) is the unit normal vector of point p_j. X can be obtained by calculating the eigenvector corresponding to the minimum eigenvalue of the matrix AA^T, where the j-th column of A collects the components x_j, y_j, and z_j of the normal vector of p_j in the X-axis, Y-axis, and Z-axis directions, respectively.
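A short numerical sketch of Eq. (2.99) follows: the normal of the optimal cross-section is taken as the eigenvector of AA^T with the smallest eigenvalue, assuming the unit normals of the neighbouring points have already been estimated.

```python
import numpy as np

def optimal_section_normal(neighbor_normals):
    """Sketch of Eq. (2.99): columns of A are the unit normals n(p_j) of the
    neighbouring points; the cross-section normal X is the unit eigenvector
    of A A^T associated with the smallest eigenvalue."""
    A = np.asarray(neighbor_normals, dtype=float).T   # shape (3, m)
    eigvals, eigvecs = np.linalg.eigh(A @ A.T)        # symmetric 3x3 eigen-problem
    return eigvecs[:, np.argmin(eigvals)]             # eigenvector of minimum eigenvalue
```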
The top view of the optimal tangent plane normal vector of the point cloud of
the fastener spring bar is shown in Fig. 2.75. The red points are composed of the
point cloud of the spring bar. The blue arrows are the normal vectors of the cross-
section. The axial line of the spring bar is demonstrated in Fig. 2.76a. The blue lines
in Fig. 2.76b, c represent the top and side views of the axial line of the spring bar
after spline interpolation, and Fig. 2.76d is the side view of the resampled axial line.
When the spring bar is installed, there are four areas where it is supported. Two of them are at the two ends of the spring bar, pressing on the bottom of the rail. The other two are supporting areas on other rail components. The four
supporting areas can form a plane. The tightness of the fastener is evaluated by the
distance from the spring bar to this plane. The spring bar separation can be calculated
using the extracted axial line of the spring bar, as listed in the following steps.
(1) As shown in Fig. 2.77, the center points of the spring bars near areas A and B are
added to the point set P, which is projected to the xy and xz planes, respectively.
The projected points can be used to fit a line by using least squares line fitting.

Fig. 2.75 The optimal cross-section of the fastener spring bar (top view)

Fig. 2.76 The result of the axial line extraction from the point cloud of the fastener spring bar

Fig. 2.77 Calculation of the spring bar separation to the rail surface

The two fitted lines on the two projection planes together determine the spatial fitted line AB of the point set P.
(2) Obtain the lowest points of the other two support areas, as shown by points C
and D in Fig. 2.77. Point E in the figure is the midpoint of C and D. Point E and
line AB can form a plane, the reference plane for calculating the separation of
the spring bar.
(3) Calculate the closest distance from the middle point of the spring bar in the
spring tongue area, denoted as F in Fig. 2.77, to line AB. Then, the separation
of the spring bar from the rail is calculated by subtracting the cross-sectional
radius of the spring bar from the distance between point F and the reference plane.

Fig. 2.78 3D section of rails obtained by using line structured light
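As a rough illustration of steps (1)–(3) above, the sketch below builds the reference plane from two points on line AB and the midpoint E of the lowest support points C and D, and converts the distance of the tongue point F into a separation value; all inputs are assumed to be already extracted, and representing line AB by two points rather than a least-squares fit is a simplification.

```python
import numpy as np

def spring_bar_separation(A, B, C, D, F, bar_radius):
    """Sketch of the tightness calculation: reference plane through line AB and
    the midpoint E of C and D; separation = distance(F, plane) - bar radius."""
    A, B, C, D, F = (np.asarray(p, dtype=float) for p in (A, B, C, D, F))
    E = (C + D) / 2.0
    normal = np.cross(B - A, E - A)                   # normal of the plane (A, B, E)
    normal /= np.linalg.norm(normal)
    dist_f = abs(np.dot(F - A, normal))               # point-to-plane distance of F
    return dist_f - bar_radius
```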

3. Surface condition surveying

1) Measuring method

Rail surface condition surveying examines the fatigue affecting the shape and surface of the rail. Line-structured laser scanning has been widely used due to its advantages of high precision, large field of view, and straightforward information acquisition. For each rail, two lasers are installed at the bottom of the surveying train, pointing to the outside and inside of the rail. The measuring angle and distance of the lasers can be
adjusted to obtain the section of the rail when the train is moving forward, as shown
in Fig. 2.78. The rail fatigue is evaluated by using the acquired point cloud.
The 2D grayscale image of the rail surface is generated from the point cloud
acquired using line-structured light scanning. The collected high-precision 3D laser
point cloud can be projected along the direction of the rail. The coordinates of the points are converted to an image plane, with the laser intensity retained as the image gray value.

2) 2D image detection

Currently, abnormality detection of the rail mainly depends on visual recognition. In this section, a deep learning network based on U-Net that integrates the elevation
information is used to detect the abnormal state from the collected grayscale image
of the rail surface. U-Net has a simple structure and low computational cost and was
initially proposed to deal with medical images. It was later introduced and tested to
be effective in pavement distress identification.
The improved U-Net network is illustrated in Fig. 2.79. Different from the initial
U-Net, the improved U-Net adds a global context module, which significantly reduces
the computational cost of the entire network, balancing between a lightweight network and effective global context modeling. The improved network is better in terms of robustness and generalization ability.

Fig. 2.79 Improved U-Net network
Nonlocal networks share similar context information. To reduce the computational
cost, a shared lightweight global context GC module is introduced in the U-Net
network. At the same time, the bottleneck conversion module is replaced with 1 ×
1 convolution, and the parameter quantity is reduced from the original C × C to
2 × C × C/r, where C/r denotes the hidden representation dimension of the bottleneck. If the ratio r is set to 32, the current parameter amount is 1/16 of the original. This lightweight module can be flexibly inserted into different
locations of the network. The optimized module mainly consists of three parts. First,
the attention weight is obtained by the 1×1 convolution and the softmax. Second, the
global context information is obtained by attention pooling. Finally, the convolution
kernel W v with size 1 × 1 is used for feature conversion, and the converted feature
is aggregated on each query position.
The globalization module is shown in Fig. 2.80 and is defined as Eq. (2.100).
z_i = x_i + W_{v2}\,\mathrm{ReLU}\!\left[ \mathrm{LN}\!\left( \sum_{j=1}^{N_p} \frac{e^{W_k x_j}}{\sum_{m=1}^{N_p} e^{W_k x_m}}\, W_{v1} x_j \right) \right]    (2.100)

Fig. 2.80 Global attention module (⊗ matrix multiplication; ⊕ broadcast element-wise addition)

where e^{W_k x_j} / \sum_m e^{W_k x_m} is the weight of global attention pooling; W_{v2}\,\mathrm{ReLU}(\mathrm{LN}(W_{v1}(\cdot))) is the bottleneck transformation; and W_k and W_v are linear transformation matrices.
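For readers who want to experiment, a hedged PyTorch sketch of the global context block in Eq. (2.100) is given below. The 1 × 1 convolutions, the bottleneck ratio r, and the decision to apply W_v1 after attention pooling (equivalent by linearity) are implementation assumptions, not the authors' exact code.

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """Minimal sketch of Eq. (2.100): attention pooling with W_k + softmax,
    a bottleneck transform W_v2 ReLU(LN(W_v1(.))), and broadcast addition."""

    def __init__(self, channels: int, ratio: int = 32):
        super().__init__()
        hidden = max(channels // ratio, 1)
        self.w_k = nn.Conv2d(channels, 1, kernel_size=1)        # attention logits
        self.softmax = nn.Softmax(dim=2)
        self.w_v1 = nn.Conv2d(channels, hidden, kernel_size=1)  # bottleneck down
        self.ln = nn.LayerNorm([hidden, 1, 1])
        self.relu = nn.ReLU(inplace=True)
        self.w_v2 = nn.Conv2d(hidden, channels, kernel_size=1)  # bottleneck up

    def forward(self, x):
        b, c, h, w = x.shape
        attn = self.softmax(self.w_k(x).view(b, 1, h * w))      # weights over N_p positions
        context = torch.bmm(x.view(b, c, h * w), attn.transpose(1, 2)).view(b, c, 1, 1)
        y = self.w_v2(self.relu(self.ln(self.w_v1(context))))   # bottleneck transform
        return x + y                                            # broadcast element-wise addition
```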
Figure 2.81 shows several examples of the abnormal rail state, including welds,
wavy wear, and abrasion. By comparing the detected result with validation data
collected on-site, this method can quickly and accurately identify various abnormal
states of the rail surface and output their mileage locations and types.
One of the severe fatigues on the rail surface is the peeling of the rail tread. The
developed network can also detect this type of fatigue. Figure 2.82 shows on-site
images of rail peeling. The surveying system can successfully detect peeling and
output its location, supporting the efficient management and subsequent repair of
fatigue.

3) 3D point cloud detection

As designed, the laser scanning plane should always be perpendicular to the rail
line. This is true when the platform is stable; however, the vibration of the platform
can lead to a skewing of the scanning plane of the laser. Therefore, the acquired data
using the line-structured laser should be corrected in this case. As shown in Fig. 2.83,
A1 indicates the axial plane of the rail. A2 is the scanning plane of the line structured
laser. A3 denotes the corrected scanning plane. X Y Z indicates the coordinate system
of the line-structured laser, and xyz is the corrected coordinate system.

Fig. 2.81 Grayscale image of rail fatigue

Fig. 2.82 Peeling of the rail surface and its grayscale image

Fig. 2.83 Illustration of the heading angle deviation correction for rail section modeling

In practice, the pitch angle deviation of the platform is minimal, and the heading angle deviation
is the main factor affecting the measurement results. The heading angle deviation
can be obtained from the trajectory of the platform and the attitude measured by the
INS.

(1) Point cloud registration based on the AICP algorithm

One of the preprocessing steps is the registration of the rail surface point cloud
with the standard rail model. The traditional iterative closest point (ICP) algorithm
works well in 3D point cloud registration. However, the point cloud of the rail can be
affected by fatigue in shape, which results in a large difference from the standard rail
section. This can bring about poor registration accuracy or even incorrect registration.
Therefore, it is necessary to improve the traditional ICP algorithm to adapt to the rail
point cloud.
The improvement of the ICP algorithm is realized in two aspects. On the one
hand, the initial parameters in the ICP algorithm are usually set as the calibration
parameters. However, in long-distance high-density measurement, the attitude of the sensor far from the initial point changes considerably from the initial attitude.
As shown in Fig. 2.84a, the black point set is the standard section of the rail. The
red point set is the initial section, and the cyan point set is the 1200th section from
the initial section. It can be seen that the section contour changes little between the
cyan point set and the red point set, but the attitude changes significantly. In the
improved algorithm, the registration parameters of the previous section are used as
the initial parameters in the registration of the current section, which significantly
reduces the number of iterations for each section. On the other hand, the predicted
result is corrected recursively. Figure 2.84b demonstrates the standard section, initial
section, and the 900th section as black, blue, and red point sets, respectively. The
registration result is obtained recursively by using the previous adjacent section. The
error accumulates and becomes larger during recursive registration. The improved

Fig. 2.84 Illustration of registration error calculated by using the ICP algorithm

ICP algorithm adopts a Kalman filter to correct the predicted result obtained by the
recursion of adjacent sections so that the registration result converges quickly.
In addition, the point cloud accuracy of the rail section also depends on the registra-
tion accuracy of the point clouds obtained by the left and right lasers. An improved
ICP algorithm is used to quickly restore the observed section by registering the
left and right laser point clouds. Then, the rail fatigue is evaluated by comparing
the restored section to the standard section. In the ICP algorithms, the point pairs
between the source and target point clouds are determined first. The rotation and
translation matrices are constructed based on the point pairs using the least squares
estimation. The obtained matrix is used to transform the source point cloud into the
coordinate system of the target point cloud, in which the error between the trans-
formed source point cloud and the target point cloud is then calculated. If the error
is larger than a preset threshold, the above operations are repeated iteratively until
the error is smaller than the threshold. One disadvantage of the ICP algorithm is
that the iterative calculation can slow down the speed of the algorithm. The other
disadvantage is that it easily falls into a local optimal solution, which is unexpected.
The improved ICP algorithm is called the adaptive iterative closest point (AICP)
algorithm. The detailed steps are described as follows:
➀ Set the Kalman filter parameters: assume that the mean square error of the state estimation is P, the system process noise covariance is Q, the measurement noise covariance is M, and the measurement matrix is H.
➁ Assuming that the rotation and translation matrices between the (N − 1)th section and the standard model are R_{N−1} and T_{N−1}, respectively, the rotation and translation between the (N − 1)th section and the Nth section, ∆R and ∆T, can be calculated using the ICP algorithm.
➂ The rail section is divided into head and bottom parts, i.e., the upper half and the lower half. Calculate the transform parameters between the entire rail section, the upper half, the lower half, and the reference section. The distances between these three parts and the corresponding reference parts are d, d_{HN}, and d_{BN}. The final transform parameters of the rail section are selected as those corresponding to the minimum of these three distances, and the resulting rotation and translation matrices are denoted as R_C^N and T_C^N, respectively.
➃ The predicted rotation and translation matrices, \hat{R}_N and \hat{T}_N, between the predicted rail section and the standard section can be calculated using Eq. (2.101).

\hat{R}_N = \Delta R\, R_{N-1}, \qquad \hat{T}_N = \Delta R\, T_{N-1} + \Delta T    (2.101)

The prediction error covariance for the Nth section can be calculated using the following equation.

P_{N/N-1} = \Delta R\, P_{N-1}\, \Delta R' + Q    (2.102)

➄ Calculate the Kalman gain as

K_N = P_{N/N-1} H' \left( H P_{N/(N-1)} H' + M \right)^{-1}    (2.103)

➅ Update the system state and calculate R_N, T_N, and P_N by using the following equations.

R_N = \hat{R}_N + K_N \left( R_C^N - H \hat{R}_N \right)    (2.104)

T_N = \hat{T}_N + K_N \left( T_C^N - H \hat{T}_N \right)    (2.105)

P_N = \left( I - K_N H \right) P_{N/(N-1)}    (2.106)

The registered data are plotted in Fig. 2.85.

(2) Rail fatigue extraction

After registration between the measured section and the standard section, the
matching error between these two point sets is at the submillimeter level. Then,
the position of the suspected fatigue along the section can be accurately located by
direct comparison between these two registered sections. The geometric features,
including the position, width, and depth of the suspected fatigue, can be extracted.
The fatigue area can be derived by combining results from multiple continuous
processed sections. K-means clustering is used to connect the small fatigue area to
a larger area and extract the length of this area.

Fig. 2.85 Correction of left and right laser roll angles

The detailed steps are described as follows:
➀ Assume that the current section after registration is denoted as the point set P = [P_1 P_2 . . . P_n], where n is the number of points, and the standard section is denoted as the point set Q_s = [Q_1 Q_2 . . . Q_s]. For each point P_i, taking its t-axis coordinate as a reference, select the point in Q_s that is closest to P_i as its corresponding point, forming the point set Q = [Q_1 Q_2 . . . Q_n] as the correspondent of P.
➁ Calculate the difference set between Q and P, that is, D = [Q_1 − P_1 Q_2 − P_2 . . . Q_n − P_n].
➂ Let τ_D represent the depth threshold of rail fatigue; points with a difference greater than τ_D are selected from P as the suspected fatigue point set of the current section, C = {D_k | D_k > τ_D, 1 < k < n}, where k indexes the suspected fatigue points.
➃ Combine continuous sections to form a point set.
P_D = \begin{bmatrix} P_{D1,1} & P_{D1,2} & \cdots & P_{D1,s_k} \\ P_{D2,1} & P_{D2,2} & \cdots & P_{D2,s_k} \\ \cdots & \cdots & \cdots & \cdots \\ P_{DN,1} & P_{DN,2} & \cdots & P_{DN,s_k} \end{bmatrix}

where sk is the number of suspected fatigue points in the sth section.


➄ Initialize the cluster number used in the K-means clustering according to the number of fatigue areas in the combined point set. Divide the fatigue points into m bounding boxes DB = [DB_1 DB_2 · · · DB_m], and calculate the radius R_m and center C_m of each bounding box.
➅ Calculate the distance between adjacent bounding boxes. Two regions denoted
by the adjacent bounding boxes are merged if their distance is smaller than a
given distance threshold. The distance calculation between adjacent bounding
boxes and comparison to the threshold is repeated until the number of overall
bounding boxes remains unchanged.
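A simplified sketch of these steps is given below. It thresholds the point-wise difference to the standard section and merges suspected regions along the mileage by a gap test, which stands in for the K-means bounding-box clustering described above; the threshold and merge-distance values are illustrative.

```python
import numpy as np

def extract_fatigue_regions(measured, standard, mileage, tau_d=0.001, merge_dist=0.03):
    """Sketch of the fatigue extraction: select points whose deviation from the
    standard section exceeds tau_d (metres), then merge suspected regions whose
    gaps along the mileage are below merge_dist. Returns (start, end, length)."""
    diff = np.asarray(standard, float) - np.asarray(measured, float)
    mileage = np.asarray(mileage, float)
    suspect = np.sort(mileage[diff > tau_d])          # step 3: suspected fatigue points
    if suspect.size == 0:
        return []
    regions = [[suspect[0], suspect[0]]]
    for m in suspect[1:]:                             # steps 5-6: grow or merge regions
        if m - regions[-1][1] <= merge_dist:
            regions[-1][1] = m
        else:
            regions.append([m, m])
    return [(start, end, end - start) for start, end in regions]
```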
The depth threshold τ D in step ➂ is an empirical value, which is usually determined
according to the service time and degree of rail abrasion. Generally, τ D is set to 1 mm
at the head of the rail and 0.5 mm at the bottom. Figure 2.86 shows the combination of
700 sections, in each of which the fatigue and normal areas are annotated as red and
blue points, respectively. The initial cluster number is set to three according to the
continuous fatigue point areas (as the required length of the fatigue area to remove
scattered points). The first clustering divides the fatigue areas into three areas in the
red ellipses. When setting the distance threshold as 30 mm, the left two fatigue areas
are classified into one cluster, as denoted by the green ellipses.

Fig. 2.86 K-means clustering of rail fatigue



Fig. 2.87 Diagram showing the classification of rail surface fatigue

(3) Rail fatigue classification algorithm

Geometric features, such as position, length, width, depth, area, and shape, are used to
classify fatigue. Different fatigue types have different features. The rail top surface is
the severe fatigue location where the train wheels directly touch the surface, causing
wear, wave wear, abrasion, and peeling of the rail. The fatigue on the rail top is also
mixed with rust. Rail fatigue usually causes erosion to the rail top surface, leading to
a lower surface than the standard surface, while rust causes a higher surface than the
standard surface. By comparing the height of the rail section with that of the standard
model, the fatigue area can be differentiated from the rusty area. The gradient of the
section can be used to distinguish between different fatigue types. Among the above-
mentioned four types, wear and wave wear are relatively smooth, while abrasion and peeling feature sharper edges. In addition, wear and wave wear can be distinguished by their peaks and pits.
in aspect ratio. Based on these differences, a decision tree, as shown in Fig. 2.87, is
used to classify common rail surface fatigue.

2.3.2 Subway Tunnel Surveying

Common subway tunnel fatigue includes structural deformation, slab misalignment, cracks, water leakage, etc. Currently, 3D laser scanning is widely used in subway
tunnel surveying. It scans the measured object without direct contact and records 3D
coordinates and reflected intensity from the object’s surface. Then, the 3D informa-
tion of the measured object can be restored. The point cloud of the subway tunnel
can be obtained and filtered to calculate the horizontal convergence value, which

is complementary to the traditional monitoring data and used to evaluate the defor-
mation of the tunnel structure. At present, there are several mature subway tunnel
surveying systems, such as the GRP5000 system developed by Amberg, Switzer-
land, the SiTrack rail mobile measurement system by Leica, Germany, and the rail-mounted mobile 3D laser measurement system (rMMS) by Wuhan HiRail Transportation
Technology Co., Ltd., China. These systems integrate 3D laser scanners, DMI, IMU,
GNSS, and other sensors. The surveying system can acquire a high-precision point
cloud of the high-speed railway or the subway tunnel during the movement of the
platform, whose speed can reach above 3.6 km/h. Based on the acquired point cloud,
the section and boundary of the tunnel can be restored, and deformation can be
measured (Fig. 2.88).
1. Measurement principle
3D measurement technology is an important development direction. The mobile
3D laser measurement system integrates a GNSS/INS/DMI combined positioning
system, 2D section scanner, CCD camera, etc., which are synchronized in both the
space and time domains with the coordination of a multi-sensor synchronous control
unit. Full-section spatial data of the subway tunnel can be quickly collected with
the cooperation of multiple sensors. This system has become popular in subway tunnel inspection due to its high efficiency, high accuracy, good data quality, and comprehensive coverage.

Fig. 2.88 Subway tunnel dynamic scanning system



The mobile 3D laser scanning system uses a 2D section scanner to perform 2D section scanning. It collects 3D data of the surrounding space while the surveying
vehicle is moving forward. The measurement coordinate system takes the laser emis-
sion center as the origin, the forward direction as the Y-axis, and the rightward direc-
tion as the X-axis, as shown in Fig. 2.89a. According to the time difference t between
transmitting and receiving the laser and the speed of laser propagation C, the distance
from each laser point on the tunnel section to the coordinate center can be obtained,
as shown in Fig. 2.89b.
The principle of positioning and attitude determination in a tunnel environment is
illustrated in Fig. 2.90. Based on the extended Kalman filtering, the initial position
of the platform, its acceleration and angular velocity output by the IMU, and the
mileage and speed measured by the DMI are fused. Combined with the coordinates
of the control points, the moving least squares (MLS) method is used to calculate
the high-precision positions and attitudes. This method is especially useful because
there is no GNSS signal in the tunnel.
The tunnel scanning data are obtained in the laser scanning coordinate system. It
can be converted to the POS coordinate system, in which the data are in the format
of a point cloud. The conversion can be realized using Eq. (2.107).

Fig. 2.89 Principle and coordinate of the mobile 3D laser scanning system

Fig. 2.90 High-precision combined positioning and attitude measurement using a mobile 3D laser
scanning system
\begin{bmatrix} X_{pos} \\ Y_{pos} \\ Z_{pos} \end{bmatrix} = \begin{bmatrix} X_l^{pos} \\ Y_l^{pos} \\ Z_l^{pos} \end{bmatrix} + R_l^{pos} \begin{bmatrix} d_l \sin\theta_l \\ d_l \cos\theta_l \\ 0 \end{bmatrix}    (2.107)

where

R_l^{pos} = \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}

\begin{cases}
a_1 = \cos\varphi \cos\kappa - \sin\varphi \sin\omega \sin\kappa \\
a_2 = -\cos\varphi \sin\kappa - \sin\varphi \sin\omega \cos\kappa \\
a_3 = -\sin\varphi \cos\omega \\
b_1 = \cos\omega \sin\kappa \\
b_2 = \cos\omega \cos\kappa \\
b_3 = -\sin\omega \\
c_1 = \sin\varphi \cos\kappa + \cos\varphi \sin\omega \sin\kappa \\
c_2 = -\sin\varphi \sin\kappa + \cos\varphi \sin\omega \cos\kappa \\
c_3 = \cos\varphi \cos\omega
\end{cases}

where [X_l^{pos}\; Y_l^{pos}\; Z_l^{pos}]^T is the translation vector between the origin of the laser scanning coordinate system and the origin of the POS coordinate system, and R_l^{pos} denotes the rotation between these two coordinate systems.
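A small sketch of Eq. (2.107) follows, with the rotation matrix assembled from the attitude angles exactly as in the direction-cosine expressions above; the argument names (lever arm, angles) are assumptions for illustration.

```python
import numpy as np

def scan_point_to_pos(d, theta, lever_arm, phi, omega, kappa):
    """Sketch of Eq. (2.107): convert one scanner measurement (range d, scan
    angle theta) to the POS frame using the translation vector (lever arm)
    and the rotation R_l^pos built from the attitude angles."""
    sp, cp = np.sin(phi), np.cos(phi)
    so, co = np.sin(omega), np.cos(omega)
    sk, ck = np.sin(kappa), np.cos(kappa)
    R = np.array([
        [cp * ck - sp * so * sk, -cp * sk - sp * so * ck, -sp * co],
        [co * sk,                 co * ck,                -so],
        [sp * ck + cp * so * sk, -sp * sk + cp * so * ck,  cp * co],
    ])
    p_scan = np.array([d * np.sin(theta), d * np.cos(theta), 0.0])
    return np.asarray(lever_arm, dtype=float) + R @ p_scan
```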

2. Construction index measurement

There are various subway tunnels, including shield tunnels, horseshoe tunnels, rect-
angular section tunnels, and quasi-rectangular section tunnels. Among them, shield
tunnels are the most widely used in urban subway rail transit. The shield tunnel is
mainly composed of a tunnel wall and a tunnel floor. The tunnel floor bears the load and is where the train track is installed. The tunnel wall is mainly
composed of slabs, where concave screw holes, evacuation platforms, pipelines, an
overhead contact system (OCS), brackets and firefighting pipelines are placed.
1) Convergence measurement

Tunnel convergence deformation is usually caused by disturbance of the surrounding soil, the load of buildings above ground, poor quality of tunnel construction,
and vibration of the running train. Lateral convergence deformation is mainly caused
by ground ballast, lateral unloading, and poor construction quality. The deformation
starts from the initial splicing of the slabs during construction, or it can be elastic–
plastic deformation of the slab or the bolt after long-term use under various external
forces.
The center position of the slab is the location 300 mm from the left and right
edges. For slabs of 1.2 m or 1.5 m, the concave bolt holes on the slab can be removed

from the observed point cloud to obtain a relatively flat and smooth slab plane. The
plane shown in Fig. 2.91a should be analyzed with regard to the requirement that the
variation in the inner wall of the tunnel should be less than 3 mm.
The deformation of the shield tunnel appears to have different forms. The primary
two types are deformation under an up-down force and a left–right force. Figure 2.91b
demonstrates the situation in which the top extrusion results in a large force on the
ceiling and the floor of the tunnel, pushing the left and right sides of the tunnel
outward. The distance between the ceiling and floor decreases. When the deformation
exceeds the bearing range of the material, it will cause local deformation or slab
falling and even local collapse. On the other hand, the extrusion and deformation of
the left and right walls of the tunnel will lead to arching of the slabs. The distance
between the left and right walls decreases while the distance between the ceiling and
floor increases. It can cause the slab to fall as well.
The radial convergence deformation is an important index during tunnel surveying.
The convergence diameter of each tunnel section can be calculated based on the
collected point cloud. An ellipse model is used to simulate the radial convergence,
and the five coefficients of ellipse Eq. (2.108) can be obtained by using the least
squares method. Equations (2.109)–(2.112) are used to calculate the parameters of
the ellipse equation corresponding to each section, including its center, semi-major,
and semi-minor axes.

A x^2 + B x y + C y^2 + D x + E y + 1 = 0    (2.108)

x_0 = \frac{B E - 2 C D}{4 A C - B^2}    (2.109)

y_0 = \frac{B D - 2 A E}{4 A C - B^2}    (2.110)

Fig. 2.91 A close-up image of the tunnel wall and a diagram showing deformation of the tunnel

a^2 = \frac{2 (A x_0^2 + C y_0^2 + B x_0 y_0 - 1)}{A + C + \sqrt{(A - C)^2 + B^2}}    (2.111)

b^2 = \frac{2 (A x_0^2 + C y_0^2 + B x_0 y_0 - 1)}{A + C - \sqrt{(A - C)^2 + B^2}}    (2.112)
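A compact numerical sketch of Eqs. (2.108)–(2.112) is shown below: the five coefficients are estimated by least squares and the centre and semi-axes are then derived; the helper name and the use of numpy's generic least-squares solver are assumptions.

```python
import numpy as np

def fit_ellipse(x, y):
    """Fit Ax^2 + Bxy + Cy^2 + Dx + Ey + 1 = 0 (Eq. 2.108) by least squares and
    derive the centre (Eqs. 2.109-2.110) and semi-axes (Eqs. 2.111-2.112)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    M = np.column_stack([x ** 2, x * y, y ** 2, x, y])
    A, B, C, D, E = np.linalg.lstsq(M, -np.ones_like(x), rcond=None)[0]
    x0 = (B * E - 2 * C * D) / (4 * A * C - B ** 2)              # Eq. (2.109)
    y0 = (B * D - 2 * A * E) / (4 * A * C - B ** 2)              # Eq. (2.110)
    num = 2 * (A * x0 ** 2 + C * y0 ** 2 + B * x0 * y0 - 1)
    root = np.sqrt((A - C) ** 2 + B ** 2)
    a = np.sqrt(num / (A + C + root))                            # Eq. (2.111)
    b = np.sqrt(num / (A + C - root))                            # Eq. (2.112)
    return (x0, y0), a, b
```

For a section designed as a circle, the two fitted axes should be nearly equal; their divergence reflects the convergence deformation.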

Based on the calculated parameters, an ellipse model can be constructed for the
tunnel. Taking the ellipse center as the origin and the X-axis horizontally pointing to
the right wall of the tunnel (the line pointing to 0° in Fig. 2.92), the tunnel section
can be placed in plane coordinates. The distance between the two intersecting points
of the X-axis (pointing to 0° in Fig. 2.92) and the tunnel section is the convergence
diameter. Due to the existence of various obstacles in the tunnel, such as pipelines
and circuit lines, the intersecting points at some sections can be blocked during surveying, which makes the calculation of the convergence diameter fail. In this case, we
set a range of ± 5° around the horizontal searching line and apply quadratic curve
fitting to the local point cloud. As shown in Fig. 2.92, arc AB is a local curve obtained
by fitting a quadratic curve to the points near the 180° direction, and the intersection
point with the negative X-axis is the intersecting point.
Deformation analysis of the tunnel section is the basic work for detecting the overall deformation of the tunnel, which is evaluated mainly through two indices: the horizontal axis and the tunnel ellipticity. The horizontal axis is an important reference for manual field verification. The point cloud scanned near the horizontal axis is relatively dense, and the length of the horizontal axis can be estimated by fitting a straight line. However, the point coverage along the horizontal axis is uneven, so selecting a suitable range of points to fit the straight line better represents the horizontal axis. Figure 2.93 shows the slab fitting results.

Fig. 2.92 Diagram showing the calculation of the convergence diameter

Fig. 2.93 Slab fitting result
The ellipticity indicates how far the tunnel section deviates from a circle: the greater the ellipticity, the greater the overall deformation of the tunnel. When
calculating the tunnel ellipticity, the section near the ring seam is mainly selected.
Tunnel ellipticity can be calculated using Eq. (2.113).

e_t = \frac{a - b}{D}    (2.113)
where e_t is the tunnel ellipticity, a is the major axis of the ellipse, b is the minor axis of the ellipse, and D is the design diameter of the tunnel.
To accurately fit the cross-section of the tunnel, an iterative ellipse fitting is
presented in this section. First, the collected section points after wavelet filtering
are fitted to an ellipse. The semi-major and semi-minor axes of the fitted ellipse
can be calculated by Eqs. (2.108)–(2.112). Then, for each section point, its distance
from the fitted ellipse is calculated. Points with distances greater than 0.01 m are
excluded. The remaining tunnel section points are iteratively processed by steps of

ellipse fitting and distance exclusion until no point is excluded. Finally, the ellipse from the last iteration is taken as the tunnel cross-section. Figure 2.94a shows one instance of ellipse fitting and distance exclusion. The red dots in the figure are the excluded points, whose distance from the fitted ellipse is greater than 0.01 m, and the green dots form part of the fitted ellipse. The purple line is the major axis of the ellipse. The original design of this tunnel is
a circle with a radius of 2.7 m. The major axis of the fitted ellipse is 5.5704 m, and
the minor axis is 5.3356 m, showing an outward deformation along the major axis
of 0.1704 m.
Local deformation of the tunnel can be calculated after ellipse fitting. To obtain
accurate local deformation, a standard ellipse can be constructed according to the
fitted major and minor axes. The local deformation is calculated as the distance from each section point to the closest point on the standard ellipse. The local deformation of the
tunnel at different angles is shown in Fig. 2.94b. It is calculated clockwise every
10° from − 10° to 190°. The red line segments are the auxiliary lines for measuring

Fig. 2.94 Slab fitting results



the distances from points to the standard ellipse. The maximum deformation of 0.17 m
occurs at the top surface of the tunnel.
2) Slab faulting measurement

The tunnel wall is composed of slabs, and the misalignment of these slabs causes
unevenness of the tunnel wall. Slab faulting can occur in two directions, namely,
longitudinal faulting and radial faulting. Low-quality control of slab manufacturing
and installation during tunnel construction can lead to misalignment of the tunnel
slabs. However, environmental or external causes during service are more dangerous.
For example, the load above the tunnel can add force to the tunnel top, resulting
in uneven convergence of the slab and ultimately leading to different longitudinal
displacements between the tunnel slabs. When the tunnel is in service, some small
faulting can be observed, but is not obvious. It is an indicator that the tunnel may not
be safe and needs monitoring and repair. Otherwise, slight faulting can develop into
severe faulting and ultimately result in unstable tunnel structures.
Slab faulting can be divided into inter-section and inner-section types. The former
indicates the misalignment of the slabs in the longitudinal direction (axial direction
of the tunnel). The latter implies the safety state of the tunnel in the cross-section
direction. The seam line between the slabs is not precisely a straight line. To avoid the
single-circle point cloud inside the seam, the point cloud is selected approximately
100 mm away from the seam line. This selection can truly reflect the state of the
slabs without involving points on the seam line and the concave screw hole area.
The tunnel section is a quasi-circle, which can be divided every 5° into 72
segments. The slab faulting is measured within each slab. Although the points on the
tunnel section are discrete, the point density is relatively high (reaching the centimeter
or even millimeter level). Therefore, the section point closest to the radial line at
each angle is resampled as the point at this angle. However, the single-section point
cloud can be easily influenced by brackets and pipelines, resulting in misjudgment.
To avoid misjudgment and accurately measure the faulting value, three consecutive
slab faulting measurements greater than 20 mm are needed to output one result.
Figure 2.95 shows an example of a tunnel slab faulting measurement.
Inner-section slab faulting is another type, which means the misalignment of the
slabs in the cross-section plane. To accurately detect the misalignment of each slab
in the cross-section, it is necessary to first segment the slabs to determine the ceiling
slab, adjoining slab, and standard slab. The segmentation is based on the grayscale
image formed by the intensity value of the scanned point cloud. The Canny operator,
followed by dilation, is used to extract the edges of the slabs. Finally, the Hough
transform algorithm is applied to identify the longitudinal seam line between different
slabs. After segmentation, the slab image is filtered, and full circle fitting is carried
out at the same time. The center of the fitted circle is the reference point. Then,
the edge points of any two adjacent slabs are extracted from the point cloud, and the
radial displacement between the two sets of edge points can be calculated, as shown
in Fig. 2.96, which represents the extent of radial slab faulting.
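A hedged OpenCV sketch of this seam-line extraction chain (Canny edges, dilation, Hough transform) is shown below; all parameter values are illustrative and would need tuning to the actual intensity images.

```python
import cv2
import numpy as np

def find_seam_lines(gray_image):
    """Sketch of the seam extraction: Canny edge detection, dilation, then a
    probabilistic Hough transform to pick up the longitudinal seam lines.
    gray_image is an 8-bit grayscale intensity image of the tunnel wall."""
    edges = cv2.Canny(gray_image, 50, 150)
    edges = cv2.dilate(edges, np.ones((3, 3), np.uint8), iterations=1)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=100,
                            minLineLength=200, maxLineGap=20)
    return [] if lines is None else [tuple(l[0]) for l in lines]  # (x1, y1, x2, y2)
```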

Fig. 2.95 Measurement results for tunneling slab faulting

Fig. 2.96 Schematic illustration showing the slab faulting

3) Clearance measurement
The tunnel clearance limits the size of the carriage and ancillary equipment along the
tunnel, ensuring the safe operation of subways. In the operation stage of the urban
subway, the clearance measurement is mainly for the subway trains. It is carried out
to determine whether there are objects in the clearance area, that is, to determine

whether the scanned point is located within the clearance area. Figure 2.97 shows
the carriage clearance and the equipment clearance along the straight rail track. The coordinates of each boundary point are defined in the coordinate system of the rail top surface and can be checked directly in the specification.
Clearance measurement can be converted to the determination of whether there is a
scanned point of the tunnel section located within the boundary of the clearance area.
This determination uses a method based on horizontal rays. Figure 2.98 demonstrates
the principle of this method. For each scanned point, a horizontal ray can be plotted.
Whether it is inside or outside the clearance area can be determined by counting the
number of intersecting points between its horizontal ray and the clearance boundary.

Fig. 2.97 The carriage and equipment clearance along the straight rail track for type A1 (1. calculation of vehicle contours; 2. section vehicle gauge; 3. line device boundary)

Fig. 2.98 Schematic illustration showing the principle of tunnel clearance measurement

If the number is odd, the scanned point lies within the clearance area and intrudes into it; otherwise, the scanned point is outside the clearance area.
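The parity test can be sketched in a few lines; the clearance boundary is assumed to be given as a closed polygon of (x, y) vertices in the rail-top coordinate system.

```python
def intrudes_clearance(point, boundary):
    """Horizontal-ray method: cast a ray in the +x direction from the scanned
    point and count crossings with the clearance boundary polygon.
    An odd number of crossings means the point intrudes into the clearance."""
    x, y = point
    crossings = 0
    n = len(boundary)
    for i in range(n):
        (x1, y1), (x2, y2) = boundary[i], boundary[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:                           # crossing lies to the right of the point
                crossings += 1
    return crossings % 2 == 1
```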

2.4 Bridge Dynamic Deflection Measurement

Bridges, as vital transportation infrastructure, cross geographical barriers, such as rivers and canyons, to ensure the connectivity of the transportation network. Bridges suffer from fatigue, defects, and deformation during their lifetime, leading to degradation of their load-bearing capacity, which can develop into damage or even collapse, resulting in a considerable loss of human lives and
property. When a bridge is subjected to a large load or temperature change, its axial
line can deform in the plumb line direction, and the deformation is called bridge
deflection. Bridge deflection measurement is an important part of bridge structural
health monitoring. Accurate bridge deflection monitoring reflects the temporal vari-
ation in the bridge load and the cyclic pattern of bridge deformation, which can be
used to evaluate the safety state of bridges.
Methods for bridge deflection measurement are roughly divided into two cate-
gories: leveling and automation methods. Leveling can realize a measurement accu-
racy of up to 0.5 mm, reflecting the deflection of the main girders over a long period
by periodically measuring the elevation variation of several points on the bridge
deck. Data collection and processing are usually separated and require a long time,
which cannot realize real-time monitoring. Automated methods involve the use of
inclinometers, connecting tubes, GNSS, measuring robots, laser scanners, ground-
based interferometric radar (GB-InSAR), etc. Among these methods, inclinometers can be used to obtain pointwise measurements at several locations along the bridge
to calculate the bridge deflection curve. The method using a connecting tube obtains
the bridge deflection by calculating the deformation value of the bridge deck by using
the liquid level in the connecting tube. A GNSS receiver station near the bridge is

needed as the reference when using the GNSS network. Real-time or post-processing
based on the differential GNSS method can be used to obtain the bridge deflection.
The surveying robot is driven by stepper motors and measures precise coordinates
of the key points on the bridge to calculate the bridge deflection. Alternatively, a
laser scanner can be used to obtain the bridge deflection. The laser emitting source
and target are installed on the bridge. Taking advantage of the collimation of the
laser light, the coordinates of the laser projection points can be used to accurately
derive the bridge deflection. The performance of laser scanning is affected by lens
distortion and atmospheric turbulence. Ground-based radar interferometry is based
on frequency-modulated continuous wave (FMCW), SAR, and radar interferometry,
which can obtain the vibration of bridges with high accuracy, in a non-contact manner, and at long range.

2.4.1 Principle of Vision Measurement

The bridge dynamic deflection can be monitored using a machine vision-based method. The infrared cameras installed at the piers point to the measurement target
installed at the main girders. The cameras work simultaneously to measure
their distances to the target. The actual displacement of the measurement target in
the image plane relative to the reference target can be calculated according to their
geometric relationship, which is the real-time deformation at the measuring point on
the bridge slab.
The reference and the target in the bridge dynamic deflection measurement system
use 850 nm infrared LEDs to overcome the influence of ambient light. The reference
infrared LEDs are fixed at the relatively stationary reference point, and the target is
fixed at the point to be measured under the bridge plate. The reference and target
are imaged on the camera’s telephoto optical system image sensor, as shown in
Fig. 2.99. The camera sensor is fixed to a reference point, such as a cover beam, by
a stabilizing mechanism that keeps the position unchanged even under loads such as
heavy vehicles. The parameters of the camera and the image acquisition frequency are
adjusted to acquire images of the target part at different times. As the bridge structure
is deformed by temperature and load, the target will show different pixel coordinates
in the camera coordinate system. Using the embedded processing platform, image
recognition is performed, the center of the target is located, and the pixel coordinates
of the reference and the target are determined. Finally, according to the actual distance
represented by a single pixel, the relative spatial position between the target and the
reference is determined by the object-image relationship to calculate the deflection
value and realize the real-time monitoring of the dynamic deflection of the bridge.

Fig. 2.99 Measuring principle of bridge dynamic deflection based on machine vision

2.4.2 Deflection Calculation

The raw image collected by the camera needs to be processed to calculate the real-time bridge deflection. The image processing flow is shown in Fig. 2.100.
First, image binarization and median filtering are performed to extract the edge
features of the spots of the reference and target in the image. Then, the coordinates
of the spots’ center pixels are calculated. Finally, the bridge dynamic deflection value
is calculated by using the object-image relationship.

1. Target center identification


First, the images captured by the camera are converted to grayscale images and
binarized using the following equation.

Fig. 2.100 Image processing flow

G(x, y) = \begin{cases} 255, & f(x, y) \ge T \\ 0, & f(x, y) < T \end{cases}    (2.114)

where G(x, y) is the binarized image, f(x, y) is the raw image, (x, y) are the pixel coordinates, and T is the set threshold for binarization. After image binarization, the result is denoised using median filtering: the pixels in each local window are sorted by gray level and the median value is taken as the output, which improves the extraction accuracy of the spot contour.
The light spots extracted from the images appear to be elliptical rather than regular
circles. Therefore, ellipse fitting is used to find the pixel coordinates of the spot’s
center, which is calculated as follows.

f(x, y) = A x^2 + B x y + C y^2 + D x + E y + F = 0    (2.115)

where A, B, C, D, E, and F are the elliptic equation coefficients, and A + C ≠ 0. Let A + C = 1 so that A = 1 − C. The elliptic equation coefficients can be estimated by the least squares method. Then, the elliptic center (x_c, y_c) can be calculated by using Eq. (2.116).

x_c = \frac{B E - 2 C D}{4 A C - B^2}, \qquad y_c = \frac{B D - 2 A E}{4 A C - B^2}    (2.116)

To obtain the elliptic coefficients, at least six points (x_i, y_i), i = 1, 2, 3, . . . , 6, on the elliptic contour are needed. These points are randomly selected
multiple times to fit the elliptical spot, which can be regarded as a random sampling
process to eliminate errors caused by the selection and to optimize the estimation
of the fitting coefficients. The sum of squares of the residuals between (xi , yi ) and
f (x, y) is denoted as Eq. (2.117).


f(A, B, C, D, E, F) = \sum_{i=1}^{n} \left( A x_i^2 + B x_i y_i + C y_i^2 + D x_i + E y_i + F \right)^2    (2.117)

The partial derivative of the above equation is calculated as Eq. (2.118).

\frac{\partial f}{\partial A} = \frac{\partial f}{\partial B} = \frac{\partial f}{\partial C} = \frac{\partial f}{\partial D} = \frac{\partial f}{\partial E} = \frac{\partial f}{\partial F} = 0    (2.118)
The extreme value corresponds to the situation when the residual sum of squares is
minimal. For each selection of the edge points, the sum of squared residuals between
the points and the fitted ellipse is calculated, and the selection corresponding to the
sum smaller than a preset threshold is output as the final result. This last fitting is regarded as optimal, and from it the center of the fitted ellipse is calculated and denoted by (x_c, y_c).

Using Eqs. (2.116)–(2.118), the image coordinates of the reference center and the
target center are calculated, respectively.
2. Real-time calculation

With the spot center point of the target and the reference, the real-time deflection of
the bridge is obtained by calculating the relative center displacement between the
target and the reference. The calculation is performed as follows.
(1) According to the target center recognition algorithm introduced in the previous
section, the pixel coordinates of the image spot center of the reference and the
target, (x_c, y_c) and (x_c', y_c'), are calculated, respectively.
(2) Calculate the pixel distance between the two centers as \Delta h = \sqrt{(x_c - x_c')^2 + (y_c - y_c')^2}.

(3) According to the known size of the image pixel, d, find the actual distance
between the center points of the two targets, which is ∆H = d∆h.
(4) Figure 2.101 shows the camera imaging geometry and the object-image
relationship, which can be used to calculate the actual displacement of the
bridge deflection as Eq. (2.119).

\Delta y = \frac{L' \Delta H}{f}    (2.119)

where L ' is the distance from the target to the camera, and f is the camera focal
length.
(5) Considering that there is an initial displacement ∆h' between the target and the reference, which should be removed when calculating the deflection, the actual change in the deflection should instead be calculated as

\Delta y = \frac{L' (\Delta H - \Delta H')}{f} \quad (2.120)

where ∆H ' = d∆h ' .
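The steps above can be condensed into a short Python sketch; the function name and the argument conventions (all lengths in the same unit, e.g., millimeters) are illustrative assumptions.

```python
import math

def bridge_deflection(target_center, ref_center, pixel_size, target_distance,
                      focal_length, initial_pixel_offset=0.0):
    """Deflection from the relative displacement of the target and reference spot
    centers, following steps (1)-(5) and Eqs. (2.119)-(2.120)."""
    # (2) pixel distance between the two spot centers
    dh = math.hypot(target_center[0] - ref_center[0],
                    target_center[1] - ref_center[1])
    # (3) convert to an actual distance using the known pixel size d
    dH = pixel_size * dh
    # (5) remove the initial displacement and scale by the imaging geometry
    dH0 = pixel_size * initial_pixel_offset
    return target_distance * (dH - dH0) / focal_length
```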

Fig. 2.101 Imaging geometry of the vision camera



By using relative measurements between the target and the reference, errors caused
by atmospheric changes and the imaging system can be canceled, improving the
measurement accuracy of bridge deflection.

3. Anomaly detection of bridge deflection

Anomaly detection is performed using high-precision and high-frequency bridge


deflection measurements. However, cyclic changes caused by environmental factors,
such as temperature and humidity, should be differentiated from the anomaly of
bridge deflection. A sliding window is applied to the bridge deflection measurements
for anomaly detection. Assuming the current moment is t_1, the deflection measurements
v_t within the time window t_0 ~ t_1 are selected to form the data set V = {v_t, t_0 <
t ≤ t_1}. The mean value E_v and standard deviation σ_v of the data set V are calculated.
Deflection values outside [E_v − 3σ_v, E_v + 3σ_v] are identified as anomalous values.
The anomalous values can trigger the warning system, drive the camera on the bridge
deck to take pictures synchronously and send warning messages through the mobile
Internet.
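A sliding-window implementation of this 3σ rule might look like the following Python sketch; the interface (NumPy arrays of timestamps and deflections) is an assumption made for illustration.

```python
import numpy as np

def detect_anomalies(times, deflections, t0, t1):
    """Flag deflection samples in the window (t0, t1] that fall outside
    [E_v - 3*sigma_v, E_v + 3*sigma_v], as described above."""
    times = np.asarray(times, dtype=float)
    deflections = np.asarray(deflections, dtype=float)
    in_window = (times > t0) & (times <= t1)
    window = deflections[in_window]
    mean, sigma = window.mean(), window.std()
    anomalous = np.abs(window - mean) > 3.0 * sigma
    return list(zip(times[in_window][anomalous], window[anomalous]))
```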

2.4.3 Dynamic Monitoring of Bridge Deflection

A measurement camera and a reference are installed on two adjacent piers of the
bridge. A target is installed in the middle span to acquire the real-time deflection.
The system is set up as shown in Fig. 2.102. To validate the deflection obtained by the
vision measurement, a thin steel wire is installed at the target position, and the other
end is dropped down to the ground with a large weight. The height changes of the
weight above a reference point on the ground are measured using a high-precision
micrometer, which also reflects the bridge deflection.
Figure 2.103 shows the measurement results of both methods. When there is
no passing vehicle, the inherent deformation of the main beam is approximately
2.4 mm due to the influence of temperature variation, and the deformation of the
bridge surface is approximately − 6.0 to 4.8 mm due to the influence of the vehicle
load. When the vehicle is approaching the measuring point, the deformation of the
bridge at this point is upward in front of the vehicle due to its load, and a local
crest is formed. When the vehicle passes the measuring point, the bridge deck at
this point deforms downward under its load, forming a local trough. Therefore, the
deformation increases after a decrease when the vehicle passes. The displacement of
the main beam is small due to the lighter load of the small vehicle, as the deformation
is approximately − 3 to 0 mm immediately after 13:40. The heavy vehicle causes
larger deformation of the bridge, and the instantaneous deformation is approximately
− 6.0 mm immediately before 13:40.
A comparison between the results of the vision measurement and the micrometer
measurement shows that a similar trend is observed in the results from both methods.
The Pearson correlation coefficient is as high as 0.95. The mean value of the difference

Fig. 2.102 In situ setup of the bridge deflection measurement system

Fig. 2.103 Bridge deflection measurements of the two methods



is 0.2 mm, and the mean square error is 0.38 mm. This indicates that the bridge deflections
measured by the machine vision method and the classical micrometer are highly
consistent and accurate.
It is possible to estimate the number of vehicles running on the bridge with accurate
measurements of the real-time bridge deflection. Figure 2.104 presents the variation
in the bridge deflection from 21:00 to 22:00 on 2020-11-03 as an example. The
curve of bridge deflection remained stable at approximately − 2.00 mm from 21:00
to 22:00 when there was no vehicle passing. The high-frequency measurements are
useful in detecting the bridge deflection variation as the vehicle passes through the
bridge. The waveform in the deflection signal is extracted by a filtering algorithm,
and the number of vehicles is counted. The results show that a total of 95 vehicles
crossed the bridge during 21:00–22:00 on 2020-11-03, including 65 small vehicles
and 30 large vehicles.
The bridge deformation is not only affected by the vehicle load but also closely
related to environmental factors, such as the temperature and atmosphere. The influ-
ence of temperature and atmosphere changes on the bridge deformation can be found
by using the bridge deflection over a long period. Figure 2.105 demonstrates the
variation in the bridge deflection during one day. The bridge deflection changes
throughout the daytime. During the nighttime, the bridge deflection is small due to
the low temperature, remaining at approximately 2.05 mm for most of that time. At
noon, the bridge deflection gradually increases to approximately 3.0 mm. In the
afternoon, it starts dropping to approximately 2.0 mm, and during the evening hours,
it fluctuates around approximately 1.8 mm.

Fig. 2.104 Variation in bridge deflection value in an hour



Fig. 2.105 Variation in bridge deflection in a day

2.5 Surveying Equipment

2.5.1 Systematic Architecture of the Surveying Equipment

Dynamic and precise engineering surveying serves diverse projects to acquire
information on objects in various application scenarios. Professional knowledge and
requirements differ among specific projects, while the fundamental methods share
common characteristics across applications. The common basis of dynamic and precise
engineering surveying needs to be extracted from these diverse projects and
consolidated into a cornerstone for future and new challenges. A systematically
and logically clear equipment architecture, together with surveying workflows involving
multiple advanced techniques, is the foundation for solving the various challenges
related to dynamic and precise engineering surveying. This common foundation
and architecture can then evolve into diverse equipment and methods. This idea
draws on the mature von Neumann architecture, which facilitated the rapid
development of computers.
Several common features can be concluded and consolidated for dynamic and
precise engineering surveying as follows.
(1) First, a large area needs to be surveyed in a short period, acquiring as much information
about the objects as possible. To meet these requirements, a moving platform with multiple
types of sensors is applied. The common mobile platforms are vehicles, ships,
airplanes, and robots.
(2) As the surveying platform is moving, the positioning method needs to be
dynamic. On the other hand, positioning and navigation support a unified space
and time datum, which is important for integrating observations from different
sensors. Therefore, the positioning and navigation method should also be stable
during the movement of the platform.
(3) Different sensors often work in different protocols and modes. Synchronous
control of multiple sensors requires a standard interface and protocols.
(4) Sensors are usually sold with their own software. However, data collection and processing
can be very inconvenient when switching between different software packages, possibly
even different running environments. Therefore, professional and integrated software
for data collection and processing needs to be developed.

Apart from the above-mentioned common requirements, professional surveying


equipment for specific applications should be able to meet the industrial standard.
Calibration and certification may be needed to meet the requirements for specific
industrial applications. Overall, a unified architecture can be established, as shown
in Fig. 2.106.
In this architecture, the software serves as a link to access multiple sensors, inte-
grating multiple sensors and outputting multi-source sensory data, with the unified
space and time datum as the core.
For instance, common features and similar design ideas can be found in both road
surveying equipment and rail surveying equipment. Both of them integrate sensors
such as panoramic cameras, multi-vision cameras, LiDAR, and line-structured light
to acquire object information. High-precision positions and attitudes are calculated by
combining observations from odometry, laser, vision, GNSS, and IMU. The control-
ling system generates trigger signals that are sent to sensors according to the posi-
tioning information. It drives the sensors to acquire data at the designed locations.
In the process, unified space and time reference coordinates are needed to coordi-
nate different types of data collection. The synchronization control device sends the
coordinated and unified data to the data acquisition software on the server through
a standard communication protocol. At the server end, the data port of each sensor
is monitored, and the data sent through this port are fused with spatial and temporal
information and stored in the server, completing the data collection. The collected
data may need to be processed in real-time. With the development of 5G and future
6G communication technology, real-time surveying data transmission may be real-
ized in the future. Typical surveying equipment for highways, railways, dams, and
other infrastructures has been widely used in domestic highway transportation, rail
transportation, and water conservancy.

Fig. 2.106 Architecture of surveying equipment



2.5.2 Road Surveying Equipment

1. Pavement deflection surveying equipment

Laser dynamic deflectometer (LDD) is pavement deflection surveying equipment


developed by Wuhan Optics Valley ZOYON Science and Technology Co., Ltd.,
China. As seen in Fig. 2.107, the LDD can be mainly divided into a Swedish Volvo
heavy truck in the front and a trailer at the end. The maximum load on the trailer
is 10–13 tons, among which 5 tons can be detachable. One container is loaded on
the trailer and maintained at a constant temperature of 25 ± 2 °C to keep all the
surveying equipment in a normal state. To ensure consistent data collection with
different sensors, all sensors are mounted on a specially designed 4 m long rigid
beam, 2 m forward from the center of the load wheel. Four laser Doppler sensors
are installed in parallel to measure the pavement deformation velocities, which are
measured at 100 mm, 300 mm, 750 mm, and 3600 mm from the center of the load
wheel. An IMU and a GPS receiver are installed to obtain the positions and attitudes
of the platform. Temperature sensors are used to measure pavement temperature and
air temperature.
2. Road testing and measurement equipment

The intelligent road testing and measurement (RTM) system is road testing and
measurement equipment developed by Wuhan Optics Valley ZOYON Science and
Technology Co., Ltd., China. The whole series of RTM equipment provides indices
for road evaluation, such as crack, rutting, flatness, and profile depth along the road.
The platform is usually selected as a medium bus, which is adapted according to
specific usage. The power supply, control and storage systems are installed inside the
bus, while surveying systems for different indices are installed at different positions
outside the bus. A crack surveying unit with two 3D scanners is installed on the rear
top of the bus, 2150 mm above the ground. A flatness surveying unit is installed in the
right front of the two rear wheels. The longitudinal section of the left and right wheel
traces is measured to calculate the flatness and texture depth. Alternatively,

Fig. 2.107 Laser dynamic deflectometer (LDD)



Fig. 2.108 Intelligent road testing and measurement (RTM) system

this unit can also be installed on the front or rear bumper bar of the bus. Multiple
sensors are installed on the left, middle, and right of the cross-beam to measure
multiple indices, including flatness, pavement mean profile depth, pavement fatigue,
and pavement bumping. The road facility surveying unit is a binocular measuring
system with a left and right camera and is installed on the roof of the bus. Figure 2.108
shows a photo of the surveying bus and several close-up photos of the installed
sensors.

3. Tunnel surveying equipment

Tunnel fast measurement system (TFS) is highway tunnel surveying equipment


developed by Wuhan Optics Valley Zoyon Science and Technology Co., Ltd., China.
The platform is a medium truck adapted according to specific usage, as shown in
Fig. 2.109. It has a sensory cabin and an operating cabin. The power supply, control,
and storage systems are installed in the operating cabin. Measuring sensors are
installed in the sensory cabin. The sensors are installed on a rigid bracket, which
can be rotated around the center. Tunnel surveying uses 16 high-resolution CCD
cameras and multiple LED auxiliary lighters. For each truck run, half of the tunnel
section is measured, and full-section measurement is realized by rotating the bracket.
Tunnel fatigue related to water leakage and freezing is measured by using an infrared
thermometer based on the temperature model of different fatigues. The 3D cross-
section of the tunnel is restored by using high-precision LiDAR scanning. It provides,
on the one hand, the point cloud of the tunnel cross-section, and on the other hand,
the distance information for the CCD camera to calculate the image parameters.

Fig. 2.109 Tunnel fast measurement system (TFS)

2.5.3 Rail Track Surveying Equipment

1. High-speed railway surveying equipment


High-speed railway surveying equipment was developed by Wuhan HiRail Trans-
portation Technology Co., Ltd., China. It carries out comprehensive inspection of
the high-speed railway infrastructure. As shown in Fig. 2.110, the platform is the
electric self-propelled cart. This equipment integrates multiple modules, including
track fastener fatigue surveying, track plate crack and gap surveying, track geometry
parameter surveying, track surface state surveying, trackside equipment inspection,
clearance measuring, etc. With sensors, such as a 3D laser scanner, line structured
light, surface array camera, and line array camera, data of the rail tracks, fasteners,
track plate, tunnel section and trackside facilities are collected automatically. Related
indices are extracted from the data and analyzed to output the reports required for
the comprehensive inspection of high-speed railways.
All the data are unified in the same time and spatial coordinate system to form
visualized big data. On the one hand, it is regarded as a virtual model of the real scene,
supporting analysis in the office. On the other hand, big data can be used in machine
learning and intelligent analysis to obtain the pattern of various fatigue or damage
types, which helps to predict future hazards, assisting intelligent management and
operation of the infrastructure in service.
2. Subway tunnel surveying equipment
The rMMS is a subway tunnel surveying equipment developed by Wuhan HiRail
Transportation Technology Co., Ltd., China. It uses the rail mobile trolley as the
platform, as shown in Fig. 2.110. Apart from the platform, it combines POS, laser
scanner, synchronization control module, structured light module, wheel encoder
module, power control module, etc. Multiple sensors work synchronously to collect
the full-section spatial data of the tunnel under the coordination of the synchronization
control unit, as shown in Fig. 2.111.

Fig. 2.110 High-speed railway surveying equipment

Fig. 2.111 Orbit moving 3D laser measurement system (rMMS)



The combined POS is mainly used to obtain the position and attitude of the rMMS,
which integrates GNSS receivers, IMU, and DMI. GNSS receivers mainly provide
precision timing to establish a synchronized multi-sensor time reference [20]. Since
tunnels are usually closed environments without GNSS signals, spatial position and
attitude are mainly acquired by dead reckoning based on IMU and DMI data. The
geometry and intensity of the tunnel section are surveyed by using laser scanning
data.

2.6 Summary

Transportation infrastructure is essential to the economic and social development of


every country in the world. Modern society increasingly relies on convenient transportation
infrastructure, leading to an increasing number of facilities with a broader distribution.
However, the structure and appearance of infrastructure may experience varying
degrees of deformation and damage due to long operation years, changes in material
properties, geological movements, and other factors. This creates an urgent need
for technologies to measure, monitor, and evaluate the infrastructure condition. This
chapter summarizes the technology and equipment related to road subgrade and
pavement, highway and railway tunnels, track geometry, and bridge dynamic deflec-
tion measurement. Although different application scenarios require different equip-
ment, the core technology is versatile. For example, the linear structured light 3D
measurement can be used for inspecting road surface conditions and measuring track
geometric parameters. Multi-sensor integrated control and spatiotemporal datum
construction are common issues in many different scenarios.

References

1. Zhang D, Li Q, Cao M, et al (2015) Deflection measurement methods based on velocities of


pavement deflections. Journal of Shanghai Jiao Tong University 49(2): 220.
2. Boresi A P, Schmidt R J, Sidebottom O M (1985) Advanced mechanics of materials. New
York: Wiley.
3. Hunt G, Wood J (2005) Review of the effect of track stiffness on track performance. RSSB,
Research Project 372.
4. He L, Lin H, Zou Q, et al (2017) Accurate measurement of pavement deflection velocity under
dynamic loads. Automat Constr 83:149–162.
5. Zheng J (2012) Deflection design standards of asphalt pavement based on state design method.
China Journal of Highway and Transport 25(4):1–9.
6. Liao J, Lin H, Li Q, et al (2019) A correction model for the continuous deflection measurement
of pavements under dynamic loads. IEEE Access 7:154770–154785.
7. Zhang D, Zou Q, Lin H, et al (2018) Automatic pavement defect detection using 3D laser
profiling technology. Automat Constr 96:350–365.
8. Li Q, Zou Q, Zhang D, et al (2011) Fosa: F* seed-growing approach for crack-line detection
from pavement images. Image Vis Comput 29(12):861–872.

9. Zou Q, Cao Y, Li Q, et al (2012) Cracktree: Automatic crack detection from pavement images.
Pattern Recogn Lett 33(3):227-238.
10. Zou Q, Zhang Z, Li Q, et al (2019) Deepcrack: Learning hierarchical convolutional features
for crack detection. IEEE Trans Image Process 28(3):1498–1512.
11. Li Q, Zou Q, Liao J, et al (2019) Deep learning with spatial constraint for tunnel crack detection//Proceedings of the ASCE international conference on computing in civil engineering, Atlanta GA.
12. He K, Zhang X, Ren S, et al (2016) Deep residual learning for image recognition//Proceedings
of the IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas NV.
13. Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image
segmentation//Proceedings of medical image computing and computer-assisted intervention-
MICCAI, Munich.
14. Chen L C, Papandreou G , Kokkinos I, et al (2018) Deeplab: Semantic image segmentation with
deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern
Anal Mach Intell 40(4):834–848.
15. Sayers M W (1995) On the calculation of international roughness index from longitudinal road
profile. Transport Res Rec (1501):1–12.
16. Ministry of Transport of the People’s Republic of China (MOT) (2015) Technical specification
of maintenance for highway tunnel (JTG H12—2003). Beijing: China Communications Press.
17. Zhai W, Zhao C (2016) Frontiers and challenges of sciences and technologies in modern railway
engineering. Journal of Southwest Jiaotong University 51(2): 209–226.
18. Xiong Z (2017) Key technology research of urban rail service state inspection system.
Dissertation, Wuhan: Wuhan University.
19. Cui H (2019) Research on key technologies of service status detection of track fasteners based
on structured light point cloud. Dissertation, Wuhan: Wuhan University.
20. Li Q, Mao Q (2017) Progress on dynamic and precise engineering surveying for pavement and
track. Acta Geod Cartogr Sin 46(10):1734–1741.
Chapter 3
Dynamic Surveying in Autonomous
Driving

3.1 Overview

An autonomous driving car is an intelligent mobile system that replaces human


drivers using a sensing system and intelligent maneuvers. Autonomous driving can
largely reduce traffic accidents and jams and optimize energy usage. It is acknowledged
as one of the twelve high technologies expected to dominate the future economy and is gaining
more interest worldwide. Early autonomous driving dates back to August 1921, when
the first autonomous vehicle, actually a remote-controlled vehicle, was invented
at a U.S. Air Force base in Ohio. In China, the National University of Defense
Technology took the lead in researching autonomous driving cars in the 1980s. With
a favorable policy of strengthening the nation through science and technology, the
car industry and technology are rapidly developing in China. Currently, Chinese
autonomous driving technology is considered among the best in the world.
Autonomous driving is a multidisciplinary (mechanics, engineering, computer science,
and measurement) and multi-technological (perception, planning, positioning,
decision-making, high-definition map) integration system. Technological outcomes
from multiple disciplines contribute to autonomous driving cars. For example, mechan-
ical engineering provides the car’s hardware design, brake, and hydraulic system. Path
planning and intelligent decision-making are in the domain of computer science.
Precision mapping based on positioning and sensing is in the scope of computer
science and surveying. This chapter describes dynamic surveying techniques and
methods used in autonomous driving systems, including car positioning and naviga-
tion, object detection, and mapping. The application of autonomous driving cars in
different scenarios is also introduced.


3.2 Car Positioning and Navigation

Car positioning refers to the determination of vehicle position, attitude and other
information. In an autonomous driving system, car positioning provides the basis
for other modules, such as path planning and high-definition mapping, guaranteeing
safe driving. Currently, positioning technology used in autonomous driving systems
mainly includes GNSS/INS integrated positioning, LiDAR and high-definition map
positioning, VO, and multi-sensor fusion. Multi-sensor fusion is suitable for complex
driving environments and is the most commonly used technology at this stage. This
chapter firstly introduces the former three techniques in detail and then demonstrates
real-world examples of multi-sensor fusion.

3.2.1 GNSS/INS Integrated Positioning

Positioning using GNSS is of high precision but low update frequency. Positioning
using INS relies on the onboard inertial measurement unit and dead reckoning. It has a
higher sampling rate, but errors accumulate along the trajectory. GNSS and
INS complement each other; thus, their combination makes up for the shortcomings
of the low update frequency of GNSS and the error accumulation of INS. GNSS and
INS integration is therefore commonly used in autonomous driving. In the following
sections, GNSS positioning and INS positioning are summarized. Finally, GNSS/
INS combination positioning will be introduced in detail.
1. GNSS positioning
GNSS is the collective name of all navigation satellite systems in the world. Currently,
four dominant GNSS systems are American GPS, Russian GLONASS, European
GALILEO, and Chinese BDS, as shown in Fig. 3.1.
With at least four satellites visible to the receiver, the receiver’s position can be
calculated using the four distances and the ephemeris of the satellites, as shown
in Fig. 3.2. According to the minimum number of receivers, positioning methods
can be classified into single point positioning (SPP) and differential positioning.
The former requires only one receiver, and pseudorange is commonly used. The
positioning precision is at a level of several tens of meters. The latter needs two or

Fig. 3.1 Four satellite systems for positioning and navigation in the world

Fig. 3.2 Single point positioning using BDS

more receivers, and the relative positions between the receivers are used. Both the
pseudorange and the carrier phase can be used in this method.
1) Single point positioning
In the SPP mode, only one receiver needs to be installed on the car (Fig. 3.3),
whose position, denoted as (x, y, z), consists of three unknown parameters. In theory,
the car’s position can be determined with three satellites in view. However, the
time difference between the receiver and the satellites ∆t, accounts for one addi-
tional unknown parameter. Therefore, four satellites should be visible to the receiver
to calculate the receiver’s position and time difference as expressed in Eq. (3.1).
Affected by satellite orbit errors, relativistic effects, and signal propagation
errors, the positioning accuracy is not high enough for autonomous driving cars.
\begin{cases}
d_1 = \sqrt{(x - x_1)^2 + (y - y_1)^2 + (z - z_1)^2} + c\Delta t \\
d_2 = \sqrt{(x - x_2)^2 + (y - y_2)^2 + (z - z_2)^2} + c\Delta t \\
d_3 = \sqrt{(x - x_3)^2 + (y - y_3)^2 + (z - z_3)^2} + c\Delta t \\
d_4 = \sqrt{(x - x_4)^2 + (y - y_4)^2 + (z - z_4)^2} + c\Delta t
\end{cases} \quad (3.1)

where d_1, d_2, d_3 and d_4 are the distances from the car to the satellites, whose positions
are (x_i, y_i, z_i) (i = 1, 2, 3, 4), and c is the speed of light.
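As an illustration, Eq. (3.1) can be linearized and solved iteratively for (x, y, z, c∆t) once at least four pseudoranges are available. The Gauss-Newton sketch below is a simplified example under stated assumptions (no atmospheric or orbit corrections, distances in meters), not a production GNSS solver.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light, m/s

def spp_position(sat_positions, pseudoranges, iterations=10):
    """Iterative least-squares solution of Eq. (3.1).
    sat_positions: (n, 3) satellite coordinates; pseudoranges: (n,) measured d_i.
    Returns the receiver position (x, y, z) and the clock offset dt."""
    sats = np.asarray(sat_positions, dtype=float)
    d = np.asarray(pseudoranges, dtype=float)
    state = np.zeros(4)                      # [x, y, z, c*dt]
    for _ in range(iterations):
        diff = state[:3] - sats              # receiver-to-satellite vectors
        rho = np.linalg.norm(diff, axis=1)   # geometric ranges
        residual = d - (rho + state[3])
        # Jacobian of Eq. (3.1) with respect to (x, y, z, c*dt)
        H = np.column_stack([diff / rho[:, None], np.ones(len(d))])
        delta = np.linalg.lstsq(H, residual, rcond=None)[0]
        state += delta
        if np.linalg.norm(delta) < 1e-4:
            break
    return state[:3], state[3] / C_LIGHT
```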
2) Differential positioning
Differential positioning is another positioning mode that improves positioning accu-
racy by introducing base stations, as shown in Fig. 3.4. The positions within the
differential positioning network can be corrected with the known positions of base
stations, which are usually of high accuracy. According to the target parameters
corrected by differential technology, differential positioning can be divided into posi-
tion difference, pseudorange difference, and carrier phase difference. In the former
two ways, the base stations send the position difference or pseudorange difference

Fig. 3.3 GNSS receivers onboard one autonomous driving car

Fig. 3.4 The RTK principle in autonomous driving

between the estimated positions using the satellites and the known positions to the car
receiver. In these ways, the base stations are required to be near the car, for example,
within 100 km, which is hardly met in real scenarios. The latter using the carrier
phase is termed real-time kinematic (RTK). It improves the positioning accuracy to
a centimeter level, which is more suitable for autonomous driving. The base station
receives the carrier phase observation in this mode and calculates the carrier phase
correction using its known position. It can either send the original carrier phase obser-
vation or the correction to the car receiver, which can then correct the positioning
result of the car.
There are two modes of RTK, namely, conventional RTK and network RTK. The
workflow of conventional RTK is similar to that described above, except that the
base station broadcasts its carrier phase observation to all receivers within its range.
Then, in-vehicle processing modules obtain their coordinates through differential
positioning technology. However, this method still requires that the distance between

the base station and the uncrewed vehicle is not too far, and generally, the positioning
accuracy is up to a sub-meter level. Network RTK, on the other hand, establishes
a network with multiple reference stations in a certain area, and thus, vehicles in
this larger area can receive information to correct their positioning error in real time.
Network RTK achieves a broader coverage and higher positioning accuracy up to
the centimeter level.
2. INS positioning
The INS system measures the platform’s accelerated speed and angular speed in
inertial coordinates, which can be used to derive the speed and drift angle. depend
on external information, works for a long time and is an all-day mode with a high
data update rate and a high accuracy in the short term.
In the INS system, the core component is the IMU, whose appearance is shown in
Fig. 3.5. This unit consists of a three-axis gyroscope and a three-axis accelerometer
and is usually installed at the car's base. Some high-performance IMUs are equipped
with a three-axis magnetometer that measures the angular deviation to correct the
coordinates. The principle of INS positioning is also very simple. Given the coordi-
nates of the last moment, the coordinates of the current moment can be calculated
by combining the angular velocity and acceleration given by the IMU.
In practice, the gyroscope and accelerometer in IMUs will produce errors due
to various unavoidable interference factors. Since errors accumulate over time, the
IMU needs to be recalibrated at regular intervals to avoid excessive errors. GNSS is
usually used to rectify the positioning results of INS.
3. GNSS/INS integrated positioning used in autonomous driving
Both GNSS and INS localization have their disadvantages [1]. For example, GNSS
requires that the vehicle is always within the range of the receiving base station, and
the vehicle signal environment must be good. Due to the error accumulation problem

Fig. 3.5 The core


component of INS

of INS, the positioning will become increasingly inaccurate. Neither of these two
methods can be used alone to address the complex road scenes in autonomous driving,
so it is necessary to combine them to provide a robust and accurate vehicle positioning
method.
Coupling GNSS and INS can improve the robustness and precision of car posi-
tioning. GNSS positioning can rectify the INS positioning results to avoid error
accumulation. On the other hand, the frequency of INS can reach up to 1 kHz,
which is much higher than that of GNSS, making up for the low update frequency
of GNSS. INS can also substitute for GNSS when the satellite signal is unavailable.
The GNSS/INS integrated positioning is illustrated in Fig. 3.6.
GNSS/INS can be integrated into three modes, i.e., loosely coupled (LC), tightly
coupled (TC), and ultra-tightly coupled (UC). The tightly coupled method is the
most commonly used in autonomous driving, which will be introduced after briefly
introducing the LC method. The UC mode is an emerging coupled mode, the globally
optimized solution, and is just briefly introduced in this chapter.
Figure 3.7 shows the loosely coupled GNSS/INS integration used in autonomous
driving. In this mode, the GNSS and INS modules are independent of each other.
The former outputs the position and velocity, while the latter outputs the position,
velocity and attitude. These two outputs are then fused using Kalman filtering, after
which the final car position is obtained. The difference between the GNSS and
INS positions and velocities is fed back to the INS to rectify its positioning result.
In the loosely coupled mode, fusion is conducted with the final outputs from
GNSS and INS, which requires that both GNSS and INS are working normally.
The malfunction of either one can lead to failure of the whole system. The tightly
coupled mode is proposed to address this problem, as shown in Fig. 3.8. It feeds the
pseudorange and pseudorange rate from the GNSS and INS modules into a
Kalman filter, which then outputs the final positioning result. The filtered result is
also used as a bias correction to the INS module.
Using pseudorange and pseudorange rate rather than the positioning results is
time-efficient. In the tightly coupled mode, the pseudorange and pseudorange rate
are independent observations, which is better for obtaining an accurate positioning
result. It is worth noting that this mode can work with fewer than four visible satellites

Fig. 3.6 GNSS/INS integrated positioning



Fig. 3.7 Loosely coupled GNSS/INS in autonomous driving

Fig. 3.8 Tightly coupled GNSS/INS in autonomous driving

since the GNSS module does not need to output the car position independently. There-
fore, the tightly coupled GNSS/INS has an anti-interference capability, improving
positioning accuracy.
The state equation of the tightly coupled GNSS/INS is expressed as Eq. (3.2).

\dot{X}_t = F_t X_t + G_t W_t \quad (3.2)

where X_t is the state vector, whose elements are the errors of the INS variables as described
in Eq. (3.3), G_t is the noise matrix, W_t is the system noise vector, and F_t
denotes the system state matrix.
X_t = \begin{bmatrix} \delta r & \delta v & \varphi & \delta \omega_{ib}^{b} & \delta f^{b} \end{bmatrix} \quad (3.3)

where φ denotes the attitude errors, δv the velocity error, δr the position error, δf^b the
accelerometer measurement error, and δω_ib^b the gyroscope measurement error.
Substitute Eq. (3.3) into Eq. (3.2) and convert it to the matrix format as Eq. (3.4).
\begin{bmatrix} \dot{X}_I \\ \dot{X}_G \end{bmatrix} =
\begin{bmatrix} F_I & 0 \\ 0 & F_G \end{bmatrix}
\begin{bmatrix} X_I \\ X_G \end{bmatrix} +
\begin{bmatrix} G_I & 0 \\ 0 & G_G \end{bmatrix}
\begin{bmatrix} W_I \\ W_G \end{bmatrix} \quad (3.4)

where the subscript I denotes INS, and the subscript G means GNSS.
Similarly, the observation equation is a combination of pseudorange and pseudo-
range rate observation equations, as shown in Eq. (3.5).
\begin{bmatrix} Z_\rho \\ Z_{\dot{\rho}} \end{bmatrix} =
\begin{bmatrix} H_\rho \\ H_{\dot{\rho}} \end{bmatrix} X_t + V \quad (3.5)

where Z ρ is the pseudorange observation, Z ρ̇ is the pseudorange rate observation,


H ρ is the pseudorange state observation matrix, H ρ̇ is the pseudorange rate state
observation matrix, and V is the noise.
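For orientation, a single predict/update cycle of the Kalman filter used in such a fusion can be sketched as below. This is a generic discrete-time filter step under assumed matrix shapes, not the specific tightly coupled implementation; the observation z would stack the pseudorange and pseudorange-rate quantities of Eq. (3.5).

```python
import numpy as np

def kalman_step(x, P, F, G, Q, z, H, R):
    """One predict/update cycle: the state model follows Eq. (3.2) discretized as
    x_k = F x_{k-1} + G w, and the measurement model follows Eq. (3.5), z = H x + v."""
    # Prediction
    x_pred = F @ x
    P_pred = F @ P @ F.T + G @ Q @ G.T
    # Update with the stacked pseudorange / pseudorange-rate observation
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```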

3.2.2 In-Vehicle LiDAR Positioning

GNSS/INS integration is the most widely used and mature positioning technology
in autonomous driving in most road scenarios. However, this positioning system can
be disturbed in areas of complex terrain or with weak satellite signals. This section
introduces in-vehicle LiDAR positioning matching to a high-definition map for car
positioning. There are two methods of LiDAR-based car positioning and navigation.
One is the LiDAR matching high-definition map, and the other is LiDAR odometer
positioning. We will briefly introduce the LiDAR system and then the two methods.
1. Introduction of LiDAR
LiDAR is a typical sensor used in autonomous driving, especially in car positioning,
high-definition mapping, and environment sensing [2]. The appearance and compo-
nents of the LiDAR are demonstrated in Fig. 3.9. One LiDAR usually consists of a
laser emitting module, a laser receiving module, and a signal processing module. The
prime components are the light source (or laser emitter), receiver, and prism. Param-
eters to evaluate the LiDAR performance include measuring distance, precision,
angular resolution, and points per second. The angular resolution includes horizontal
resolution and vertical resolution. Horizontally, the LiDAR is rotated by an internal
motor. The horizontal angular resolution can reach 0.01°, and the horizontal viewing
angle can be up to 360°. The vertical angular resolution is mainly determined by the
array size and arrangement of the laser emitting elements, ranging from 0.1° to 1°.
There are many types of LiDAR under development, and some are already on the
market. According to the different mechanical structures, LiDARs can be classified
as mechanical and solid-state types. The former adjusts the laser emission angle by a
mechanical rotating structure. In contrast, the latter relies on electronic components
to adjust the angle. The latter is more compact than the former. According to the
number of arranged beams, LiDAR can be divided into single-beam and multi-beam
types, capable of acquiring 2D and 3D information, respectively. Commercial multi-
beam LiDAR mainly includes 4, 8, 16, 32, 64, and more beams. As the beam number
increases, the device price increases as well [3].

Fig. 3.9 Pictures and illustration showing one LiDAR onboard an autonomous driving car

LiDAR’s primary function is to construct a point cloud map of the environment,


which can be matched to a high-definition map, providing the basis for intelligent
driving. LiDAR receives the reflected laser from surrounding objects and measures
their distance from the car using the traveling time of the laser waves. Then, the point
cloud of the environment can be constructed. Generally, finer point clouds can be
accomplished with more beams of LiDAR. Therefore, multi-beam solid-state LiDAR
is widely used in autonomous driving due to its compactness and ability to acquire
3D point clouds.
2. LiDAR point cloud matching HD map for localization
The high-definition (HD) map is a necessary part of autonomous driving. Compared
to a traditional navigation map, it is specific to autonomous driving, and its rela-
tive accuracy is at the centimeter level. The high-definition map contains one static
and one dynamic map layer, and the former is the base map that can be matched
with the point cloud collected by LiDAR. After matching, the car’s position on the
map is determined. Unlike GNSS/INS integrated positioning, high-definition map
matching obtains the relative car position. In practice, GNSS/INS integrated posi-
tioning is used to obtain an absolute car position in the earth-based coordinates, with
which the environmental features can be extracted from the high-definition map.
Then, the LiDAR point cloud can be matched to the environmental elements of the
high-definition map. Therefore, by integrating GNSS/INS, high-definition maps and
LiDAR observations, both absolute and relative positioning can be accomplished.
Precise positioning involves obtaining the vehicle's initial position using GNSS/INS,
retrieving environmental features from the HD map, and matching the LiDAR-scanned
point cloud with these environmental features using a matching algorithm. The key
step is finding the area in the HD map that matches the environmental features.
Frequently used matching algorithms are the iterative closest point (ICP) and normal
distributions transform (NDT), which will be introduced as follows (Fig. 3.10).

Fig. 3.10 Matching LiDAR point clouds with the GNSS/INSS positions and the high-definition
navigation map

1) ICP matching algorithm


ICP [4] is a classic point cloud matching algorithm for fine matching between point
clouds. It constructs a relationship between each point pair in the two point clouds and
finds an optimal rigid transformation that minimizes the global distance between
them. Let us say that we have a LiDAR point cloud Q, and the ith point in the point
cloud is q_i. The point cloud extracted from the high-definition map is P, within which
each point is denoted as p_i. The ICP algorithm aims to find the nearest neighbor
point pair (q_i, p_i) and calculate a rotation and translation transformation (R, t) that
minimizes the error function expressed in Eq. (3.6). R and t are a 3 × 3 rotation matrix
and a 3 × 1 translation vector, respectively.

E(R, t) = \frac{1}{n} \sum_{i=1}^{n} \left\| q_i - \left( R p_i + t \right) \right\|^2 \quad (3.6)

The ICP algorithm is mainly divided into two steps. The first is to find the closest point
pairs (q_i, p_i); the second is to calculate the optimal rigid transformation parameters,
R and t.
The closest neighboring points are defined using Eq. (3.7), where the distance
d(qi , pi ) is calculated as the Euclidean distance.
\forall q_i \in Q,\ p_i \in P, \quad i = \mathop{\mathrm{argmin}}_{j} d\left(q_i, p_j\right) \quad (3.7)

The optimal rigid transform parameters, R and t, are usually solved using SVD.
First, calculate the centroids of the two clusters of point clouds, and then calculate
the error vector from the corresponding points in the two clusters of point clouds to
the centroids of their respective point sets, as shown in Eq. (3.8).

p_i' = p_i - u_P, \qquad q_i' = q_i - u_Q \quad (3.8)

where u P and u Q are the mass centers of the target and source point clouds,
respectively.
Therefore, the cross-correlation matrix W of the two-point cloud clusters can be
yielded using Eq. (3.9).

W = \frac{1}{n} \sum_{i=1}^{n} \left[ \left( p_i - u_P \right) \left( q_i - u_Q \right)^{\mathrm{T}} \right] = \frac{1}{n} \sum_{i=1}^{n} p_i' q_i'^{\mathrm{T}} \quad (3.9)

Then, singular value decomposition is applied to the cross-correlation matrix W,
as in Eq. (3.10).

W = U \Delta V^{\mathrm{T}} \quad (3.10)

The rigid transformation parameters, R and t, representing the relationship


between the source and target point clouds, can be calculated using Eq. (3.11).

R = U V^{\mathrm{T}}, \qquad t = u_Q - R u_P \quad (3.11)

The steps of the ICP algorithm are listed as follows.
(1) For a point p_i in the target point cloud P, its correspondent q_i is searched in the
source point cloud Q. These two points form one point pair (q_i, p_i) with the
minimum distance d(q_i, p_i), calculated by using Eq. (3.7).
(2) The rigid transformation parameters, R and t, are calculated by using the point
pairs (q_i, p_i) according to Eq. (3.11).
(3) The error function E(R, t) is calculated by using the R and t obtained in step (2), as
expressed in Eq. (3.6).
(4) If E(R, t) is below a given threshold, the algorithm stops. Otherwise, steps
(1)-(3) are iterated until E(R, t) falls below the given threshold.
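A compact NumPy sketch of these steps is given below; the brute-force nearest-neighbor search and the convergence test on the mean residual are simplifying assumptions for illustration, and a real implementation would use a k-d tree and more careful termination criteria.

```python
import numpy as np

def icp(source, target, max_iter=50, tol=1e-6):
    """Point-to-point ICP following Eqs. (3.6)-(3.11).
    source, target: (n, 3) and (m, 3) arrays; returns R, t mapping source onto target."""
    src = np.asarray(source, dtype=float).copy()
    tgt = np.asarray(target, dtype=float)
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(max_iter):
        # Step 1: nearest neighbor in the target for every source point, Eq. (3.7)
        d2 = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=2)
        nn = tgt[d2.argmin(axis=1)]
        # Step 2: centroids and SVD of the cross-correlation matrix, Eqs. (3.8)-(3.10)
        mu_s, mu_t = src.mean(axis=0), nn.mean(axis=0)
        W = (src - mu_s).T @ (nn - mu_t)
        U, _, Vt = np.linalg.svd(W)
        R = Vt.T @ U.T                       # rotation (standard SVD solution)
        if np.linalg.det(R) < 0:             # guard against a reflection
            Vt[-1, :] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s                  # translation, Eq. (3.11)
        src = src @ R.T + t                  # apply the incremental transform
        R_total, t_total = R @ R_total, R @ t_total + t
        # Steps 3-4: residual of Eq. (3.6) and convergence check
        err = np.linalg.norm(nn - src, axis=1).mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total
```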
2) Normal distributions transform
The NDT algorithm first voxelizes the point cloud, assigning each point to a voxel cell.
The insight of the algorithm is to model the probability density function (PDF) within
each cell with a normal distribution: if the target point cloud matches the source cloud
to a high degree, the calculated probability density is large; otherwise, it is small.
First, for the voxelized point cloud, calculate the multidimensional normal distri-
bution parameters, such as the mean value and covariance matrix, of each voxel. Then,
the transformation parameters are applied to the source point cloud, transforming it
to match the voxel cube of the target point cloud. The transformation parameters
can be set to zero at the initial transform. The probability density function for each
transformation point can be calculated using Eq. (3.12).
p(x) = \frac{1}{c} \exp \left[ -\frac{(x - \mu)^{\mathrm{T}} C^{-1} (x - \mu)}{2} \right] \quad (3.12)

where p(x) is the probability that x follows a normal distribution with a mean value
of μ and a covariance matrix of C, c is a constant, x is the point to be transformed,
μ is the mean value, and C is the covariance matrix.
Immediately afterward, the summed probability densities of all points are calcu-
lated to obtain the confidence of matching between the target point cloud and the
source point cloud, as expressed in Eq. (3.13).


\operatorname{score}(T) = \sum_{i=1}^{n} p(x_i) \quad (3.13)

Finally, the objective function, score(T ), is optimized using Newton optimization


algorithm, and the purpose is to find the transformation parameters that maximize the
objective function, i.e., the best alignment between the source and target point clouds.
The transformation parameters obtained after the optimization are used to transform
the source point cloud, and then the probability density is calculated, followed by
applying the Newton optimization algorithm. The steps mentioned above are repeated
until the accuracy requirement is met.
Compared with the ICP algorithm, the NDT algorithm does not compute features of
corresponding points and has no feature matching process, which reduces the
computational cost and improves the computational speed. In addition, the NDT
algorithm does not require an accurate initial value for the iteration and can find the
optimal match even if the initial values are inaccurate.
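The scoring part of this procedure can be sketched as follows; the voxel_stats dictionary (mapping a voxel index to the mean and covariance of the target points inside it) and the omission of the Newton update itself are simplifying assumptions for illustration.

```python
import numpy as np

def ndt_score(source, voxel_stats, voxel_size, R, t):
    """Objective of Eq. (3.13): the sum of Gaussian densities of Eq. (3.12), up to
    the constant 1/c, for each transformed source point evaluated in its voxel."""
    score = 0.0
    transformed = np.asarray(source, dtype=float) @ R.T + t
    for x in transformed:
        key = tuple(np.floor(x / voxel_size).astype(int))
        if key not in voxel_stats:
            continue                       # point falls in an empty voxel
        mu, C = voxel_stats[key]
        d = x - mu
        score += np.exp(-0.5 * d @ np.linalg.solve(C, d))
    return score
```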
3. LiDAR odometry
In autonomous driving, LiDAR odometry can also be used for vehicle positioning.
The relative motion of the platform can be obtained by calculating a transformation
matrix between the matched LiDAR point clouds of two adjacent frames. This is a
relative positioning method. Recently proposed LiDAR odometry methods are reported to be
more accurate, but their core algorithms are similar to the abovementioned
flow, except that different point cloud features are used in the matching.

3.2.3 In-Vehicle Visual Odometry

In the dynamic measurement for autonomous driving, visual measurement sensors


are used in sensing, mapping, and positioning. Visual odometry (VO), also known as
the visual front end of visual simultaneous localization and mapping (SLAM), is
introduced in this section to illustrate its function in car positioning. It estimates the
camera’s movement and restores the scene using photos collected by one or more
cameras, providing an excellent initial estimation for the backend [5]. The positioning

of the platform is accomplished by using neighboring frames, which makes it a


relative positioning method.
1. Category of VO used in autonomous driving
One significant advantage of VO is that it enables vehicle positioning using only one
camera without any prior information. According to the number of onboard cameras,
VO can be classified into monocular and stereo.
Monocular visual odometry uses only a single camera for state estimation, so the
device is of low complexity. However, it is also due to using a single camera that no
absolute scale can be defined, and the scale ambiguity can lead to large positioning
errors. In contrast, stereo VO can restore the scale of the scene. The depths of the
feature points can be calculated using the relative positions of multiple cameras to
finely estimate the platform trajectory. However, multiple cameras increase the cost
of autonomous driving cars.
1) Monocular VO
Many scholars have focused on monocular VO because of its simple device structure
and low cost. Since there is a lack of depth information, the camera pose transforma-
tion between two frames is usually set to a fixed value, which is a rough estimation.
In practice, feature points are usually selected from each of the two adjacent frames,
and then feature point matching is performed to calculate the relative motion of
the camera. Since the data are 2D, the problem is generally solved using epipolar
geometry (EG).
At this stage, the common monocular visual odometers are parallel tracking and
mapping (PTAM), semi-direct visual odometry (SVO), and direct sparse odometry
(DSO). PTAM is the first visual odometer that uses nonlinear optimization and is
the first SLAM algorithm that proposes to separate tracking and mapping into two
phases. SVO is a semi-direct monocular visual odometry. Semi-direct means that the
camera motion is estimated by matching image blocks with feature points without
matching the whole image, thus reducing the computational cost, and it is a method
between matching methods using feature points and the whole images. DSO is a
sparse direct SLAM algorithm without loop closure detection and map reuse, but its
matching speed is greatly improved compared with the traditional method. In addition
to the previous algorithms, some novel methods are emerging in studies about monoc-
ular visual odometry, such as those based on deep learning and semantic combination.
For example, some researchers have proposed a monocular VO SLAM method that
correlates depth and semantic information. It is suitable for large-scale 3D recon-
struction and is used for joint training and prediction by multitask convolutional
neural networks.
2) Stereo VO
Stereo VO avoids the complicated and tedious camera pose solution in monocular
visual odometry. It estimates the camera motion more accurately using the depth
information obtained with multiple cameras. Binocular cameras are used in most
cases of stereo VO, and the depth information of the scene is calculated with a known

baseline length between the two cameras. A binocular camera can be regarded as two
monocular cameras with a known baseline length. However, when the scene depth
is much larger than the baseline length, the scene of the two cameras is almost the
same. In this case, the binocular camera equals a monocular camera. In this sense,
the binocular camera usually only estimates the depth information of objects at a
relatively close distance.
With the ability to obtain depth information, stereo VO can acquire 3D information
of feature points, which can be used to calculate the transformation matrix between
two adjacent frames using the ICP or NDT algorithms introduced in the previous subsec-
tion. Stereo VO is superior to monocular VO in resolving the camera pose and esti-
mating its moving states and angles. However, autonomous driving cars are usually
equipped with multiple sensors, and using monocular cameras in combination with
other sensors can achieve the same effect as stereoscopic odometers with less equip-
ment complexity and cost, thus reducing the cost of driverless cars. Multi-sensor
fusion positioning is presented in Sect. 3.2.4.
2. Matching algorithms used in in-vehicle VO
Camera motion can be retrieved by frame matching based on either feature points or
pixel brightness when using in-vehicle VO. The method based on pixel brightness
is more direct and time-efficient, as there is no need to extract and match the feature
points. However, it is less used than the method based on feature points, which is
currently the mainstream.
1) Feature point method
The main steps of visual matching using the feature point are feature extraction,
feature matching, and state estimation. One critical step is extracting feature points
, a step consisting of key-point localization and descriptor calculation. Descriptors are
indexes computed from the pixels around the key point; they distinguish different key
points and are used for the subsequent feature matching. In addition, feature
matching can be classified into three matching modes, i.e., 2D-2D, 3D-2D, and 3D-
3D, according to the different dimensions of feature points. These three cases are
solved using different methods, as shown in Fig. 3.11.
One criterion for evaluating the feature extraction algorithm is that the extracted
features should be able to describe the key point invariantly under camera motion
and lighting changes. Classic image feature point algorithms include scale-invariant

Fig. 3.11 The framework of feature extraction



Fig. 3.12 Pyramid of images

feature transform (SIFT), speeded up robust features (SURF), oriented FAST and
rotated BRIEF (ORB), among which SIFT can accurately calculate feature points but
is computationally expensive. ORB balances performance and efficiency. Therefore,
ORB is a good choice for autonomous driving, which requires real-time solutions.
Here, we introduce the ORB algorithm in detail.
The feature point extracted by the ORB algorithm is oriented FAST, a modified
FAST corner point with scale invariance and rotation description. The scale invari-
ance is achieved by constructing an image pyramid (Fig. 3.12), and the rotational
description of the features is realized by the grayscale center-of-mass method.
The procedure for calculating the FAST corner point is given as follows. For each pixel
I_c in the image, 16 pixels on a circle with a three-pixel radius around it are selected.
If there are n consecutive pixels among these 16 whose brightness is outside the range
I_c ± T, then I_c can be considered a feature point.
After obtaining the feature points according to the principle of FAST, the detected
image is scaled several times with a fixed multiplying factor to obtain an image
pyramid, as shown in Fig. 3.12, and the scale invariance is achieved by detecting the
key points on each layer of the image pyramid.
The feature orientation used in the ORB algorithm is calculated by using the
grayscale center-of-mass method, in which an image block is formed with pixels
around the feature pixel, and its orientation is defined as the direction of the grayscale
center-of-mass to the geometric center.
The formula for calculating the gray mass center of the image block is shown in
Eq. (3.14).
C = \left( \frac{m_{10}}{m_{00}}, \; \frac{m_{01}}{m_{00}} \right) \quad (3.14)

where m_10, m_01, and m_00 are three moments of the image block, and r is one half of
the side length of the square inscribed in the block. The moments are calculated using
Eq. (3.15).


m_{pq} = \sum_{x=-r}^{r} \sum_{y=-r}^{r} x^p y^q I(x, y), \quad (p, q) \in \{(0, 0), (0, 1), (1, 0)\} \quad (3.15)

Connecting the geometric center with the gray mass center of the image block gives
a direction vector, which is assigned as the direction of the feature point.
This direction is calculated as Eq. (3.16).
\theta = \arctan \frac{m_{01}}{m_{10}} \quad (3.16)
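The grayscale center-of-mass computation of Eqs. (3.14)-(3.16) can be sketched as below; the patch extraction and the use of arctan2 (which resolves the full angular range) are illustrative choices rather than the exact ORB implementation.

```python
import numpy as np

def orb_orientation(image, cx, cy, r):
    """Grayscale centroid orientation of a (2r+1) x (2r+1) patch centered at the
    key point (cx, cy); image is a 2-D grayscale array."""
    patch = image[cy - r:cy + r + 1, cx - r:cx + r + 1].astype(float)
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]     # local pixel coordinates in the patch
    m00 = patch.sum()                          # moments of Eq. (3.15)
    m10 = (xs * patch).sum()
    m01 = (ys * patch).sum()
    centroid = (m10 / m00, m01 / m00)          # gray mass center, Eq. (3.14)
    theta = np.arctan2(m01, m10)               # orientation, Eq. (3.16)
    return centroid, theta
```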

Feature matching is conducted after feature extraction. Different solving methods


will be used for feature point pair information of different dimensions. The solutions
are presented in the following for the three cases mentioned in the previous section.
(1) 2D-2D. Only the 2D information, the pixel coordinate of the feature point pair,
is known. Camera motion can be restored using the pairwise Epipolar Geometry
method. In this case, at least eight feature point pairs are needed.
(2) 3D-2D. For each feature point pair, if the 3D coordinates of one point are known
and the pixel coordinates of the other are known, then the PnP algorithm can be
used to restore the camera motion, and at least three-point pairs are needed to
obtain a valid solution.
(3) 3D-3D. If 3D coordinates are known for both points in the feature point pair, the
ICP algorithm can be used, and its principle has been introduced in the previous
subsection.
Finally, after feature matching, a transformation matrix containing the parameters
describing the camera motion between each of two adjacent frames is obtained. Then,
relative positioning is achieved.
2) Direct method
The feature point method requires considerable computational resources. It cannot
output reliable feature points robustly, especially when the image is featureless. Some
researchers have proposed a method that does not require feature points to solve this
problem and is termed the direct method. The direct method relies on pixel brightness
to estimate the camera motion without extracting feature points.
The image-plane projections of a spatial point P on the two frames are p_1 and p_2, respectively.
Different from the feature point method, no point pairs are matched in the direct
method. The insight of the direct method is the assumption that the same point has the
same brightness in the two frames. The current camera pose ξ is used to predict the coordinate
of p_2 in the next frame from its correspondent p_1 in the current frame. The aim is to
find the p_2 closest to p_1, and the camera pose is then updated with the coordinates
of p_1 and p_2. If the camera pose error is relatively large, the brightness values of p_1 and p_2

will differ significantly, so ξ can be optimized by iteratively reducing this error.
The direct method can be described as an optimization problem that can be formed
as Eq. (3.17).


\min J(\xi) = \sum_{i=1}^{n} e_i^{\mathrm{T}} e_i \quad (3.17)

where ξ denotes the transformation of the camera pose between two frames, n is the
point number, and ei denotes the photometric error of the i th point pi , which can
be calculated using Eq. (3.18).
e_i = I_1\left(p_{1,i}\right) - I_2\left(p_{2,i}\right) \quad (3.18)
where I_1(p_{1,i}) and I_2(p_{2,i}) are the image intensities of the ith point in frames
1 and 2, respectively. The assumption here is that the lightness of the point does
not change in the two frames, which may not be true in reality, as the illumination
on the same point can change between two frames, resulting in a lightness change.
Therefore, this method is affected by illumination changes between adjacent frames.
As shown in Eq. (3.17), the unknown parameter to be optimized is the camera pose
ξ, which can be obtained by calculating the Jacobian matrix of the optimization equation
and computing the increment using the Gauss–Newton method or the Levenberg–
Marquardt method.
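As a minimal illustration of Eqs. (3.17)-(3.18), the sketch below accumulates the photometric error for a candidate pose (R, t), given the depths of the sampled pixels and the camera intrinsic matrix K. Nearest-pixel sampling and the absence of the Gauss-Newton update itself are simplifying assumptions.

```python
import numpy as np

def photometric_error(img1, img2, pixels1, depths, K, R, t):
    """Sum of squared brightness residuals e_i of Eq. (3.18) for a candidate pose:
    each pixel in frame 1 with known depth is back-projected, moved by (R, t),
    and re-projected into frame 2."""
    K_inv = np.linalg.inv(K)
    total = 0.0
    for (u, v), depth in zip(pixels1, depths):
        P = depth * (K_inv @ np.array([u, v, 1.0]))   # 3D point in frame-1 coordinates
        q = K @ (R @ P + t)                            # projection into frame 2
        u2, v2 = int(round(q[0] / q[2])), int(round(q[1] / q[2]))
        if not (0 <= v2 < img2.shape[0] and 0 <= u2 < img2.shape[1]):
            continue                                   # projected outside the image
        e = float(img1[int(v), int(u)]) - float(img2[v2, u2])
        total += e * e
    return total
```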

3.2.4 Multi-sensor Fusion Positioning

The previous sections introduced various solutions for car positioning. However, each has deficiencies and cannot alone meet the demand for robust positioning in different environments. For instance, the GNSS signal is weak in places such as tunnels, which are sometimes even blind areas with no signal at all. One intrinsic deficiency of INS is error accumulation along the trajectory. LiDAR requires considerable power and costs much more than other solutions. Visual odometry is affected by light variation and shadows.
No single solution is perfect in practical applications. Given the complex and varied road conditions, relying on one sensor to achieve accurate and robust positioning is not realistic. Therefore, combining multiple complementary sensors can provide a reliable solution for accurate positioning. GNSS/INS integrated positioning is a basic example of multi-sensor combination methods. This section introduces the scheme of a fusion positioning system with more sensors.
Figure 3.13 shows the available sensors installed on autonomous driving cars. With advanced technology, the multi-sensor fusion approach is applied not only to navigation and positioning but also to environmental perception and high-definition map construction.

Fig. 3.13 Sensors onboard the autonomous driving car

1. Processing flow of multi-sensor positioning

In-vehicle positioning for autonomous driving is realized by integrating datasets from multiple sensors, such as GNSS, INS, LiDAR, and cameras. The fusion module applies pre-processing, alignment, and state estimation to multi-source data collected by multiple sensors and fuses them to output the car position and attitude. The processing flow of the fusion module is shown in Fig. 3.14.
Data preprocessing includes sensor initialization and calibration. Sensor calibration determines fixed parameters such as the intrinsic parameters of a camera and the scanning angle of a LiDAR system. Data registration contains both temporal and spatial alignment. Temporal alignment synchronizes the data obtained by each sensor to a common time base. Spatial alignment converts the data from different sensors to the same coordinate system and then estimates and compensates for sensor biases by comparing the different estimations of the same target. State estimation fuses all sensor data to estimate the position and attitude of the vehicle and is generally performed using Kalman filter algorithms, extended Kalman filter algorithms, neural networks, and other methods.

Fig. 3.14 The flowchart shows the integration of multiple sensors for positioning
2. Fusion positioning algorithms
The observation of the same environment by multiple sensors generates multiple datasets. Since the parameters and formats of these data are entirely different, multi-sensor data fusion (MDF) algorithms are indispensable to make the data from different sensors available and usable. Multi-sensor data fusion algorithms act like a human brain or a computer chip that integrates data from various sensors to obtain a consistent interpretation or description of the measured object, thus providing a more accurate description of the observed environment. In driverless dynamic surveying vehicle positioning, the purpose of multi-sensor fusion algorithms is to synthesize individual sensor data to obtain more accurate vehicle location information and to better cope with driverless vehicle positioning in various complex environments.
Currently available multi-sensor fusion algorithms include the weighted average, Bayesian estimation, the Kalman filter, Dempster-Shafer (D-S) evidence inference, statistical decision theory, fuzzy logic inference, and artificial neural networks. The weighted average is the most straightforward and lowest-level multi-sensor fusion algorithm, in which data provided by multiple sensors are weighted and averaged to obtain the fused result. Bayesian estimation makes inferences by synthesizing prior information and sample information into a posterior distribution; fusion using Bayesian estimation can be regarded as high-level fusion. The Kalman filter makes the optimal state estimate by recursion, and variants such as the extended Kalman filter and the unscented Kalman filter are available for different systems. D-S evidence inference is an extension of Bayesian estimation and involves a basic probability assignment function, a belief function, and a plausibility function. In recent years, as deep neural network algorithms have gained increasing popularity, some scholars have proposed predicting the weights of multiple sensors with neural networks to obtain the final fused result.
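As a minimal illustration of Kalman filter-based fusion (not any manufacturer's implementation), the following sketch fuses INS-propagated motion with GNSS position fixes for one axis; all noise values are illustrative assumptions:

```python
# A minimal sketch of loosely coupled GNSS/INS fusion with a linear Kalman filter.
# State x = [position, velocity] for one axis; the INS supplies acceleration as a
# control input, and GNSS supplies noisy position fixes. Values are assumptions.
import numpy as np

dt = 0.1                                   # prediction interval [s] (assumed)
F = np.array([[1, dt], [0, 1]])            # constant-velocity state transition
B = np.array([[0.5 * dt**2], [dt]])        # maps acceleration into the state
H = np.array([[1, 0]])                     # GNSS observes position only
Q = 0.01 * np.eye(2)                       # process noise (tuning assumption)
R = np.array([[4.0]])                      # GNSS position variance, e.g., (2 m)^2

def predict(x, P, acc):
    # Propagate the state with the INS acceleration measurement.
    x = F @ x + B * acc
    P = F @ P @ F.T + Q
    return x, P

def update(x, P, z_gnss):
    # Correct the prediction with a GNSS position fix.
    y = np.array([[z_gnss]]) - H @ x       # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P
```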
3. Multi-sensor fusion positioning solutions
At this stage, major autonomous driving manufacturers have adopted different strate-
gies for multi-sensor fusion positioning, and the most representative strategy is the
solution of Baidu Apollo [6]. On January 11, 2018, Baidu Apollo 2.0 was officially
released, and although more than three years have passed since then, it is still very
representative in the autonomous driving industry. We introduce the multi-sensor
fusion positioning solution of Baidu Apollo 2.0 as an example in the following.
The Baidu Apollo 2.0 positioning solution uses the Neousys Nuvo-6108GC computing center, whose X86 architecture supports Xeon E3 and i7 processors and GTX1080 graphics cards. The CAN communication card is the ESD CAN-PCIe/402-B4 model, which interacts with the car's hardware and sends signals to control the car's acceleration, braking, signal lights, and steering. The GPS and IMU use off-the-shelf products, including the NovAtel SPAN-IGM-A1, NovAtel SPAN® ProPak6, and NovAtel IMU-IGM-A1. The LiDAR is a Velodyne HDL-64E S3 with 64 beams, a 360° horizontal field of view, and a 26.9° vertical field of view, which can collect very dense point cloud data. In addition, the autonomous driving car is also equipped with cameras and millimeter-wave radar to detect obstacles and vehicles around the car and compute their accurate distances to the car. The multi-sensor fusion algorithm used in the solution is the extended Kalman filter algorithm.

Fig. 3.15 The flowchart shows the integration of multiple sensors for positioning in the case of Baidu Apollo 2.0
As shown in Fig. 3.15, Baidu Apollo 2.0 positioning can be divided into three submodules: an IMU-based navigation solution module, a differential GNSS positioning module, and a point cloud matching positioning module based on LiDAR and an HD map. The data of these three submodules are fused by an extended Kalman filter to output the absolute location information of the vehicle, and the modules can be corrected with this location information to improve the positioning accuracy of the whole system. The three modules of the Baidu Apollo 2.0 positioning system are fused in a loosely coupled way, and only location, speed, attitude, and similar information are used in the fusion.

3.3 Object Detection in Autonomous Driving

Object detection refers to finding all objects of interest or regions of interest (ROI)
in the image and determining their categories and location. It is one of the crit-
ical technologies for autonomous driving. According to different applications and
requirements, object detection can be divided into 2D and 3D tasks. In autonomous
driving, 2D object detection supports many tasks, such as attribute discrimination
in road scenes, tracking, and autonomous parking. 3D object detection is devel-
oped from 2D object detection. This section first introduces 2D object detection,
followed by vision and point cloud-based 3D object detection, and finally introduces
3D dynamic object detection based on fusion strategies.

3.3.1 2D Object Detection

The detection categories for autonomous driving in the KITTI [7] dataset include cars, vans, trucks, trams, pedestrians, persons (sitting), cyclists, and misc (miscellaneous objects). The object detection task is to output the category and the position (a 2D box with the region x_min, x_max, y_min, y_max) of each object. Following the development of the target detection task, object detection can be divided into two-stage, one-stage, and anchor-free 2D detection.
1. Two-stage 2D object detection
R-CNN [8] is the first deep learning method used for object detection in images. Since the detection task requires localizing target regions in addition to the classification task, the method first extracts candidate 2D bounding boxes, i.e., a fixed number of subregions given in advance. Each candidate 2D bounding box is then resized to a uniform scale and fed into a neural network for semantic information learning, where AlexNet is introduced as the feature extraction backbone, and a support vector machine (SVM) is utilized to recognize categories. In addition, a position correction branch adjusts the initial position of the candidate 2D bounding box to make the final 2D box more accurate. The whole training process is more complex than that of a classification-only structure. First, the backbone and the classifier are trained on a large classification dataset such as ImageNet, and then the position branch is trained on the dataset with 2D boxes, which is the typical training method for two-stage algorithms. Figure 3.16 shows the architecture of R-CNN, and Fig. 3.17 shows the general process of 2D target detection on the KITTI dataset. The reader can clearly observe the two parallel branches following the feature extractor of the network that accomplish the classification and position tasks.
The R-CNN is epoch-making, but it is time-consuming as the convolution opera-
tion is applied to all candidate 2D bounding boxes, and thus it is hard to deploy in prac-
tical applications. To tackle this problem, a fast variant of the R-CNN, Fast-RCNN,
was proposed [9]. To improve the inference speed of 2D object detection, Fast-RCNN
replaces the forward propagation convolution, which is performed sequentially for
all candidate 2D bounding boxes, with a single full-image feature extraction process.
The network then outputs the categories and the regressed position offset values of the
bounding boxes so that the tasks are no longer trained separately. However, massive extraction of candidate regions is still needed in training, and the object detection structure does not achieve real-time performance.

Fig. 3.16 The framework of R-CNN

Fig. 3.17 The framework of 2D object detection
Faster-RCNN [10] is a further improvement of Fast-RCNN, proposing a region
proposal network (RPN) to generate a large number of candidate 2D bounding boxes.
This operation further improves the detection speed of the network. Figure 3.18 shows
the Faster-RCNN framework.
A two-stage 2D object detection model for road scenes can be obtained by replacing the dataset with autonomous driving datasets (such as KITTI, nuScenes [11], Waymo [12], and ApolloScape [13]). To compensate for the smaller size of public autonomous driving datasets compared with image recognition datasets, researchers perform data augmentation based on the long-tail distribution and introduce other learning strategies, which can focus on low-resolution targets (distant targets in images), highly occluded targets, and blurred images acquired on rainy and foggy days.
2. One-stage 2D object detection
Abandoning the stage of selecting bounding box candidates, researchers found that an
anchor-based method can be accomplished faster and more effortlessly. You only look
once (YOLO) [14] is a typical one-stage anchor-based algorithm. The image is input to the YOLO network, in which the feature map is divided into several regions. The attributes of a target whose center falls in a region are learned within that region. These divided regions are similar to anchors. The network outputs the category with a confidence score and the bounding box of the detected target. The proposal of the YOLO network triggered research on one-stage object detection.

Fig. 3.18 The framework of Faster-RCNN

Table 3.1 Object detection frameworks based on different algorithms

Framework      Mean average precision/%    Time/(frames/s)
R-CNN          66                          0.02
Fast-RCNN      70                          0.4
Faster-RCNN    73                          7
YOLO           77                          21
SSD            66                          46
YOLOv2 [15] and YOLOv3 [16] are both upgraded versions of YOLO. In these
improved versions, the fully connected layer is replaced with convolution-pooling
layers. In addition, a new backbone and multi-scale information are added. These
improvements aim to improve the detection accuracy. High detection accuracy is
easier to achieve using two-stage detection, while one-stage detection performs faster.
The single shot multibox detector (SSD) [17] network gains popularity similar
to that of the YOLO network. The SSD network contains multiscale features and
multiple anchor sizes. SSD outperforms YOLO in detecting small targets, and many
subsequent 2D detection networks use it as a template. Regardless of whether YOLO
or SSD networks are used, one-stage 2D object detection performs well in terms of
both speed and accuracy. Currently, research is dedicated to achieving lightweight, distributed, and embeddable networks. Table 3.1 compares the accuracy and time consumption of the above networks in the application of 2D object detection.
In autonomous driving, one-stage detection is more commonly used than two-
stage detection due to its fast speed and adaptability. On the one hand, one-stage
detection is inherently faster than two-stage detection, which satisfies the need for
balancing speed and accuracy in industrial applications. On the other hand, one-stage
detection is easier to deploy and migrate. For example, 2D object detection can be
adapted to 3D object detection without changing the one-stage network (SS3D [18]).
3. Anchor-free 2D object detection
The anchor-free method was first proposed in 2018 in CornerNet [19] and CenterNet [20]. It can essentially be regarded as a special case of anchor-based methods: it is similar to the one-stage approach, except that the anchor-free method treats the anchor as a 1 × 1 box and regards each pixel as one anchor. The anchor-free method removes the non-maximum suppression (NMS) operation to reduce the inference time.
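For reference, the following generic sketch shows the greedy NMS post-processing step that anchor-free detectors such as CenterNet avoid; the IoU threshold is an illustrative assumption:

```python
# A generic sketch of greedy non-maximum suppression (NMS). Boxes are given as
# (x1, y1, x2, y2); the threshold value is illustrative.
import numpy as np

def iou(box, boxes):
    # Intersection-over-union between one box and an array of boxes.
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    order = np.argsort(scores)[::-1]           # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)                         # keep the current best box
        rest = order[1:]
        # discard boxes overlapping the kept box by more than the threshold
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep
```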

Fig. 3.19 The basic structure of CenterNet

Visual 3D object detection based on anchor-free strategies includes CenterNet and SMOKE [21], in which the 2D output is lifted to 3D, and the corresponding 2D box positions are obtained using the image projection mapping function (Fig. 3.19).

3.3.2 Vision-Based 3D Object Detection

1. Monocular 3D object detection


Monocular 3D object detection refers to the process of acquiring 3D information
about all targets in one image. It was inspired by monocular visual 3D reconstruction,
and the earliest research on this was related to indoor autonomous localization and
mapping for intelligent robots.
Determining the properties of a target in 3D space requires nine parameters (x, y, z, α, θ, γ, w, h, l), where (x, y, z) denotes the object's 3D center in the reference coordinate system, (α, θ, γ) indicates the pitch, yaw, and roll angles of the object, and (w, h, l) are the width, height, and length of the object. 3D object detection is therefore a problem with 9 degrees of freedom (DoF). Generally, the reference coordinate system can be set as the camera, with the Z-axis pointing in the forward direction of the camera's movement, the Y-axis pointing toward the ground, and the X-axis pointing to the left, forming a right-handed coordinate system, as shown in Fig. 3.20. In some cases, one piece of prior knowledge is that there is no roll or pitch angle on ordinary roads, which are the rotation angles about the Z-axis and Y-axis; the degrees of freedom are then reduced to seven. In addition, objects are modeled prior to detection in some cases; thus, there is no need to measure the object's dimensions, further reducing the degrees of freedom to four.
As deep learning methods are increasingly applied in 2D tasks, researchers have
found that deep nonlinear fitting strategies can also be deployed in 3D vision. Inspired
by the 2D anchor, increasing anchor scales in 3D monocular object detection is
proposed to capture multiscale information. For example, the core of 3D-RPN [22],
a 3D image perception model based on Faster-RCNN, replaces 2D candidate regions
with spatial 3D boxes. In other words, every object’s dimension, angles, and position
need to be regressed through its anchor.
Apart from the two-stage detection, one-stage and anchor-free detections are also
proposed and tested for 3D object detection. The 2D output is upscaled to a 3D output.
However, the performance of all these approaches is not satisfactory. Theoretically,
restoring the 3D information of objects using only one image is an ill-posed problem.

Fig. 3.20 Camera coordinate used in the single-image 3D object detection

Scale ambiguity exists between a near small object and a far large object, as they
may appear the same in one image. Modeling the object before training or adding
projection constraints after training is proposed by researchers to resolve this scale
ambiguity and thus increase the feasibility of the monocular 3D detection strategy.
Apart from the scale, resolving the car’s heading is also a challenging task. Therefore,
this section introduces the scale constraint and heading angle regression used in
monocular vision.
In autonomous driving, the heading angle of the object can be divided into a global
heading angle and a relative heading angle. Figure 3.21 shows three consecutive
frames captured by the color camera in one scene of KITTI. With the annotated
label from external sensors such as LiDAR, the target in the red circle in the image
runs in an approximately straight line in the camera coordinate system. There is
no significant turning behavior, and its direction relative to the camera coordinate
system is referred to as the global heading angle. However, because the angle of the target relative to the camera on the acquisition vehicle is constantly changing, the appearance of the target vehicle in the image differs between frames. On the right side is a magnified view of the local characteristics of the target. The reader can see that the appearance of this target in the image has changed, so it is necessary to combine the viewing-ray angle between the target location and the camera coordinate system with the global heading angle; the combined angle is called the local heading angle.
The relative heading angle, the global heading angle, and the included angle between the target and the camera are shown in Fig. 3.22. The gray triangle is the camera model, and the blue rectangle is the moving target. The gray solid arrow represents the direction of motion, and the horizontal gray solid line is the line through the center of the vehicle parallel to the camera's X-axis. The figure also contains the line between the center of the target and the center of the camera; the gray dotted line is perpendicular to this line. In this figure, Ry is the global heading angle and θ is the local heading angle.
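A hedged sketch of this relationship, following the KITTI-style convention in which the local (observation) angle is obtained by subtracting the viewing-ray angle of the target center from the global heading, is given below; whether the sign convention matches the book's θ exactly is an assumption:

```python
# A hedged sketch of the commonly used KITTI-style relation between the global
# heading angle (rotation about the camera Y-axis, Ry) and the local heading angle.
import math

def local_heading(ry_global, x, z):
    # x, z: target center in the camera coordinate system (x lateral, z forward).
    alpha = ry_global - math.atan2(x, z)   # subtract the viewing-ray angle
    # wrap the result into (-pi, pi]
    while alpha <= -math.pi:
        alpha += 2.0 * math.pi
    while alpha > math.pi:
        alpha -= 2.0 * math.pi
    return alpha
```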

Fig. 3.21 Three consecutive frames of car detection in the KITTI dataset

Fig. 3.22 Geometric relationship between the global heading angle Ry and the relative heading angle

Measuring the dimensions of moving targets, including their length, width, and height, can be conducted in two ways. One way is to introduce CAD target models in advance to determine their 3D dimensions; the other is to regress an average size deviation according to the target category. The latter is more convenient, as it does not require an additional modeling process. Constraints can be added to obtain accurate dimensional information, taking

advantage of the projection mapping relationship between the 3D space and the 2D
image, such as constraints on the targets’ height in 3D space and the height of the
bounding box in the image.
Theoretically, the depth of an object cannot be obtained by a single image alone.
The scale can be restored in practical applications by applying strategies such as
introducing the depth map as supervision guidance, ground parallelism assumption,
and 2D-3D dimension constraints. Inspired by monocular depth estimation of indoor
robots, early researchers utilized additional branches for supervised depth estimation
to obtain depth maps. For each input pixel, this algorithm outputs a corresponding
depth value. However, unlike indoor applications, autonomous driving faces challenges in outdoor environments, including large background areas, regions with no valid depth (such as the sky), and heavily occluded targets. In addition, the high-speed motion of the vehicle can also affect depth estimation. Producing supervised depth maps is tedious work. It should be noted that the depth maps obtained by monocular unsupervised methods are still relative depths rather than absolute depths. Some researchers have proposed an instance depth-based regression strategy to address inputs with a large invalid background; although it dramatically reduces the computational cost, it still cannot obtain absolute depths from only a single image without a labeled depth map. It has been reported that the strategy of producing a depth map using supervised learning is less effective on test datasets and transfers poorly.
The ground parallel assumption, i.e., that the target and the car share the same ground plane, is an idealized model. The target's depth can be retrieved by using the camera projection mapping equation with the target's position in the image. However, this assumption does not hold on rural and mountainous roads. Another limitation of this method is that the depth error is related to the depth vanishing line (the line through the optical center): the closer the target is to the vanishing line, the greater the depth error.
Currently, various constraints are used to improve monocular depth accuracy. For
example, Deep3DBox constructs a 3D box-2D box constraint relationship between
the 3D bounding points of the object and the minimum bounding rectangle containing
the projected points of these 3D points in the image. Then, the 3D box localization of
the object can be restored using this relationship after obtaining the target dimension
and its heading angle. In addition to the abovementioned methods, strategies include
building target-target relationships and introducing target depth uncertainty.
The ultimate purpose of object detection is to obtain the object’s absolute position
(x, y, z) , which is calculated from the combination of the object’s image location
(u, v) and depth z. The conversion is expressed in Eqs. (3.19) and (3.20).

x = (z·u − c_x·z) / f_x    (3.19)

y = (z·v − c_y·z) / f_y    (3.20)

Fig. 3.23 One result of the single-image 3D object detection

where u and v are the coordinates of the object center projected on the image, whose coordinate system consists of the x and y axes, f_x and f_y are the focal lengths along the x and y axes, and c_x and c_y are the offsets of the optical center with respect to the origin of the image coordinate system. Figure 3.23 shows the result of monocular 3D object detection. It can be seen that in truncated and occluded scenes, there is a significant bias in 3D detection based on a single image.
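Equations (3.19) and (3.20) translate directly into code; the sketch below is illustrative, and the example intrinsics are only approximate KITTI-like values:

```python
# A minimal sketch of Eqs. (3.19)-(3.20): recovering the 3D position of an object
# center from its image location (u, v) and an estimated depth z, given the camera
# intrinsics (f_x, f_y, c_x, c_y).
def back_project(u, v, z, fx, fy, cx, cy):
    x = (z * u - cx * z) / fx      # Eq. (3.19)
    y = (z * v - cy * z) / fy      # Eq. (3.20)
    return x, y, z

# Example with approximate KITTI-like intrinsics (illustrative values only):
# back_project(u=620.0, v=190.0, z=15.0, fx=721.5, fy=721.5, cx=609.6, cy=172.9)
```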

2. Monocular video 3D object detection

Monocular video 3D object detection is applied on multiple frames rather than a


single image. It is based on monocular video depth estimation algorithms, especially
unsupervised video depth estimation, and needs only labels from keyframes or even
no labels rather than all the labels from frames. The same object is traced in all frames
and optimized in keyframes. It introduces optical flow, that is, the flow of pixels
through the frames. This approach optimizes observation in each frame utilizing the
changes of the same object in different frames. The optimization can be realized by
Kalman filtering, particle filtering, or the Gaussian mixture model.
The constraint equation is first established to learn the optical flow, which
considers a single pixel’s light intensity in the first frame, I (x, y, t), where t repre-
sents the time. During a period of dt, this pixel moves a distance of (dx, dy) at the
next frame. It is hypothesized that the light intensity of the pixel remains constant
before and after the movement. Then, the optical flow is calculated by Eqs. (3.21)
and (3.22). This model is applied to all pixels to obtain an optical flow map.

I(x, y, t) = I(x + dx, y + dy, t + dt)    (3.21)

(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0    (3.22)
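In practice, a dense optical flow map of this kind can be estimated with standard tools; the sketch below uses OpenCV's Farneback method with typical parameter values (an illustrative choice, not the method used in the text):

```python
# A hedged sketch of dense optical flow estimation between two consecutive
# grayscale frames, one common way to obtain the optical flow map described by
# Eqs. (3.21)-(3.22). Parameter values are typical defaults, not the book's settings.
import cv2

def optical_flow_map(frame1_gray, frame2_gray):
    # Returns an H x W x 2 array with the per-pixel displacement (dx, dy).
    flow = cv2.calcOpticalFlowFarneback(
        frame1_gray, frame2_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return flow
```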

Optical flow only considers the 2D movement of the target, so it cannot recover the 3D movement. Unsupervised depth estimation is used to restore the depth, and its equations are similar to those of optical flow, with the hypothesis that the same point in 3D space has the same pixel intensity in two adjacent frames. Depth estimation from video streams is achieved with Eq. (3.23).
p_s ∼ K T̂_{t→s} D̂_t(p_t) K^{−1} p_t    (3.23)

where p_t and p_s are the homogeneous pixel coordinates in the target and source frames, respectively, K is the camera intrinsic matrix, T̂_{t→s} is the predicted ego-vehicle pose transformation matrix from the target frame to the source frame, and D̂_t is the predicted depth map of the target frame. The static scene assumption, i.e., that the same object has the same appearance in adjacent frames, can be used as a constraint, and the loss function is expressed in Eq. (3.24).

L_depth = Σ_{i=1}^{N_s} Σ_{j=1}^{w×h} | I_i(p_j) − Î_i(p_j) |    (3.24)

where I_i(p_j) is the normalized pixel value at position p_j in the source frame and Î_i(p_j) is the pixel value mapped to the estimated frame after transformation. N_s is the number of source frames, and w × h is the number of pixels. Researchers can
recover depth by directly providing depth truth maps or pose labels for the supervised
signal during the training step, by introducing binocular vision to calculate parallax
or by adding horizontal ground assumptions to recover scale factors. In addition,
researchers generally add masks at dynamic targets to reduce the interference to
Eq. (3.24) and introduce smoothing losses at target edges to improve the depth
estimation results.
One more straightforward method of monocular video detection, Kinematic-3D
[23], avoids constructing motion estimation equations. It accomplishes sensing in
autonomous driving through outdoor monocular video detection, which is trans-
ferred from indoor video stream localization and mapping. It constructs observation
equations, velocity states, and supervised estimation of the ego-vehicle motion, which are updated using Kalman filtering. In addition to the 3D object detection at a single frame, the results from the preceding and succeeding frames are also combined to form multiple observations, the ego-vehicle pose transformation is learned in a supervised manner as the motion data, and then the Kalman gain is calculated. The observation accuracy is improved using the 3D Kalman filter, which updates the result of 3D object detection at the current frame.

P'_t = F_{t−1} P_{t−1} F_{t−1}^T + I(1 − μ_{t−1})    (3.25)

where F_{t−1} is the state transition model at time t−1, and P is the covariance matrix with 3D confidence μ, initialized as P = I(1 − μ)λ_0, where I denotes the identity matrix and λ_0 is an uncertainty weighting factor. The state variable order is [τ_x, τ_y, τ_z, τ_w, τ_h, τ_l, τ_θ, τ_h, τ_v], in which [τ_x, τ_y, τ_z] is the location information, [τ_w, τ_h, τ_l] represents the scale, [τ_θ, τ_h] is the orientation, τ_v is the velocity, and μ is the average self-balancing confidence.

The Kalman gain is:

K = P' H^T [H P' H^T + I(1 − μ)λ_0]^{−1}    (3.26)

where H is the truncated identity map, and then

τ_t = τ'_t + K(b − H τ'_t)
P_t = (I − K H) P'_t    (3.27)

where τ_t is the state quantity of the tracking trajectory and P_t is the updated covariance; this can be regarded as a typical Kalman update process.
3. Stereo 3D object detection
Binocular (stereo) cameras are analogous to the two human eyes, providing a sense of depth that cannot be obtained with one eye. The distance between the two cameras is called the baseline, and a greater depth range can be measured by increasing the baseline. The core of binocular target detection lies in obtaining the disparity map, which encodes the difference between the left and right images. If the two cameras are placed in parallel at the same height, there is a visual disparity between the two images only in the horizontal direction. Figure 3.24 illustrates the principle of binocular depth estimation.
In the binocular model, the imaging plane lies in front of the two cameras, toward the target. The left and right cameras are both pinhole models. Their optical centers, O_l and O_r, are located on the same horizontal line, and the distance between them is defined as the baseline b. The target P is captured by the left and right cameras and projected onto the image plane as P_l and P_r, which have different image coordinates due to the baseline between the optical centers of the cameras. This difference is termed the disparity. There is only a horizontal disparity since the two camera centers are on the same horizontal line. If the x-coordinate of P_l is u_l and that of P_r is u_r, then Eqs. (3.28) and (3.29) hold according to the similar-triangle relationship between ΔPP_lP_r and ΔPO_lO_r.

Fig. 3.24 The principle of stereo vision-based depth measurement



Fig. 3.25 The framework of 3D object detection with a stereo camera

(Z − f)/Z = (b − u_l + u_r)/b    (3.28)

Z = f·b/d,  d = u_l − u_r    (3.29)

where f is the camera focal length and d denotes the difference between the left
and right image pixels, which is the disparity in pixels. The depth of the target
can be obtained according to Eq. (3.29), which indicates that an accurate d is key
to obtaining the object depth. A typical binocular 3D detection usually introduces
the disparity map as one feature at the cost of increasing the inference computing
time. After obtaining the disparity map, a pseudo point cloud can be generated by
using the corresponding relationship between the 2D-3D image and then solved by
the point cloud strategy. Current binocular 3D object detection strategies strive to
reduce computing time and cost while maintaining relatively high detection accuracy.
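A hedged sketch of depth recovery from a stereo pair via Eq. (3.29) is shown below, using OpenCV's semi-global block matching to obtain the disparity map; the matcher parameters, focal length, and baseline are illustrative assumptions:

```python
# A hedged sketch of depth from a rectified stereo pair via Eq. (3.29), Z = f*b/d,
# using OpenCV's semi-global block matching to estimate the disparity map.
import cv2
import numpy as np

def stereo_depth(left_gray, right_gray, focal_px, baseline_m):
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    # StereoSGBM returns fixed-point disparities scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan           # invalid or zero-disparity pixels
    depth = focal_px * baseline_m / disparity    # Z = f * b / d  (Eq. 3.29)
    return depth
```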
Stereo-RCNN [24] is a typical binocular object detection method influenced by the
binocular disparity strategy. The model contains a disparity estimation module that
regresses the disparity map between two images. The target depth can be obtained by
using the known parameters of the camera, and the learning of the heading angle and
the scale information is consistent with the inference process introduced in monocular
3D detection (Fig. 3.25).

3.3.3 LiDAR-Based 3D Object Detection

LiDAR-based 3D object detection refers to restoring the 3D properties of a target


using a point cloud obtained by LiDAR. The point cloud contains the spatial location
of all valid points. If there are valid points on the object’s surface, then the object’s
position can be determined. Therefore, 3D object detection based on the point cloud
has better performance in terms of accuracy than visual 3D detection.
Point cloud object detection also presents some difficulties. First, the data volume of the point cloud is enormous: there can be tens of thousands of points in one frame. The first difficulty is therefore how to reduce information loss while saving computation and time costs. Point clouds are discrete and sparse, and it is difficult to distinguish foreground points from the background. Another issue to address is the decrease in sampling density with increasing distance.
Before introducing point cloud object detection, point cloud clustering and
segmentation are introduced, which are the main steps in point cloud processing and
the core algorithms in point cloud object detection. Point cloud clustering groups
all points that may belong to the same category into a cluster. Point cloud segmen-
tation is the process of assigning a label to each point. These two algorithms are
similar in principle. Point cloud object detection algorithms based on deep learning
are primarily extensions of segmentation algorithms, as the point cloud data are
embedded with depth information, and thus, the 3D frame can be roughly determined
after dividing each point cloud into categories. Point cloud object detection based
on deep learning can be divided into three categories: Bird’s eye view (BEV)-based,
camera view-based, and point-based methods.
1. BEV-based object detection
As implied by the name, BEV-based object detection represents the point cloud features in a bird's eye view. The BEV map is obtained by gridding the LiDAR point cloud and projecting it onto the X–Y coordinate plane. The gridding resolution, ∆l × ∆w × ∆h, is set manually, and each grid cell corresponds to one pixel in the feature image or one feature vector. For example, if the point cloud is discretized into a voxel space of m × n × ∆h, every voxel corresponds to one image pixel. The higher the resolution, the higher the accuracy and the higher the computational cost, and vice versa.
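The gridding step can be sketched as follows; the detection range and resolutions are illustrative assumptions rather than the settings of any particular network discussed here:

```python
# A minimal sketch of the gridding step: assigning LiDAR points to voxel cells of
# resolution (dl, dw, dh) within an assumed detection range.
import numpy as np

def voxelize(points, x_range=(0, 70.4), y_range=(-40, 40), z_range=(-3, 1),
             dl=0.2, dw=0.2, dh=0.4):
    # points: N x 3 array of (x, y, z) LiDAR coordinates
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
            (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    pts = points[mask]
    # integer voxel indices along x, y, z
    ix = ((pts[:, 0] - x_range[0]) / dl).astype(np.int32)
    iy = ((pts[:, 1] - y_range[0]) / dw).astype(np.int32)
    iz = ((pts[:, 2] - z_range[0]) / dh).astype(np.int32)
    # group points by voxel index; each non-empty voxel feeds a feature encoder
    voxels = {}
    for p, key in zip(pts, zip(ix, iy, iz)):
        voxels.setdefault(key, []).append(p)
    return voxels
```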
The structural design of this type of detection network evolved from point cloud
segmentation frameworks, such as PointNet and PointNet++, which fit the detec-
tion task at the input or output side, and its overall architecture needs to balance
performance and efficiency. Taking the classic VoxelNet [25] and PointPillar [26] as
examples, VoxelNet first divides the 3D point cloud into a certain number of voxels,
then normalizes the point cloud, and subsequently applies local feature extraction
using several voxel feature encoding (VFE) layers to each nonempty voxel to obtain
voxel-wise features. The voxel-wise features are further abstracted by 3D convolu-
tional middle layers (increasing the receptive field and learning the geometric spatial
representation), and finally, the object is detected and classified using a RPN with
position regression. The 7 DoF deviation of its regression is as follows (only the yaw
angle of the target is considered).
Δx = (x_c^g − x_c^a)/d^a,  Δy = (y_c^g − y_c^a)/d^a,  Δz = (z_c^g − z_c^a)/h^a    (3.30)

Δl = log(l^g / l^a),  Δw = log(w^g / w^a),  Δh = log(h^g / h^a)    (3.31)

Δθ = θ^g − θ^a    (3.32)

where (x_c^g, y_c^g, z_c^g, l^g, w^g, h^g, θ^g) is the ground-truth box to be regressed, and (x_c^a, y_c^a, z_c^a, l^a, w^a, h^a, θ^a) is the preset anchor, in which d^a = sqrt((l^a)^2 + (w^a)^2).
The overall classification and regression loss function is shown in Eq. (3.33).

L = α·(1/N_pos)·Σ_{i=1}^{N_pos} L_cls(p_i^pos, 1) + β·(1/N_neg)·Σ_{j=1}^{N_neg} L_cls(p_j^neg, 0) + (1/N_pos)·Σ_{i=1}^{N_pos} L_reg(u_i, u_i^*)    (3.33)

where u_i and u_i^* are the estimated and true values of the above seven offsets, p_i^pos and p_j^neg represent the softmax outputs for positive and negative anchors, α and β are positive constants balancing their relative importance, N_pos and N_neg are the numbers of positive and negative samples, respectively, L_cls is the classification loss function, which is generally the cross-entropy loss, and L_reg is generally the L1 or L2 loss.
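The regression targets of Eqs. (3.30)–(3.32) can be computed with a few lines of code; the sketch below is illustrative, with 7-DoF boxes given as (x_c, y_c, z_c, l, w, h, θ):

```python
# A minimal sketch of the anchor-to-ground-truth regression targets of
# Eqs. (3.30)-(3.32), as used in VoxelNet-style BEV detectors. Names are illustrative.
import numpy as np

def encode_box(gt, anchor):
    xg, yg, zg, lg, wg, hg, tg = gt
    xa, ya, za, la, wa, ha, ta = anchor
    da = np.sqrt(la**2 + wa**2)              # anchor diagonal, used in Eq. (3.30)
    dx = (xg - xa) / da
    dy = (yg - ya) / da
    dz = (zg - za) / ha
    dl = np.log(lg / la)                     # Eq. (3.31)
    dw = np.log(wg / wa)
    dh = np.log(hg / ha)
    dt = tg - ta                             # Eq. (3.32)
    return np.array([dx, dy, dz, dl, dw, dh, dt])
```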
PointPillar is modeled after VoxelNet, with modifications such as reducing the VFE layers and removing 3D convolution operations, resulting in a tenfold increase in network speed. The PointPillar framework is illustrated in Fig. 3.26.
Center3D [27] is a one-stage, anchor-free, NMS-free 3D object detection network. It applies an anchor-free head to the voxels of the PointPillar structure and proposes an auxiliary corner-point attention module to force the CNN to pay more attention to object boundaries.

Fig. 3.26 The framework of PointPillar



In general, BEV-based object detection is influenced by image object detection methods: classification and regression are performed on the high-dimensional feature map after feature extraction, and the feature extraction itself aggregates local features using a point cloud segmentation strategy. One disadvantage is that the accuracy is degraded because point cloud information is lost during voxelization.
2. Camera view-based object detection
The camera view is composed of the multiple scan lines of the LiDAR, forming a range-view image. The image height is the number of LiDAR beams, and the image width is the number of points in one scan revolution. For example, for a 64-beam LiDAR with a horizontal angular resolution of 0.2°, the size of the camera view is 64 × 1800. The camera view is much smaller than the BEV, so the camera view-based method is more efficient than the BEV-based method. However, point cloud information is lost in camera view-based results, in which occlusion and truncation are hard to determine. Therefore, few methods use the camera view for 3D dynamic object detection. In early studies, the camera view of the point cloud was mainly fused with camera images, adding extra attributes and depth information. In data-learning-based strategies, the camera view is matched with image data to obtain new feature maps for post-processing. An alternative way to use the camera view is to convert it to a depth map, which is matched with image data for subsequent processing. One result of camera view-based object detection is shown in Fig. 3.27.
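The projection into the range-view image can be sketched as follows; the vertical field-of-view limits roughly match a 64-beam sensor but are illustrative assumptions:

```python
# A hedged sketch of projecting LiDAR points into a range-view ("camera view") image
# of size beams x columns (e.g., 64 x 1800 for a 0.2 degree horizontal resolution).
import numpy as np

def range_view(points, beams=64, h_res_deg=0.2, v_fov_deg=(-24.9, 2.0)):
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)                    # range of each point
    azimuth = np.degrees(np.arctan2(y, x))             # horizontal angle
    elevation = np.degrees(np.arcsin(np.clip(z / np.maximum(r, 1e-6), -1.0, 1.0)))
    cols = int(360.0 / h_res_deg)
    col = ((azimuth + 180.0) / h_res_deg).astype(np.int32) % cols
    row = (v_fov_deg[1] - elevation) / (v_fov_deg[1] - v_fov_deg[0]) * (beams - 1)
    row = np.clip(row.astype(np.int32), 0, beams - 1)
    image = np.zeros((beams, cols), dtype=np.float32)
    image[row, col] = r                                # store range as pixel value
    return image
```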
3. Point-based object detection
Point-based object detection does not grid the point cloud into voxels, which reduces point cloud information loss. This type of algorithm is inspired by PointNet [28], taking the point cloud data as the direct input, downsampling them, and extracting contextual features using set abstraction (SA) layers. A feature propagation (FP) layer is then applied to upsample the points, and the features are propagated to the points discarded during downsampling to recover the lost information. A 3D RPN is then applied to generate proposals centered on each point, and a final prediction is given based on these proposals after a refinement module. Figure 3.28 illustrates a point-based framework for point cloud target detection. This approach minimizes the loss of raw data information, but its time consumption tends to be high due to the analysis and inference required for each point.

Fig. 3.27 Camera view-based object detection (camera view map in perceptual field)

Fig. 3.28 The framework of object detection based on the point cloud
Point-RCNN [29] is a two-stage point cloud detection framework. Candidate boxes are generated in the first stage, during which semantic segmentation is applied to the point cloud to differentiate between foreground and background. A bounding box is generated for every filtered foreground segment, yielding n candidate boxes. In the second stage, every candidate box is rotated and translated to a proper position and then scored and refined, producing the final result.
VoteNet [30] is a novel point-based point cloud detection strategy. It rejects the previous idea of extending 2D structures and directly develops a generic 3D detection framework for point clouds. It basically follows the PointNet++ structure to reduce the information loss in point cloud transformation. Traditional point cloud processing networks analyze surface features, while target detection requires the center of the target. As the name implies, VoteNet uses a voting mechanism similar to that of the Hough transform: the position receiving the most votes is determined to be the closest to the center. It achieves state-of-the-art 3D target detection performance on both the SUN RGB-D [31] and ScanNet [32] datasets.
In general, deep learning approaches for LiDAR adopt frameworks and ideas from
vision (two-stage, anchor design, and dual-head structure) and form their unique
processing methods, including point cloud clustering strategies based on PointNet
structures. The data characteristics of point clouds lead to more difficulties in training
these frameworks than in vision training.

3.3.4 Vision and LiDAR Fusion Object Detection

Point clouds and images are two representations of the object, and they can comple-
ment each other and be fused to obtain a more accurate detection. One generic fusion
strategy is to find the feature representation common to both sensors. These strategies
include raw data direct fusion, mapping point clouds into picture space for feature
fusion, mapping point clouds into BEV space for feature fusion, generating pseudo
point clouds with pictures for feature fusion with the point cloud, and filtering fusion
of target results.

Raw data direct fusion, also known as pre-fusion, is the direct fusion of raw data after simple filtering and normalization operations. In this strategy, the systematic errors of the different data sources are input into the network, which is expected to minimize these errors. Raw data direct fusion requires a large amount of data and complex training but little human assistance.
Target-level fusion combines the results of two parallel decoders into one, and
this strategy is widely used in scenarios requiring high robustness, such as military
radar detection and lunar exploration. This fusion strategy requires the least amount
of joint data and network training or even no training. However, it has the highest
information loss and requires human assistance to determine the final result.
Feature-level fusion combines the advantages of pre-fusion and target-level
fusion. This fusion uses as many different sensor data as possible and reduces the
amount of information loss with less difficulty in network training. According to
different methods for object detection based on the point cloud, the fusion plane can
be a camera view or BEV.
Camera view-related methods emerged earlier than BEV-related methods. The camera view was proposed to provide depth information, at the cost of losing semantic information, which can be compensated by matching with image data. It is simple and robust and can be deployed on lunar rovers and geological exploration equipment with onboard lasers and cameras. However, images and laser point clouds are intrinsically different data. It has been found that converting one sensor's data into another format sacrifices the advantages of that sensor's data, resulting in an inability to handle occluded targets and to avoid systematic errors in fusion.
Fusion in a bird’s-eye view is most common in recent studies. This fusion includes
algorithms introducing deep learning strategies, such as multi-view 3D network
(MV3D) [33], aggregate view object detection (AVOD) [34], and multi-task multi-
sensor fusion (MMF) [35]. In MV3D, the 3D object is predicted using multimodal inputs obtained from the sensors, and a 3D proposal (candidate region) is selected based on the BEV features output from the point cloud. Then, different features are extracted from these candidate regions, which are input into a deep fusion network after ROI extraction. The deep fusion network achieves as much interactive fusion as possible, which is reported to be optimal compared with pre-fusion and post-fusion. In this fusion network, the computational cost can be minimized, but it is constrained by the idea of the proposal: the point cloud data need to be processed in two stages, and in addition, NMS needs to be performed, which leads to a high computational time cost and an upper limit on accuracy. Figure 3.29 shows the main structure of feature-level fusion.
AVOD [34], enhanced from MV3D, is a bird's-eye view approach based on a deep learning strategy. AVOD is a representative work of dynamic object surveying in autonomous driving and is used as a reference in many follow-on studies. AVOD is an object detection network under multiple views that integrates the features extracted from images and from BEVs of the point cloud. In contrast to MV3D, an FPN network is added in AVOD to achieve full-resolution feature maps, which are resized to extract
corresponding features. It removes the input of the LiDAR front view (the point cloud mapped to the front view of the camera) and integrates multimodal features in the generated proposal. AVOD uses a smaller computational space and model reduction tricks to shorten the computational time. However, both algorithms attempt to analyze point cloud data in BEV maps while using image features to correct the fusion results as much as possible, and both rely too much on point clouds and lack comprehensive utilization of the image information.

Fig. 3.29 Feature level based fusion on a visual camera and LiDAR
MMF, enhanced from AVOD, is a top-ranked deep learning-based BEV strategy in
which the LiDAR point cloud is projected on the BEV map. The framework processes
two data streams: the camera stream and the point cloud data stream. The camera
stream contains a feature extractor outputting residual feature maps into several
convolution layers of a lightweight ResNet. These convolution layers process the
sparsity of the observed data and the spatially discrete features of the camera image
and discretize each pixel of the image feature map into the BEV map, leading to
a convenient fusion between image features and the laser point cloud. MMF is the
first fusion algorithm to project image data to BEV. In the fusion layers, the BEV
feature map is clustered using the nearest neighbor algorithm, and then the clustered
features are back-projected to 3D space, which is then projected to 2D image space,
where the pixels can be related to the corresponding cluster. Through this process,
image pixels are projected to a BEV feature. Currently, this algorithm can achieve
high accuracy but has several problems.
(1) The extraction of 2D features uses a Faster-RCNN network, which involves a
relatively complex training process.
(2) The inference process requires NMS operations, resulting in additional costs.
(3) The fusion process is more difficult to understand, and the process is left entirely
to the network.
(4) Whether the information in the images is lost in the BEV map needs to be explored by further research. In addition, this framework has the problem that the fusion network itself does not provide an estimation of directional attributes, so additional estimation units need to be introduced.
Numerous researchers prefer to use a BEV-based fusion strategy with the integration of deep neural networks. On the one hand, point cloud information can be maximally preserved in the BEV. On the other hand, feature-level fusion strikes a balance between efficiency and information loss.
The other strategy is point cloud feature fusion with a pseudo point cloud generated from the image through a viewing-frustum model. A 3D surveying result is obtained by applying point-based or voxel-based processing to the point cloud formed by fusing the pseudo point cloud and the true point cloud from LiDAR. This strategy converts the image data into 3D space alongside the point cloud data without degrading the 3D point cloud to 2D. However, errors can be introduced during the conversion from the camera image to the pseudo point cloud.
In autonomous driving, researchers are dedicated to achieving more accurate measurements, lower error and failure rates, and greater versatility across different weather and road conditions by adopting a multi-sensor fusion strategy. Researchers and engineers have found it difficult to achieve an optimal fusion strategy that can deal with all scenarios. Different fusion strategies are developed in accordance with many practical factors, such as the available sensors, the driving environment, and the available computing power. It should be noted that there are cases where the result of multi-sensor fusion is not necessarily better than that of a single sensor, which can result from matching errors between data from different sensors. Therefore, follow-up studies are increasingly focused on avoiding the matching errors and inherent errors of different sensing systems and on reducing computational complexity and time.

3.4 High-Definition Map

3.4.1 HD Map Standard for Autonomous Driving

An HD map is a map with high accuracy and the ability to represent various traffic elements. The rich and accurate prior knowledge in the HD map extends the function of the in-vehicle sensors, assisting collaborative perception and localization. The HD map is considered a fundamental core module in an autonomous driving system.
Autonomous driving of different levels has different requirements for the accuracy
and content of the HD map. There is no official standard in the industry to define what
kind of HD map is needed for specific applications. Different HD map providers have
their production standards. Currently, in the HD map industry, there are six evaluation
factors: relative accuracy, logical consistency, data correctness, point cloud attributes,
storage format, and element completeness.
(1) Relative accuracy: Unlike coordinate surveying, which requires absolute preci-
sion, localization in autonomous driving does not necessarily need absolute
coordinates, such as longitude and latitude. It is the relative position, as the
relative position of the building to the car, that matters. Therefore, relative accu-
racy is more commonly used for the localization criterion. There is a consensus
that the horizontal relative accuracy should be within 10 cm, and the vertical

relative accuracy should be better than 20 cm when there is no signal loss.


However, there is no official or industrial standard for this accuracy. Indi-
vidual manufacturers usually set the required accuracy according to specific
applications.
(2) Logical consistency: Logical consistency requires that the same real-world location is projected consistently on the map without any overlap.
(3) Data correctness: Data correctness requires that short-term static objects, such as temporarily parked vehicles and non-fixed obstacles, be removed, ensuring a clean road surface in the HD map.
(4) Point cloud attributes: The HD map requires that the point cloud attributes not be overly simplified, so that text and content information can still be read and the map's accuracy is ensured.
(5) Storage format: Due to the limited storage capacity of the vehicle, a higher compression ratio of the HD map is preferred, which means that autonomous driving vehicles can travel farther and have a better active planning capability.
(6) Element completeness: Element completeness requires that the HD map contain enough vector elements to meet the needs of autonomous driving.
The HD map (Fig. 3.30) is generally divided into static and dynamic map layers in the
industry. The former needs to be loaded in advance, while the latter is continuously
updated during the driving process. The division of layers ensures that the high-
precision map maintains the accuracy of the underlying lane data. At the same time,
it also ensures that the high-precision map can carry out real-time updates of dynamic
road information.

Fig. 3.30 One example for the HD map



The static map layer is the basis and skeleton of the HD map. Compared with
general electronic maps, a static map layer used in autonomous driving needs to
accurately record the lanes, accessory facilities, and parking areas. Driven by car
manufacturers and autonomous driving associations, numerous specifications have
been published to standardize the static map layer and define the map data and
structures, including the Open Lane Model published by the International Navigation
Data Standard Association (NDS-Association), the intelligent driving electronic map
data model and exchange format defined in Apollo OpenDRIVE and EMG HD
Map by the German Technical Committee for the Standardization of Intelligent
Transportation Systems.
The static map layer used in autonomous driving can be divided into one traditional
standard and one additional map layer with lane-level information. According to the
importance of different map components to autonomous driving, map layers can be
divided into road networks, lanes, road markings, and road facilities.
(1) Road networks. This map layer is based on the road layers in the traditional
2D map and is added with 3D information to depict the geometry of the high-
precision roads. The road network consists of road lines, joints, and junctions.
This layer also contains data models to describe the logical relationship between
roads, lanes, and related road facilities.
(2) Lanes. This layer describes the geometric and logical relationships between different lanes. The lane network is composed of lane lines, lane joints, etc. The same lane in different road segments is connected through the lane joints, forming one lane unit in path planning. On the other hand, different lanes in the same road segment are interchangeable, as the car should be able to transition from one lane to an adjacent one.
(3) Road markings and facilities. This layer contains traffic facilities that assist safe and intelligent driving, which cannot be described in the road and lane networks. Road subsidiary objects include road marks, such as longitudinal road marks, transverse road marks, and parking space marks, and road facilities, such as signals, street lamps, and various types of traffic signs.
The static map layer is further divided into vector and feature layers to achieve effi-
cient storage and readability. The feature layer accurately portrays road information,
mainly driven by the need for high-precision localization, providing support for map-
road matching. The vector layer is a further abstraction, processing, and labeling on
top of the feature layer. It occupies smaller storage space and provides the primary
data for accurate point-to-point path planning, which is a significant role that the HD
map plays in autonomous driving.
The vector layer contains lane models, road components, and road attributes.
These semantic elements are simplified and extracted to supplement the geometric
road structure. The lane models include the contour, central line, and lane attributes,
assisting the lane-level localization and providing related traffic rules, for example,
guiding lane-changing. The localization can be improved with the vector layer by
comparing the in-vehicle perception results with road traffic signs on the map. Even
when no road features are detected, short-time reckoning can be performed with an

HD map. Furthermore, mathematical parameters such as curvature, slope, heading,


and cross slope of the lane model can guide car maneuvers such as steering, braking, and climbing. The lane model needs to be updated, using in-vehicle intelligent sensors, when roads are renovated or maintained.
A static map layer is sufficient for determining the car positions, but it cannot
describe the real-time road situation. A dynamic map layer is necessary to ensure
the safety, smoothness, and conformability of autonomous driving, providing extra
dynamic information that cannot be offered by in-vehicle sensors. The dynamic HD
map layer is added to the static map layer, with dynamic traffic information related
to traffic jams, accidents, road construction, and weather conditions. This dynamic
information is collected, processed, and released between vehicles and edge servers
through 5G communications and is updated in real time in the dynamic layer of
the HD map, ensuring safe autonomous driving. The dynamic map layer includes
various data types. Here, it is roughly divided into direct perception data and internet
of vehicles (IoV) data, which are introduced as follows.
(1) Direct perception data mainly refers to the data acquired by autonomous driving
cars, including localization data and perception data.
(2) IoV data mainly refers to information related to real-time road conditions
accessed by vehicles through the IoV. These data can complement the in-vehicle
perception, providing states of the static objects and trajectory of moving objects
near the car and assisting driving decisions. Traffic restrictions and flow infor-
mation on the driving route can be assessed via the IoV, which provides a basis
for car path planning.

3.4.2 Production of the HD Map for Autonomous Driving

In the previous section, the HD map model was introduced. In this section, the
production of the HD map will be presented (Fig. 3.31).

1. Crowdsourcing collection of spatial and temporal data

Professional HD map manufacturers are currently equipped with hundreds of professional
cars for data collection, as map data collection is a massive project. The car
fleet for data collection not only ensures the source of primary map data but also
keeps the map up-to-date. Real-time collection of sufficient road data in a large area

Fig. 3.31 The production of the HD map


is usually needed for periodic HD map updates. This task is difficult to complete
merely by relying on conventional surveying and mapping equipment.
To tackle the difficulty of acquiring real-time information of broad coverage,
researchers have proposed crowdsourcing data collection, which is gradually
accepted in HD map production and recognized as a future format of HD map
production in the industry. Crowdsourcing, as a distributed task allocation and execu-
tion mechanism of collective intelligence, can assign tasks that used to be given to
specific individuals for centralized completion to nonspecific people in the form of
outsourcing and use group intelligence to complete complex tasks at a lower cost.
Collective intelligence is an important trend in the era of big data. With the rapid
development of China’s Internet technology, Beidou satellite positioning commu-
nication technology, and the widespread use of intelligent terminals, a considerable
amount of spatial and temporal data for road mapping has been generated. As a typical
version of this new “collective intelligence” mode, crowdsourcing involves public
users in map data collection, providing the possibility of acquiring high-precision
road information with extensive coverage.
Companies that enter the mapping industry as autonomous driving developers rely
on the “crowdsourced maps” produced by distributed and cloud servers with algo-
rithms to process the crowdsourced images and fuse them to the HD map. One obvious
advantage of crowdsourced maps is that the discrete data collection of a single car
can be overcome by using the large number of sensors onboard the distributed vehi-
cles running on roads. Therefore, the bottleneck limiting the mapping speed is no
longer the data collection speed but a large amount of data processing at the back
end. Crowdsourcing mapping is obviously of meager cost and has a high update
frequency.
Taking road information extraction from crowdsourcing trajectories as an
example, it usually consists of two primary steps: data cleaning and information
extraction. Due to the unprofessional nature and variety of user terminals, crowd-
sourced data may contain significant spatial measurement errors, which can be elim-
inated by data cleaning. Generally, hierarchical cleaning is performed by applying
specific point density and motion consistency criteria using the probability distri-
bution pattern of the trajectory about the road cross-section. In road information
extraction, the Gaussian mixture model can be used to model the distribution pattern
of the car trajectory data within the road cross-section to recover the road
centerline. In addition, the topological and temporal relationships between the adja-
cent samples of the trajectory data reflect the speed limit, steering, and other lane
information. Therefore, road connectivity, channelization, and speed limits can be
extracted by applying trajectory tracking, fuzzy regression, and other methods.
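As a simplified illustration of the Gaussian mixture modeling step just described, the sketch below fits a one-dimensional mixture to the lateral offsets of crowdsourced trajectory points within a road cross-section and takes the sorted component means as estimated lane center positions. The lane count, noise levels, and data are assumed for illustration; a production pipeline would also perform hierarchical cleaning and model selection.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_lane_centers(lateral_offsets, n_lanes=2):
    """Fit a 1D Gaussian mixture to the lateral offsets (in meters) of trajectory
    points within one road cross-section and return the sorted component means
    as estimated lane center positions."""
    X = np.asarray(lateral_offsets, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_lanes, random_state=0).fit(X)
    return np.sort(gmm.means_.ravel())

# Example with synthetic offsets of cars driving in two lanes centered at -1.75 m and +1.75 m
rng = np.random.default_rng(0)
offsets = np.concatenate([rng.normal(-1.75, 0.4, 500), rng.normal(1.75, 0.4, 500)])
print(estimate_lane_centers(offsets, n_lanes=2))   # approximately [-1.75, 1.75]
```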
Compared to professional road surveying, the crowdsourcing mode relies mostly on
commercial in-vehicle devices and sensors, avoiding investment in professional road
data collection equipment. In the crowdsourcing mode, collection tasks are assigned
to ordinary users, who upload the collected data via communication networks, greatly
reducing the number of professionals involved. Meanwhile, the HD map is periodically
updated with data collected over a large area. The update frequency can be increased
mainly because the quality requirement on any single observation is lowered: the final
result depends on statistical analysis of the extensive aggregated data rather than on
any one observation. In addition, crowdsourced data, such as spatiotemporal trajectories,
are essentially time–space sequences representing the motion of the targets. The speed
and direction of the targets can be derived from these data, making up for the fact that
professional road mapping cannot obtain real-time traffic information.
Moreover, crowdsourcing data collection can be carried out through industrial, regional,
and worldwide collaborations, compensating for the inability of professional road
mapping to operate at such a large scale. However, opportunities always come with
challenges. Although the crowdsourcing mapping mode has dramatically improved the
timeliness of the HD map, it also introduces low-quality data, which are hard to apply
directly in practical applications and thus call for improvements in big data theory
and methods.
2. Preprocessing and reconstruction of the map data
Data processing is the process of sorting, classifying, and cleaning source data to
obtain an initial map template without semantic information or element annotation.
The initial map template is also termed the map basis.
The effectiveness of map basis construction is closely related to the accuracy
of the trajectory. During the production of the map basis, multi-sensor fusion is
usually used to acquire the localization information by fusing GPS, IMU, and wheel
speedometer measurements and environmental information by combining LiDAR
and camera observations. Based on the SLAM principle and method, the acquired
trajectory is corrected and converted to a unified world coordinate system to realize
the 3D reconstruction of road scenes.
Most solutions of map basis production using multi-sensor fusion mainly rely on
LiDAR combined with a navigation unit to achieve high-accuracy localization and
mapping. LiDAR can provide detailed 3D spatial structure but not as rich semantic
and textural information as images. Therefore, some solutions adopt the fusion of
cameras and LiDAR to combine 3D structural information with rich image informa-
tion. There are also solutions merely using cameras, generating a map basis in real
time through online detection of key road elements, including lane lines and road
signs. Achievement of a map basis with high definition relies largely on advanced
computer vision algorithms.
3. Essential map element recognition and information extraction
Object detection in producing HD maps refers to detecting and classifying static
road elements, including ground signs, road signs, traffic lights, and street lamps.
The automatically detected results are upstream of the HD map production chain,
providing vital information for the HD map.
The ground signs in the HD map include ground arrows, characters, numbers,
speed-limit zones and bumps, zebra crossings, stop and give-way lines, distance
confirmation lines, and deceleration yield lines. The variety of signs and their worn
condition cause difficulty in ground sign recognition. The direct ways to extract the ground signs
from a point cloud are threshold segmentation, skeleton extraction, and connected
domain analysis. These methods are effective when the foreground and background
are of significant contrast. They usually fail to obtain a satisfactory result when the
signs are blurred or worn out and the background is complex.
In this case, some signs can be missed in the detection, resulting in insufficient
accuracy, which can hardly meet the requirement of HD map production. Deep
learning methods have been increasingly used for object detection and pattern recog-
nition in recent years. Methods based on deep learning have also become the main-
stream of map element recognition. Detection and recognition technology is mainly
divided into two-stage detection, single-stage detection, and anchor-free detection.
Two-stage detection achieves an excellent overall effect and higher position recogni-
tion accuracy than single-stage detection. These methods are introduced in Sect. 3.3
for object detection of the driving environment, which requires a high detection speed.
However, high accuracy is of greater concern in off-line map production; thus,
two-stage detection is usually used. Similarly, deep learning in these methods can
significantly improve the automatic recognition of map elements, such as traffic
lights, road signs, and street lamps. Correspondingly, the recall rate and location
accuracy of map element recognition can be improved, resulting in a better HD map
service for autonomous driving.
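As a hedged illustration of using a two-stage detector for off-line map element extraction, the sketch below runs the publicly available Faster R-CNN implementation from torchvision (assuming torchvision 0.13 or later) on a survey image. The pretrained COCO weights and the 0.7 score threshold are stand-ins; a real production pipeline would fine-tune the detector on map element categories such as traffic lights, road signs, and street lamps.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a pretrained two-stage detector (Faster R-CNN with a ResNet-50 FPN backbone).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_map_elements(image_path, score_threshold=0.7):
    """Run two-stage detection on one survey image and return boxes, labels, and scores."""
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = model([image])[0]
    keep = output["scores"] > score_threshold      # assumed confidence threshold
    return output["boxes"][keep], output["labels"][keep], output["scores"][keep]
```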
4. Crowdsourcing update based on edge computing
Compared with traditional maps, which are usually updated at a very low frequency,
dynamic and static layers of HD maps need to be updated at a very high frequency
to keep up with changes in the driving environment. According to the update
frequency, the HD map used in autonomous driving can be divided into four cate-
gories, i.e., permanent static (update frequency of one month), semipermanent static
(update frequency of one hour), semi-dynamic (update frequency of one minute),
and dynamic (update frequency of one second). Map production alone is no longer the
focus of the current mapping industry. Major industrial giants are competing on the
subsequent maintenance and updating of the HD map, which in turn raises new
challenges for map production.
In crowdsourcing mode, ordinary cars running on roads are assigned the tasks of
sensing environmental change and comparing new updates with the elements in the
HD map. Changed map elements are uploaded to the cloud server, which broadcasts
the updated HD map to other cars. Crowdsourcing and broadcasting can realize a
rapid update of map data.
However, a balance between real-time data updates and effective data processing
is a massive challenge due to the complexity, diversity, and large volume of map
data. The limited computing power of a single terminal makes it difficult to achieve
real-time processing of a large amount of data. Centralized big data processing based
on the cloud computing model fails when the cloud servers are overloaded. A new
computing model needs to be introduced to ensure crowdsourced real-time updates
of the HD map.
As implied in the name, edge computing refers to the computing model deployed
at the network’s edge. Edge refers to computing and network resources on the path
between the data source and the cloud computing center. Edge computing alleviates
the computational burden by transferring some computing tasks from the central
cloud to network edges, providing shorter delay, higher reliability, and higher band-
width that cannot be guaranteed by current cloud computing technology. Combining
edge computing and cloud computing, an elastic cloud computing platform based
on edge infrastructure, namely, edge clouds, has emerged. As an extension of the
central cloud, the edge cloud has the capabilities of computing, communication, and
storage at the edge of the network. It forms an end-to-end technical architecture of
“cloud-edge-terminal collaboration” with the central cloud and terminals.
The three-tier computing architecture of the “terminal-edge cloud-center cloud”
will provide the basic infrastructure for the crowdsourcing update of HD map and
related map services, combining the large capacity of centralized processing and the
information utility of distributed processing. A data service center for HD map can
be established in the central cloud, focusing on global, long-term, and centralized
tasks, such as big data analysis and mining, map generation and incremental updates,
driving behavior analysis, and global crowdsourcing task distribution. Regional map
service nodes are established in the edge cloud, focusing on regional, real-time,
short-term, multi-source, and heterogeneous functions, such as real-time processing
and visualization of regional data and accident early warning. Edge clouds are inter-
connected to achieve efficient task scheduling according to a regional area, task
priority, and computing capability. They are connected to the central cloud through
the Internet, realizing collaboration in data processing, data security, task assign-
ment, and task management. Terminals, including in-vehicle units, mobile devices,
and roadside facilities, play a role in essential data acquisition and processing.
Each terminal carries out initial data acquisition, alignment, and processing, such
as feature recognition and map element vectorization. Top-level tasks are coopera-
tively completed by all terminals, with which real-time information and services are
accomplished in a crowdsourcing mode.

3.4.3 Applications of HD Map in Autonomous Driving

1. Localization and navigation

Autonomous driving is one of the most important application scenarios of the HD
map, which lays the foundation for accurate environmental perception and provides
an effective guide for the safe driving of vehicles. Therefore, research on the functional
application of the HD map includes localization and navigation.
Localization methods based on HD map are different due to different input data
types, but they follow the same principle. First, the car position is estimated by using
GNSS positioning. Then, sensors on the car are used to collect the environmental
information, which is matched with map information to refine the localization further.
According to different data types, localization methods are mainly divided into point
cloud-based and feature-based methods.
Point-cloud-based localization matches the point cloud collected at the current
moment with the map point cloud to determine the car’s position on the map. It is one
of the most commonly used localization methods at present. Matching algorithms
include geometric matching, Gaussian mixture model-based matching, and filter-
based matching. The initial point cloud usually needs to be preprocessed to be a
gridded format to reduce the data storage.
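A minimal sketch of the geometric matching idea is given below, assuming 2D point sets stored as NumPy arrays of shape (N, 2); it implements a plain point-to-point ICP, whereas production systems typically use more robust variants (NDT, Gaussian mixture model-based, or filter-based matching) on gridded map data.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_2d(scan, map_points, iterations=30):
    """Estimate the rigid transform (R, t) aligning a 2D scan to the map by
    iterating nearest-neighbor association and least-squares (Kabsch) alignment."""
    src = np.asarray(scan, dtype=float).copy()
    map_points = np.asarray(map_points, dtype=float)
    tree = cKDTree(map_points)
    R, t = np.eye(2), np.zeros(2)
    for _ in range(iterations):
        _, idx = tree.query(src)                  # associate each scan point with its nearest map point
        tgt = map_points[idx]
        src_c, tgt_c = src.mean(axis=0), tgt.mean(axis=0)
        H = (src - src_c).T @ (tgt - tgt_c)       # 2x2 cross-covariance matrix
        U, _, Vt = np.linalg.svd(H)
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:             # guard against a reflection solution
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = tgt_c - R_step @ src_c
        src = src @ R_step.T + t_step             # move the scan by the incremental transform
        R, t = R_step @ R, R_step @ t + t_step    # accumulate the total transform
    return R, t
```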
Feature-based localization refers to localization methods that take real-time
collected images as the primary data source and use features (vectors, feature points,
etc.) to match different datasets. Automatic registration methods can be applied to
HD maps acquired at different times or by different devices.
As the “road memory” for intelligent cars, an HD map with GNSS/INS integrated
positioning greatly supports accurate and real-time localization and environmental
perception services in autonomous driving. The industrial demands of this field will,
in turn, drive the development of HD mapping technology and its integration with
localization technology.
2. Assistant perception
Generally, perception in autonomous driving refers to sensing and detecting
surrounding objects using radars, cameras, and positioning sensors deployed on
the vehicle. The HD map portrays the static features and acts as an extension of
the perceptual sensors on the vehicle. It assists in-vehicle sensors in interpreting the
environmental datasets better. Combining the HD map with real-time sensor data
realizes an improved perception ability of autonomous driving cars. The benefits of
assistant perception of the HD map are listed as follows (Fig. 3.32).
First, the HD map provides semantic information about static objects during
autonomous driving, making up for the poor detection performance of visual sensors.
In good weather, the visual sensor has an ideal sight distance, allowing detection of

Fig. 3.32 Perception assisted by an HD map


the speed limit sign. However, the visual sensor cannot obtain semantic-level infor-
mation, such as the road section to which the limit applies, which is recorded in
detail in the map database.
Accurate information in the HD map makes vehicle control free from the interference
of error information and improves the reliability of autonomous driving.
Second, the large amount of prior and static road information recorded in the HD
map alleviates the workload of the frontend multi-sensor data fusion, which requires a
large amount of data transfer and the high computation capacity of the core processor.
The HD map records much prior information related to roads, such as traffic facilities,
lane lines, slopes, curvature, heading, and height and width limits. This avoids the need to
process these static objects, limits the focus of the frontend sensors to dynamic
objects, and reduces the fusion complexity and computing resources required for the
perceptual data, enabling real-time perception during autonomous driving.
Third, information redundancy provided by the HD map increases the safety
of autonomous driving. It is generally regarded that concise information input can
ensure prompt feedback from the application system. However, for autonomous
driving systems, vehicles must be driven on the road on a preset path and follow traffic
rules, which requires safe machine-controlled maneuvers such as steering,
turning, and braking without any driver intervention. In this case, the redundant data
input strategy is necessary. Map data provide a real-world model of the external
traffic environment and redundant data to the perceptual data fusion, ensuring a high
perception and security of autonomous driving.

3.5 Applications

With the rapid construction of new infrastructure, including 5G networks and big
data centers, many countries have put forward related policies encouraging the devel-
opment of autonomous driving. In the industry, scientists and engineers have reached
a consensus on the four-stage development of autonomous driving: low-speed load,
low-speed crewed, high-speed load, and high-speed crewed. Due to limitations in
data storage, technology, and policy, currently, the highest level 4 (L4) autonomous
driving is only allowed in the operational design domain (ODD), which means that
the autonomous driving system can only be activated under certain conditions and
within a particular region to guarantee the safety of autonomous driving within
a safe environment. These conditions include weather conditions, road conditions
(open road, half-closed road, and closed road), vehicle conditions (passenger car,
specific environment car, and closed park car), and limited speed. This section will
introduce typical applications of our proposed theoretical approaches in a partic-
ular environment (mining area) and closed area (park) and related difficulties and
challenges.

3.5.1 Application in Open-Pit Mines

1. Present challenges

The mining industry is a high-risk industry with high casualties. However, open-pit
mines are ideal for experimenting with driverless vehicles. On the one hand, the harsh
mining environment poses a significant danger to miners’ safety, and recruitment of
miners becomes difficult as the mining sites are remote from resident areas. On the
other hand, mining operations are often repetitive and monotonous, making them
easy for machines to take over.
However, unlike the daily environment commonly encountered by passenger cars,
heavy rain and snow and extremely cold weather conditions can occur in mining areas
that are also very dusty. In addition, there are various obstacles, such as vehicles,
pedestrians, falling rocks, potholes, and warning signs, which are distributed at
multiple scales. Bumps and ruts can also occur on the road in the mining area.
The above factors pose significant challenges for the perception, localization, and
construction of HD map for driverless vehicles in mining areas.
2. Technical solutions
To solve the above problems, we propose an extreme cold adaptability scheme
for external sensors, a multi-sensor fusion localization method, and a vehicle to
everything (V2X) vehicle–road cooperative sensing system for open-pit mines.
1) The extreme cold adaptability scheme for the external sensors
To ensure the effective operation of autonomous trucks in open-pit mines in
extremely cold weather or high-temperature environments, all materials and compo-
nents deployed inside/outside the control cabin should be functional at extremely
low temperatures, especially environmental sensing devices, such as LiDAR and
millimeter-wave radar. The following schemes can be selected to ensure their
functionality.
(1) Select LiDAR and millimeter-wave radar with special performance.
(2) Choose well-known LiDAR and millimeter-wave radar brands or request
customized sensors from the manufacturer that meet the minimum and
maximum temperature requirements for civilian use.
(3) Protect sensors with a thermostat controller to ensure functionality in extremely
cold conditions.
Special external insulation devices can be added to the sensors, such as LiDAR and
millimeter-wave radar on driverless trucks, ensuring their basic function in extremely
cold weather. The insulation material is an anti-low-temperature electric heating
film that is widely applied in artificial satellites, space vehicles, and instruments.
The electric heating film is made of soft material with strong corrosion
resistance and a thickness of less than 1 mm. It can work in the temperature range from
− 60 to 300 °C for a long time and has high thermal efficiency, accurate temperature
control, and low thermal inertia. The surfaces of LiDAR and millimeter-wave radar,
except their wave emitting surfaces, are all covered with electric heating film, and
thus, their temperature can be automatically controlled by the thermostat controller.
(4) Thermal protection on the sensor to ensure the protection level and insulation
effect.
Due to the dusty environment of open-pit mines, driverless trucks are usually cleaned
periodically. The electric heating film, which is usually not waterproof, can be
affected by cleaning or rainy and snowy weather. Additional protection devices are
designed and deployed on the sensors. Insulation material can be added between the
protective device and the electric heating film to improve thermal efficiency.
2) Multi-sensor fusion localization method
To improve the localization accuracy in the mining area, we designed three posi-
tioning modes, namely, long-time assistant localization mode, short-time assistant
localization mode, and basic localization mode, with corresponding localization
schemes for three scenarios (no GPS, short-time GPS signal blockage, and open
environments) according to the duration of GPS signal loss.
The long-time assistant localization mode is designed for conditions when the
GPS signal is weak or even lost for a long time, such as open-pit mines with compli-
cated working conditions or shaft mines without a GPS signal. LiDAR and UWB
cooperative localization are designed according to the characteristics of this posi-
tioning mode. The principle of this method is as follows: First, a topological map is
established by UWB, and rough positioning is performed to obtain an approximate
location of the vehicle in the mining area. Second, laser SLAM is used for precise
positioning. Point cloud matching, convex optimization, topology graph building,
and other techniques are used to establish segmented maps and levels of the mining
environment. Finally, the laser SLAM and UWB localization results are fused to
achieve accurate positioning in no GPS or weak GPS signal environments such as
tunnels, open-pit mines, and shaft mines.
The short-time assistant localization mode is mainly used for short-time GPS
failure when vehicles pass through tunnels. This positioning mode is mainly accom-
plished by combining inertial positioning and a wheel speedometer. The combined
inertial positioning is used to estimate the vehicle position at the next moment, and
the wheel speedometer is used to obtain the vehicle’s current speed, which assists
the former in calculating the next moment’s position. In this positioning mode, the
vehicle position can be efficiently obtained in a short period, but the positioning error
increases with time.
The basic positioning mode is used in the environment with a good GPS signal.
Positioning is achieved by combining the HD map, inertial positioning, and RTK-
GPS differential positioning system. RTK-GPS is a conventional positioning method,
but the positioning error can reach approximately 1–2 m due to signal blockage by
cloud cover and obstacles along the path. Such a positioning error is not acceptable in autonomous
driving; therefore, the RTK-GPS positioning results are corrected using combined
inertial positioning and the HD map to limit the positioning error to less than 10 cm.
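The three modes can be viewed as a simple switching policy driven by how long the GPS signal has been lost. The sketch below illustrates such a policy; the 30 s threshold is an assumed example value, not the figure used in the deployed system.

```python
from enum import Enum

class PositioningMode(Enum):
    BASIC = "RTK-GPS + inertial positioning + HD map"
    SHORT_TIME_ASSISTANT = "inertial positioning + wheel speedometer"
    LONG_TIME_ASSISTANT = "laser SLAM + UWB"

def select_mode(gps_outage_seconds, short_outage_limit=30.0):
    """Pick a localization mode from the duration of GPS signal loss.
    The 30 s limit separating short-time and long-time assistance is an assumed example."""
    if gps_outage_seconds <= 0:
        return PositioningMode.BASIC
    if gps_outage_seconds <= short_outage_limit:
        return PositioningMode.SHORT_TIME_ASSISTANT
    return PositioningMode.LONG_TIME_ASSISTANT
```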
The abovementioned positioning modes cover all the conditions encountered
in the mining area. The proposed solutions have been widely applied in several
uncrewed mining areas. For example, a lateral positioning accuracy of < 10 cm, a
longitudinal positioning accuracy of < 10 cm, an initial positioning output of < 10 s,
and a positioning module output frequency of ≥ 100 Hz were achieved by using the
solutions in Baoli mining, ensuring the safe driving of mining trucks.
3) V2X cooperative perception and positioning system
In the V2X cooperative perception and positioning system used in the mining area,
the real-time road information is obtained by sensors, such as LiDAR and cameras,
set up on the roadside, and shared with vehicles and the control center by using V2X
communication technology, realizing functions including dynamic traffic informa-
tion collection and fusion and early warning of road hazards, thus enabling beyond-
line-of-sight perception for the vehicles. The core technologies are listed as follows.
(1) HD mapping.
(2) Image object detection based on a deep neural network.
(3) 3D object detection based on multi-sensor fusion perception.
(4) 3D object detection based on the deep neural network.
(5) Vehicle–road cooperative target detection and tracking.
The roadside units of V2X include sensing devices such as LiDARs and cameras, an
edge computing platform for perception fusion, communication devices such as road-
side units (RSU), devices connected to other signal machines, and a power supply.
Both roadside units and vehicle side units can transmit the sensed data to the control
center in real time through 5G/C-V2X and other communication technologies. Road
object detection is carried out by the central server, which summarizes all detected
objects within the scene and manages all the vehicles. Meanwhile, the central server
shares all information with vehicles, which expands the sight distance of all vehi-
cles and assists their autonomous driving, increasing traffic efficiency and security
(Fig. 3.33).

Fig. 3.33 Core techniques of V2X car-road synchronized sensing



Fig. 3.34 The application of autonomous driving

Currently, this system with mobile risers and solar power has been successfully
deployed in a mining area to test its performance. The detection speed reaches 15 fps
in the embedded edge computing platform, and the effective detection area is over
100 m.
3. Engineering applications
In the engineering application, we developed five 220-ton autonomous mining trucks
in cooperation with WAYTOUS Intelligent Technology Co. and applied them to an
open-pit mining site excavated by National Energy Group, Inc. (NEGXQ in China).
The five trucks controlled by the autonomous transportation operation management
system worked in cooperation with one WK-35 electric shovel. The fastest speed
of the driverless trucks reaches 40 km/h (33.3% higher speed than normal trucks)
with an empty load and 32 km/h with a heavy load. There is a stable and reliable 5G
network during their operation, which can fully meet the data transmission require-
ments of the uncrewed operation process. Driverless mining trucks can generally
operate at temperatures as low as −42 °C. They reached a mileage of 11,620 km and
a transportation volume of 294,733 m³ of earth in total. The performance of
driverless mining trucks has been constantly improved and upgraded according to
feedback from industrial applications (Fig. 3.34).

3.5.2 Application in Various Parks

1. Present challenges

L4 and higher-level autonomous driving in an open road environment have been the
ultimate form pursued by researchers, engineers, and manufacturers in the industry.
However, there are still many difficulties and problems to be addressed before its

Fig. 3.35 Typical urban roads for autonomous driving at the L4 level

commercial application in daily life due to ethical, legal, technical, and economic
constraints. For instance, the hardware cost is still too high for commercial produc-
tion, autonomous driving safety cannot be sufficiently tested and
verified, and safety personnel are still needed in the vehicle to ensure safe driving. There are
some scenarios, such as closed or half-closed parks, where low-speed autonomous
driving is allowed to be commercialized. Low-speed autonomous driving for delivery,
patrolling, and vending on campuses and parks has entered a period of rapid
development.
Low-speed autonomous driving in parks and campuses faces more demanding
challenges in mapping and positioning due to the complex environment and business
needs.
1) Unsatisfactory positioning results of coupled navigation due to severe blockage
GNSS/IMU coupled navigation and positioning do not function well in scenarios where
the satellite signal is unstable due to severe blockage, and loss of signal lock can
easily occur. One example scenario is shown in Figs. 3.35 and 3.36.
In this typical scenario, mapping and localization based on coupled positioning
methods cannot work reliably due to the signal loss caused by tall buildings,
elevated overhead structures, vegetation cover, and so on. Therefore, it is necessary to
use a multi-sensor fused localization method based on LiDAR, cameras, and IMUs
to meet the demand for continuous positioning in practical applications.
2) Inapplicability of traditional high-precision maps due to frequent scene changes
Unlike municipal roads, the use of park roads can change as often as every few
months or even daily, and driving paths, buildings, and road facilities can be rearranged
according to changing daily needs. For example, Fig. 3.37 shows one port logistics
park where the container yard changes every day.
Map data collection of the HD map used in L4 autonomous driving is generally
accomplished using autonomous driving cars, each costing hundreds of thousands to
millions of RMB. Afterwards, map service providers carry out the production of the
HD map, with an update cycle of at least half a year. The map must contain all attributes
needed for the automated processing of complex traffic rules, which imposes extremely
high requirements on map quality and accuracy, as shown in Fig. 3.38.

Fig. 3.36 Path in a shopping park, which is a typical scenario for autonomous driving

Fig. 3.37 A port logistics park

This high-cost, low-frequency mapping approach is too expensive for autonomous
driving applications in half-closed or closed parks, which require low-cost hardware
systems to realize real-time online mapping without full attributes. The complexity
of parks brings further challenges: mapping must be adaptable to various urban and
rural areas, including campus roads, schools, squares, and semi-indoor environments,
and must be highly timely to support fast on-site operation and hourly-level
updating.

Fig. 3.38 Conventional data collection for HD mapping

2. Technical solutions
To solve the abovementioned problems in practical applications, our team proposed
mapping and positioning technologies based on multi-sensor fusion for autonomous
driving, which perform excellently in parks and play an essential role in positioning,
sensing, and mapping.
1) Multi-sensor fusion positioning
Multi-sensor fusion uses multi-source data input, provides a full and accurate descrip-
tion of the object, and reduces ambiguity to improve uncrewed systems’ positioning
accuracy and robustness. For instance, multi-source fused localization technology
based on LiDAR, cameras, and IMUs provides accurate and reliable localization
information for the system. In the following, we introduce one example of this system
in detail (Fig. 3.39).
3D LiDAR accomplishes high-precision 3D scanning of the environment,
constructing a high-precision environmental point cloud model and supporting
high-precision positioning based on feature extraction and matching.

Fig. 3.39 The 3D LiDAR and an example of a point cloud



Fig. 3.40 Visual RGB camera and image object detection

In this system, an IMU with 9 DoF is used to provide relative attitude informa-
tion during platform motion, namely, the trajectory of the car relative to an initial
position during a certain period. The IMU can provide submeter-level positioning
accuracy that lasts for several seconds when other sensors fail, saving valuable time
for abnormal processing (Fig. 3.40).
A wide-angle RGB camera is used in this system to achieve real-time acquisition
of image data. Compared with LiDAR, the image data can be processed to obtain a
denser point cloud with richer texture information.
Fusing 3D LiDAR and visual data for SLAM can be mainly divided into the following
steps:
(1) Preprocessing and feature extraction are applied to images obtained from
binocular vision and 3D point clouds obtained by LiDAR.
(2) Tracking and local mapping are conducted for binocular vision. Subsequently,
the fusion of vision and 3D LiDAR is carried out to estimate position and attitude
information.
(3) Global consistency optimization of the position and attitude map produced
by the laser point cloud is accomplished with visual loop closure detection
information to achieve full scene coverage, as shown in Fig. 3.41.
First, coarse extraction of features is performed, and a large number of feature
points are extracted by using gray value changes of the surrounding pixels. The

Fig. 3.41 Visual and LiDAR fused SLAM


feature points are then filtered using a decision tree formed with the 16 pixels around
the feature point. The output optimized feature points are then processed using non-
maximum suppression to remove densely clustered local points. The starting frame is
inserted into the local map as the keyframe. Then, the starting frame is used as the
reference frame to match the feature points with the newly added current frame.
Subsequently, the PnP algorithm estimates the relative position and attitude between
the current frame and the reference frame. Finally, the starting frame is set as the
reference keyframe, and the relationship between the current frame and the reference
keyframe is determined. The above operations are repeated to achieve local map
tracking. The current frame’s optimal attitude can be calculated with the Levenberg–
Marquardt (L-M) method by constructing and solving a least-squares problem.
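The front-end steps described above (FAST-style corner extraction with the 16-pixel decision test, non-maximum suppression, and PnP pose estimation against the reference keyframe) can be sketched with OpenCV as follows. The threshold value and input arrays are assumptions for illustration, and the full tracking loop, keyframe management, and L-M refinement are omitted.

```python
import cv2
import numpy as np

def detect_features(gray_image, threshold=25):
    """FAST corner detection with non-maximum suppression, as used for
    coarse feature extraction in the tracking front end (threshold is assumed)."""
    fast = cv2.FastFeatureDetector_create(threshold=threshold, nonmaxSuppression=True)
    return fast.detect(gray_image, None)

def estimate_pose(points_3d, points_2d, camera_matrix):
    """Estimate the current frame's pose relative to the reference keyframe
    from 3D-2D correspondences using PnP with RANSAC."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        camera_matrix, None)
    if not ok:
        raise RuntimeError("PnP pose estimation failed")
    R, _ = cv2.Rodrigues(rvec)          # rotation vector -> rotation matrix
    return R, tvec
```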
In a typical park scenario, the system can achieve stable positioning. The hori-
zontal and vertical positioning results during the system’s operation are shown in
Fig. 3.42.
The graph shows that the system can maintain centimeter-level positioning accu-
racy. The high-precision positioning results in a park with vegetation cover, where the
GPS signal is weak or even absent, are shown in Fig. 3.43.
2) Environment perception based on LiDAR and camera fusion
The environment perception system based on LiDAR-camera fusion can better satisfy
the demand for real-time target perception in highly dynamic scenes in parks. This
system fuses non-Euclidean spatial feature learning with semantic and instance point
cloud segmentation to construct a “panoramic segmentation” of the
fused point cloud and image elements. It realizes structured extraction of target
elements with accurate geometric boundaries and correct semantic annotations.
The point cloud instance segmentation network mainly includes two parts: the 3D
region recommendation network and the point cloud classification and boundary opti-
mization network. In the 3D region recommendation network, the instance-sensitive
score map is first obtained by calculating each block’s position relationship relative
to the instance. Then, the sliding windows centered on each point are classified by
using logistic regression, obtaining the instance object score map. In the point cloud
classification and boundary optimization, the minimum bounding box of the instances is
first generated using the 3D minimum bounding box recommendation network. Subse-
quently, the instance-generated probability of each point is calculated to form the
instance probability map. Finally, each point is labeled the same as the instance item
with the maximum value in the probability map, realizing the boundary refinement
of the point cloud instances. The result of LiDAR and visual data fused perception
is shown in Fig. 3.44.
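The final labeling step, assigning every point to the instance with the highest probability, reduces to a simple argmax over the instance probability map. A minimal NumPy sketch, with an assumed background threshold:

```python
import numpy as np

def label_points(instance_prob_map, min_prob=0.5):
    """Assign each point to the instance with maximum probability.
    instance_prob_map: (num_points, num_instances) array of probabilities.
    Points whose best probability is below min_prob (assumed value) are labeled -1."""
    labels = np.argmax(instance_prob_map, axis=1)
    best = instance_prob_map[np.arange(len(labels)), labels]
    labels[best < min_prob] = -1
    return labels
```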
3) HD mapping
Multi-source data are acquired by using multi-sensor integration systems onboard
vehicles. For example, IMU and laser observations are used to produce a 3D point
cloud map. The data obtained by the IMU and camera are optimized and fused using a
continuous-time SLAM algorithm to produce a multi-feature fused map. Multi-
sensor integration dramatically improves the stability and accuracy of HD mapping.

Fig. 3.42 Positioning accuracies along the X-axis, Y-axis, and Z-axis

Fusing the visual SLAM algorithm and the laser SLAM algorithm yields better map
composition than either algorithm alone. With
multi-sensor integration, low mapping cost, high timeliness, high accuracy, and rich
attributes are achieved to meet the demand for autonomous driving applications in
parks. One mapping result is shown in Fig. 3.45a–c.

3. Engineering applications

With the development of autonomous driving and artificial intelligence, various intel-
ligent automated cleaning robots have been widely applied in different indoor and
outdoor cleaning scenes. Intelligent cleaning robots can accurately map the planned
cleaning area with a quick setup of a cleaning task and efficient deployment. The
airport apron’s daily cleaning and patrolling are fundamental to maintaining safe

Fig. 3.43 Positioning result based on multi-sensor fusion in one park

Fig. 3.44 LiDAR and visual data fused perception

flights; at present, these tasks are conducted manually. Manual operation is of low effi-
ciency and high labor cost. The cleaning quality and coverage are difficult to examine
and quantitatively evaluate. Automated cleaning vehicles can carry out regular or
irregular inspection, discovery, prevention, and cleaning to remove obstacles and
support systematic and digital management of airport apron cleaning, helping to
improve the safe operation of the airport.
The uncrewed intelligent cleaning system used in cleaning airport aprons is a
complex system with cleaning robot hardware, navigation control and scheduling

Fig. 3.45 HD mapping

subsystems, airport air traffic control subsystems, safety-assuring subsystems, and
operation and maintenance-assuring subsystems. The overall architecture of the
system is shown in Fig. 3.46.
The airport apron is a typical half-open park, one side of which is for parking large
passenger aircraft, and the other side is adjacent to buildings. Solutions using either
laser SLAM or GNSS cannot achieve stable and reliable high-precision positioning.
In this project, multi-sensor integrated navigation control technology is adopted
to design and develop an uncrewed cleaning vehicle with an electronically controlled
sweeping chassis. This system integrates 3D LiDAR, GNSS system, camera, IMU,
and other sensors to achieve stable and reliable positioning and navigation of
uncrewed cleaning vehicles on the airport apron (Fig. 3.47).
The system provides a management platform that supports web access and can be
adaptable to various terminals, including laptops, smartphones, and tablets. Through
Wi-Fi communication, planned cleaning areas and paths can be sent to the cleaning
vehicle. The vehicle can also be commanded to return to its initial position. The

Fig. 3.46 The systematic framework of one cleaning system used in airport apron

Fig. 3.47 Cleaning vehicle on the airport apron

platform supports multiple users who can be assigned different permissions from
the administrator, effectively reducing the management cost of automated cleaning
vehicles (Fig. 3.48).
By assigning and scheduling remotely, cleaning vehicles can better accomplish the
cleaning task, significantly improving operational efficiency and safety (Fig. 3.49).

Fig. 3.48 Task management of the cleaning vehicles

Fig. 3.49 Path planning for a cleaning vehicle and the field after cleaning

3.6 Summary

Dynamic positioning, localization, and mapping for autonomous driving are essential
components of dynamic surveying, featuring its unique characteristics. This chapter
first outlines the navigation and positioning, object detection, and HD mapping used
in autonomous driving. Second, it elaborates on the principle and implementation of
each technology and the advantages and disadvantages of different implementations.
Finally, applications of these technologies are demonstrated in two typical scenarios:
autonomous driving applications in open-pit mines and airport aprons.

References

1. Budiyono A (2012) Principles of GNSS, inertial, and multi-sensor integrated navigation systems. Ind Robot 39(3).
2. Deng Y, Shan Y, Gong Z, et al (2018) Large-scale navigation method for autonomous mobile
robot based on fusion of GPS and LiDAR SLAM//Proceedings of chinese automation congress
(CAC), Xi’an.
3. Besl P J, McKay N D (1992) A method for registration of 3-D shapes. IEEE Trans Pattern Anal
Mach Intell, 14, 239–256.
4. Yuwen X, Chen L, Yan F, et al (2022) Improved vehicle LiDAR calibration with trajectory-
based hand-eye method. IEEE T Intell Transp 23(1):215–224.
5. Chen L, Yang J, He Y, et al (2015) Robust optimization with credibility factor for graph-based
SLAM//Proceedings of IEEE international conference on robotics and biomimetics (ROBIO),
Zhuhai.
6. Wan G, Yang X, Cai R, et al (2018) Robust and precise vehicle localization based on multi-
sensor fusion in diverse city scenes//Proceedings of IEEE international conference on robotics
and automation (ICRA), Brisbane.
7. Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? The KITTI vision
benchmark suite//Proceedings of IEEE conference on computer vision and pattern recognition,
Providence, RL.
8. Girshick R, Donahue J, Darrell T, et al (2014) Rich feature hierarchies for accurate object
detection and semantic segmentation//Proceedings of the IEEE conference on computer vision
and pattern recognition, Silver Spring, MD.
9. Girshick R (2015) Fast R-CNN//Proceedings of 2015 IEEE international conference on
computer vision (ICCV), Santiago.
10. Ren S, He K, Girshick R, et al (2015) Faster R-CNN: Towards real-time object detection with
region proposal networks. Adv Neural Inf Process 28: 91–95.
11. Caesar H, Bankiti V, Lang AH, et al (2020) Nuscenes: A multimodal dataset for autonomous
driving//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition,
Silver Spring, MD.
12. Sun P, Kretzschmar H, Dotiwalla X, et al (2020) Scalability in perception for autonomous
driving: Waymo open dataset//Proceedings of the IEEE/CVF conference on computer vision
and pattern recognition, Silver Spring, MD.
13. Huang X, Cheng X, Geng Q, et al (2018) The apolloscape dataset for autonomous driving//
Proceedings of the IEEE conference on computer vision and pattern recognition workshops,
Silver Spring, MD.
14. Redmon J, Divvala S, Girshick R, et al (2016) You only look once: Unified, real-time object
detection//Proceedings of the IEEE conference on computer vision and pattern recognition,
Silver Spring, MD.
15. Redmon J, Farhadi A (2017) YOLO9000: Better, faster, stronger//Proceedings of the IEEE
conference on computer vision and pattern recognition, Silver Spring, MD.
16. Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv preprint
arXiv:180402767.
17. Liu W, Anguelov D, Erhan D, et al (2016) SSD: Single shot multibox detector// Proceedings
of European conference on computer vision, Tel Aviv.
18. Limaye A, Mathew M, Nagori S, et al (2020) SS3D: Single shot 3D object detector. arXiv
preprint arXiv:200414674.
19. Law H, Deng J (2018) Cornernet: Detecting objects as paired keypoints//Proceedings of the
European conference on computer vision (ECCV), Munich.
20. Zhou X, Wang D, Krähenbühl P (2019) Objects as points. arXiv preprint arXiv:190407850.
21. Liu Z, Wu Z, Tóth R (2020) Smoke: Single-stage monocular 3D object detection via
keypoint estimation//Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition Workshops, Seattle, WA.

22. Brazil G, Liu X (2019) M3D-RPN: Monocular 3D region proposal network for object detection/
/Proceedings of the IEEE/CVF international conference on computer vision, Long Beach, CA.
23. Brazil G, Pons-Moll G, Liu X, et al (2020) Kinematic 3D object detection in monocular video/
/Proceedings of European conference on computer vision, Virtual.
24. Li P, Chen X, Shen S (2019) Stereo R-CNN based 3D object detection for autonomous driving/
/Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Long
Beach, CA.
25. Zhou Y, Tuzel O (2018) Voxelnet: End-to-end learning for point cloud based 3D object detec-
tion//Proceedings of the IEEE conference on computer vision and pattern recognition, Silver
Spring, MD.
26. Lang AH, Vora S, Caesar H, et al (2019) Pointpillars: Fast encoders for object detection
from point clouds//Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition, Long Beach, CA.
27. Wang G, Wu J, Tian B, et al (2021) CenterNet3D: An anchor free object detector for point
cloud. IEEE T Intell Transp. 23(8):12953–12965.
28. Qi C R, Su H, Mo K, et al (2017) PointNet: Deep learning on point sets for 3D classification and
segmentation//Proceedings of the IEEE conference on computer vision and pattern recognition,
Honolulu, HI.
29. Shi S, Wang X, Li H (2019) PointRCNN: 3D object proposal generation and detection
from point cloud//Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition, Long Beach, CA.
30. Qi C R, Litany O, He K, et al (2019) Deep hough voting for 3D object detection in point clouds/
/Proceedings of the IEEE/CVF international conference on computer vision, Long Beach, CA.
31. Song S, Lichtenberg S P, Xiao J (2015) Sun RGB-D: A RGB-D scene understanding benchmark
suite//Proceedings of the IEEE conference on computer vision and pattern recognition, Boston,
MA.
32. Dai A, Chang A X, Savva M et al (2017) Scannet: Richly-annotated 3D reconstructions of
indoor scenes//Proceedings of the IEEE conference on computer vision and pattern recognition,
Honolulu, HI.
33. Chen X, Ma H, Wan J, et al (2017) Multi-view 3D object detection network for autonomous
driving//Proceedings of the IEEE conference on Computer Vision and Pattern Recognition,
Honolulu, HI.
34. Ku J, Mozifian M, Lee J, et al (2018) Joint 3D proposal generation and object detection from
view aggregation//Proceedings of IEEE/RSJ international conference on intelligent robots and
systems (IROS), Madrid.
35. Chen L, He Y, Chen J, et al (2016) Transforming a 3-D LiDAR point cloud into a 2-D dense
depth map through a parameter self-adaptive framework. IEEE T Intell Transp 18(1):165–176.
Chapter 4
Indoor and Underground Space
Measurement

4.1 Overview

Humans are good at exploiting the indoor and underground spaces to shelter them-
selves. Apart from open lands, indoor and underground spaces are also important
for human accommodation. Persistent economic and social development continu-
ously depletes land resources, compelling the exploitation of indoor and underground
space, which involves the surveying and mapping of 3D space. On the one hand,
people spend approximately 80% of their time in indoor space, and its 3D struc-
ture is also essential knowledge. Large-scale urban complexes and transportation
hubs are becoming more extensive, and their spatial layouts are becoming increas-
ingly complex. Indoor location-based service applications for various moving objects
require navigation and maps. On the other hand, a large amount of infrastructure lies in
the underground space: underground traffic systems and underground pipeline networks
have been intensively built, and surveying work for these infrastructures requires
high-precision measurement.
The greatest challenge with indoor and underground space measurement is that
GNSS signals cannot be received, which makes it difficult to directly apply advanced
measurement technologies that rely on GNSS. Additionally, indoor and underground
space measurement has certain special characteristics. First, the space is limited, and
the structure is complex. For example, the size of the underground drainage pipe is
limited, which restricts the size of the measurement instrument terminal. Meanwhile,
the measurement is restricted due to signal occlusion. Second, dynamic changes in the
indoor environment often occur, and the latest updates are needed, which puts forward
a higher requirement for the timeliness of measurement. Third, for large-scale infrastructure,
such as large public places and subway systems, traditional static measurement
methods using total stations, levels, etc., cannot achieve high production efficiency.
All of these characteristics have brought huge challenges to indoor and underground
space measurement.
Positioning is the basis of dynamic surveying. This chapter first explains indoor
and underground positioning methods, including positioning based on intelligent

terminals and precision inertial navigation. Intelligent terminal positioning mainly
provides location-based service applications in indoor spaces, while precision iner-
tial navigation positioning is mainly used for infrastructure inspection and moni-
toring. Then, several specific applications are introduced in indoor dynamic mapping
and indoor and underground space infrastructure inspection, including floor flat-
ness measurement of the National Speed Skating Oval for the Olympic Winter
Games Beijing 2022, defect detection of drainage pipelines, and internal deformation
measurement of rockfill dams. Finally, challenges and development trends of indoor
and underground space measurement technology are summarized.

4.2 Indoor and Underground Space Positioning

4.2.1 Positioning Based on Smart Terminals

While GPS is commonly used for outdoor localization, indoor localization remains
challenging due to unavailable GPS signals. People spend most of their time indoors,
making indoor pedestrian localization a crucial strategy for location-based services
(LBS).
Numerous indoor localization techniques, such as Wi-Fi, radio-frequency identi-
fication (RFID), Bluetooth, and UWB, rely on wireless radio facilities. These local-
ization techniques fall into two categories: triangulation and fingerprinting. Trian-
gulation relies on expensive technology that must be installed, making it neither
scalable nor universal. The fingerprinting process involves time-consuming training
beforehand.
In addition to radio-based techniques for interior localization, dead-reckoning
(DR) systems that rely on inertial sensors are also used. These techniques calcu-
late the current position by adding the estimated displacement to the previously esti-
mated position. Independence from external infrastructure is the primary benefit of
the DR approach. DR techniques are commonly utilized for pedestrian localization,
also known as pedestrian dead reckoning (PDR). It utilizes lightweight and inex-
pensive inertial sensors for portable devices, such as accelerometers, gyroscopes,
and magnetometers. The PDR devices consist of wearable IMUs, tablet PCs, and
smartphones. PDR is based on integrating inertial sensor measurements over time;
hence, its primary shortcoming is that even slight errors in inertial sensors will be
amplified by integration.
Several solutions have been developed to prevent the accumulation of PDR errors.
Activity-based map matching (AMM) is a technique that employs activity-related
locations as virtual landmarks to prevent the accumulation of errors [1]. For instance,
when a user takes an elevator, there would be an overweight/weightlessness moment
followed by another weightlessness/overweight moment. The accelerometer can
detect these two moments, and the elevator’s location might serve as the virtual land-
mark. Smartphones with built-in MEMS inertial sensors can be considered primary
motion capture sensors, and human activity detection algorithms based on smart-
phones have been proposed [2], making AMM a promising method for pedestrian
indoor localization.
The AMM consists of two fundamental modules: activity detection (AD) and
map matching (MM). The purpose of the AD module is to detect a user’s current
activity, such as utilizing an elevator, turning a corner, or ascending stairs. Based on
the detected activity, the MM module determines the special point on the map where
the user is passing and then matches the estimated position of PDR to the location
of the identified unique point. Both modules may be error-prone. AD may fail to
identify an activity when it occurs, or confuse two distinct events and mistakenly
recognize one when the other has occurred. Regarding the MM, the actual location
of the user in a large indoor space cannot be identified by the observed activity
since there may be multiple special points with the same activity characteristics. The
constraint imposed by the topology of the indoor map is an overlooked aspect of the
present AMM strategies. For instance, a user cannot pass through a wall or other
obstacles depicted on the map.
We propose a smartphone-based indoor pedestrian localization method based on
activity sequences. When a pedestrian approaches a building’s specific points, a
series of events are triggered in sequence. Using the hidden Markov model (HMM),
this method achieves pedestrian localization by matching the activity sequence to
specific sites. The proposed method can achieve autonomous localization based on
PDR without the starting point being known.
1. Activity sequence

When a pedestrian reaches particular locations in a building, such as a corner, an
elevator, an escalator, or a staircase, the pedestrian performs activities that differ from
ordinary walking; the series of such actions occurring in succession is referred to as
the activity sequence. Using
activity detection techniques based on the signals of a smartphone’s embedded
sensors, these various activities can be detected.
Here, we focus on organized environments, such as office buildings, where there are
several particular points and pedestrians engage in a wide variety of activities. Different
activities produce different signal patterns in the readings of the smartphone’s sensors.
For example, an elevator imposes an overweight/weightlessness pattern and another
subsequent weightlessness/overweight pattern. We consider nine different activi-
ties: down elevator, up elevator, down escalator, up escalator, downstairs, upstairs,
walking, still, and turning. The signal patterns (readings of the accelerometer and
barometer) of these activities are shown in Fig. 4.1, and we can see that different
activities have different signal patterns.
We use a deep learning-based method for activity detection. We collected training
examples of different activities by different participants. The collected data include
a 3-axis accelerometer, 3-axis gyroscope, 3-axis magnetometer, and barometer. The
collected data are time-varying. For activity detection, we use a 2 s sliding window
to segment the time-varying data. The length of the sliding window is selected by a pre-experiment balancing the activity detection accuracy

Fig. 4.1 Signal patterns of different activities



and running time: the activity detection accuracy would be insufficient if the sliding window were too short, whereas the running time would be too long if the window were too long. We designed a convolutional neural network (CNN) for activity recognition. It consists of three layers: a convolutional layer, a pooling layer, and a fully connected layer. The hyperparameters of the CNN were obtained from the training data and then used for activity recognition.
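As a concrete illustration of this step, the sketch below builds a small 1D CNN with exactly the three layers listed above. It is not the book's implementation: the channel count (10, assuming the 3-axis accelerometer, gyroscope, and magnetometer plus the barometer), the window length (200 samples, i.e., 2 s at 100 Hz), and all layer sizes are illustrative assumptions.

```python
# Minimal sketch of a window-based activity-recognition CNN (assumed sizes).
import torch
import torch.nn as nn

class ActivityCNN(nn.Module):
    def __init__(self, n_channels=10, n_classes=9, window_len=200):
        super().__init__()
        self.conv = nn.Conv1d(n_channels, 32, kernel_size=5, padding=2)  # convolutional layer
        self.pool = nn.MaxPool1d(kernel_size=4)                          # pooling layer
        self.fc = nn.Linear(32 * (window_len // 4), n_classes)           # fully connected layer

    def forward(self, x):                 # x: (batch, channels, time)
        x = torch.relu(self.conv(x))
        x = self.pool(x)
        return self.fc(x.flatten(1))      # scores for the nine activity classes

model = ActivityCNN()
scores = model(torch.randn(1, 10, 200))  # one 2 s window of sensor data
print(scores.shape)                      # torch.Size([1, 9])
```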
An activity sequence consists of several activities. The pedestrian’s smartphone is
capable of detecting these activities. A sample activity sequence is shown in Fig. 4.2.
A Butterworth filter is utilized to minimize the influence of noise caused by the jitter
of the human body. The activity sequence shown in Fig. 4.2 consists of seven turns,
traveling down the stairs, taking the elevator up, and making a U-turn. The position
of the pedestrian can be calculated based on the detected actions by matching these
activities to the appropriate special points.

2. Activity sequence-based localization

To eliminate the accumulation of PDR errors, the suggested method uses map
matching to determine the most likely sequence of special locations based on the
activity sequence detected. The HMM is employed as the map-matching algorithm.
Next, we will discuss PDR and define the Indoor Road Network.

Fig. 4.2 An example of an activity sequence



1) Pedestrian dead reckoning


PDR is a pedestrian localization method that calculates the current position by adding
the estimated displacement to the prior location. The displacement is derived from
the step count and heading information. If the previous position is (x, y), the following
location is computed:

$$\left(x + s_l \cdot s_c \cdot \cos h,\;\; y + s_l \cdot s_c \cdot \sin h\right) \qquad (4.1)$$

where sl represents the length of the step, sc is the number of steps, and h is the
heading. The step count is determined by the algorithm for peak detection. Before
detecting a peak, the raw acceleration data must be preprocessed to exclude irrelevant
information. A Butterworth low-pass filter of order 4 with a cutoff frequency of
10 Hz is utilized for filtering. The outcome of step detection is shown in Fig. 4.3.
The heading is measured by the smartphone’s compass. The step size is set to a
default value with a random error appended.
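The fragment below is a minimal sketch of the PDR step just described: the acceleration norm is low-pass filtered with a 4th-order Butterworth filter (10 Hz cutoff, as in the text), steps are counted by peak detection, and the position is propagated with Eq. (4.1). The peak-height threshold, minimum peak spacing, and default step length are assumptions for illustration only.

```python
# Minimal PDR sketch: Butterworth filtering, peak-based step counting, Eq. (4.1).
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def pdr_update(x, y, acc_norm, heading_rad, fs=100.0, step_len=0.7):
    b, a = butter(4, 10.0 / (fs / 2.0), btype="low")   # 4th-order low-pass, 10 Hz cutoff
    acc_f = filtfilt(b, a, acc_norm)                   # suppress body-jitter noise
    peaks, _ = find_peaks(acc_f, height=1.2 * 9.81, distance=int(0.3 * fs))
    sc = len(peaks)                                    # detected step count
    # Eq. (4.1): advance the position along the compass heading
    return (x + step_len * sc * np.cos(heading_rad),
            y + step_len * sc * np.sin(heading_rad))

t = np.linspace(0, 2, 200)                             # 2 s of synthetic data at 100 Hz
acc = 9.81 + 2.5 * np.maximum(np.sin(2 * np.pi * 2 * t), 0.0)
print(pdr_update(0.0, 0.0, acc, np.deg2rad(45.0)))
```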
2) Indoor road network

A node refers to each unique location where a pedestrian would engage in activities
other than walking. An indoor road network comprises all nodes. The nodes in an
office building consist primarily of corners, elevators, escalators, and staircases.

Fig. 4.3 Step detection result



Fig. 4.4 A node example in the indoor road network

The node property is described as follows:


(1) coordinate, coordinate of the node.
(2) neighbor nodes.
(3) accessible direction (AD).
(4) accessible distance of corresponding accessible direction (ADCAD).
(5) node type (NT).
Figure 4.4 shows an example node; the attributes of node 2 are: (x2, y2); {1, 3}; {E, S, W, N}; {dE, dS, dW, dN}; Corner.
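For illustration, a node with the five attributes above could be stored as in the following sketch; the field names and the numerical values for node 2 are made up, since Fig. 4.4 only defines them symbolically.

```python
# Hypothetical container for one indoor-road-network node (attributes as listed above).
from dataclasses import dataclass

@dataclass
class Node:
    coordinate: tuple        # (x, y) coordinate of the node
    neighbors: list          # indices of neighbouring nodes
    accessible_dirs: list    # accessible directions (AD)
    accessible_dists: dict   # accessible distance per direction (ADCAD)
    node_type: str           # node type (NT)

node2 = Node(
    coordinate=(12.0, 5.0),                                     # (x2, y2), example values
    neighbors=[1, 3],
    accessible_dirs=["E", "S", "W", "N"],
    accessible_dists={"E": 8.0, "S": 3.0, "W": 4.0, "N": 6.0},  # dE, dS, dW, dN
    node_type="Corner",
)
print(node2)
```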

3) Hidden Markov model

HMM is utilized to match the activity sequence to the unique points in the indoor map,
i.e., the Indoor Road Network node (abbreviated as node). This section introduces
HMM for activity sequence-based localization. Each of the HMM’s discrete states
is associated with a probability distribution. A collection of transition probabilities
determines the states’ transitions. The corresponding probability distribution can
produce a result or observation in a certain state. The state cannot be directly observed
by an outside observer.
HMM is presented as follows:
(1) Hidden states. The HMM’s hidden states are nodes of Indoor Road Networks.
One node is described as one specific location where pedestrians could engage in
activities other than walking. Coordinate and type, such as corner and elevator,
are included in the node attribute.
(2) Observations. The HMM has two observations. The initial measure is the
distance traveled between two successive activity instances. The second
component is the output of the activity detection algorithm.

Fig. 4.5 An example of an indoor road network and corresponding transition probabilities

(3) Transition probabilities. When an activity is detected, a transition between hidden states occurs. The indoor road network structure is used to calculate
the transition matrix. As a pedestrian can only move between adjacent nodes,
and each state represents a node, it is assumed that the transition probability is
uniform over all neighbors of a particular node. Figure 4.5 shows an example
of transition probability estimation.
(4) Emission probabilities. The emission probability characterizes the distribution
of the observation probability at each hidden state. Due to the two observa-
tions in the HMM, namely, location and activity type, the emission probability
consists of two components: the probability of emission based on position and
the probability of emission based on activity type. As these two observations
are independent, the probability of emission can be expressed as follows:

$$p(z_t, m_t \mid r_i) = p(z_t \mid r_i)\, p(m_t \mid r_i) \qquad (4.2)$$

where p(z t |ri ) is the position emission probability, which describes the probability
distribution of position observation in a specific hidden state. p(m t |ri ) is the activity
detection emission probability, which describes the probability distribution of an
activity type given a specific hidden state.
Position error is caused by distance estimation error and angle estimation error,
according to the PDR principle. Consequently, p(z t |ri ) consists of two components:
the probability distribution for distance observations and the probability distribution
for angle observations. Here, it is assumed that these two probability distributions
are Gaussian distributions. Since distance measurement and angle measurement are
independent, the probability distribution for an observation is defined as follows:

$$p(z_t \mid r_i) = p(d_t \mid d_i)\,p(\varphi_t \mid \varphi_i) = \frac{1}{\sqrt{2\pi}\,\sigma_d}\exp\left[-\frac{(d_t-d_i)^2}{2\sigma_d^2}\right]\frac{1}{\sqrt{2\pi}\,\sigma_\varphi}\exp\left[-\frac{(\varphi_t-\varphi_i)^2}{2\sigma_\varphi^2}\right] \qquad (4.3)$$

Fig. 4.6 Schematic diagram for position emission probability estimation

where σd is the standard deviation of the measured distance, and σφ is the standard
deviation of the measured angle. Based on the distance calculation method of PDR,
the distance is in direct proportion to the step length; therefore, σd is equal to the
standard deviation of the step length. dt is the distance between the observation and
the last matched (determined) state. di is the distance between the ith state and the
last matched (determined) state. φt is the intersection angle between dt and di , as
shown in Fig. 4.6.
p(m t |ri ) describes the probability of correct activity detection for a given hidden
state, which is also known as the activity detection confusion matrix.
(5) Initial state distribution. When the first activity is observed, the initial state
distribution is uniform across all candidate nodes based on the activity type. If
the start point is unknown, the candidate nodes consist of nodes of the same
type in the environment; otherwise, the candidate nodes are selected from the
start point’s nearby nodes.
(6) Viterbi algorithm. The Viterbi algorithm is utilized to find the most likely
sequence of hidden states Q = [q1 q2 . . . qT ] for the given observation sequence
O = [O1 O2 . . . OT ], and a Viterbi variable is defined as follows:
$$\delta_{t+1}(j) = \left[\max_i\left(\delta_t(i)\,a_{ij}\right)\right] b_j(O_{t+1}), \quad 1 \le t \le T \qquad (4.4)$$

where δ_t(j) is the highest probability along a single path ending in state j at time t, a_ij is the state transition probability of moving from state i to state j, and b_j(O_{t+1}) is the observation probability at state j. To determine the most likely state, ϕ_{t+1}(j) is defined as:

$$\varphi_{t+1}(j) = \arg\max_i\left(\delta_t(i)\,a_{ij}\right), \quad 1 \le t \le T \qquad (4.5)$$
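The following sketch implements Eqs. (4.4) and (4.5) directly: hidden states are road-network nodes and observations are activity symbols. The three-node transition matrix, emission matrix, and observation sequence are toy values chosen only to make the example runnable; they are not taken from the experiments.

```python
# Viterbi decoding over indoor-road-network nodes (toy matrices).
import numpy as np

def viterbi(pi, A, B, obs):
    """pi: initial distribution; A[i, j]: transition prob.; B[j, o]: emission prob.;
    obs: observation index sequence. Returns the most likely hidden-state path."""
    n, T = len(pi), len(obs)
    delta = np.zeros((T, n))            # Eq. (4.4): best path probability
    psi = np.zeros((T, n), dtype=int)   # Eq. (4.5): best predecessor
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A       # delta_t(i) * a_ij
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):               # backtrack along the stored maxima
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

A = np.array([[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]])  # node adjacency
B = np.array([[0.9, 0.1], [0.2, 0.8], [0.9, 0.1]])                 # observation symbols: turn, elevator
pi = np.full(3, 1.0 / 3.0)                                         # uniform initial distribution
print(viterbi(pi, A, B, [0, 1, 0]))    # most likely node sequence, e.g. [0, 1, 0]
```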

4) Localization scheme

Given the detected activity sequence, our approach seeks to identify all nodes at
which the user completes the activities in the activity sequence. This sequence of nodes, corresponding to the detected activity sequence, is named the "NodeChain". During the execution of
the proposed HMM algorithm, if the number of states is insufficient, the NodeChain
with the highest probability is not always the correct one. Therefore, we calculate
the likelihood of each NodeChain candidate using the following equation adapted from Eq. (4.4):

$$p_{t+1}(j) = p_t(i)\,a_{ij}\,b_j(O_{t+1}), \quad 1 \le t \le T \qquad (4.6)$$

where pt (i ) is the probability of a NodeChain candidate at state t. We adopt the


following criteria to select the correct NodeChain from the candidates:

$$\frac{P_{\text{highest}}}{P_{\text{second highest}}} \ge C \qquad (4.7)$$

where P_highest is the highest probability among the NodeChain candidates, P_secondhighest is the second highest probability, and C is a constant set to 4 herein.
After determining the correct NodeChain using Eq. (4.7), the user’s position
is determined by comparing the estimated location of each action in the activity
sequence to the identified NodeChain. Using the determined node as a starting point,
PDR can calculate the subsequent location. The smartphone sensors’ bias can be
deduced from the preceding localization procedure. The user’s step length can be
approximated based on the number of detected steps and the distance between the
nodes of the identified NodeChain.
To estimate the user’s location during the walking process (online localization),
Eq. (4.8) is used.


$$P_{\text{est}} = \sum_{i=1}^{N} p_i q_i \qquad (4.8)$$

where Pest is the position estimated by the proposed scheme, pi is the position
estimated by every NodeChain candidate, qi is the probability of each NodeChain
candidate, and N is the number of the NodeChain candidates.
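A compact sketch of this selection logic is given below: each NodeChain candidate carries its own running probability (Eq. (4.6)), the candidate is accepted only when the ratio test of Eq. (4.7) with C = 4 succeeds, and otherwise the online position is the probability-weighted mean of Eq. (4.8). The candidate probabilities and positions are invented example values.

```python
# NodeChain bookkeeping sketch for Eqs. (4.6)-(4.8); all numbers are examples.
import numpy as np

C = 4.0  # ratio threshold of Eq. (4.7)

def update_candidates(probs, a, b):
    """Eq. (4.6): p_{t+1}(j) = p_t(i) * a_ij * b_j(O_{t+1}) for each candidate."""
    return probs * a * b

def online_position(probs, positions):
    """Eq. (4.8): probability-weighted position estimate over all candidates."""
    w = probs / probs.sum()
    return (w[:, None] * positions).sum(axis=0)

probs = np.array([0.10, 0.50, 0.05])                         # candidate probabilities
a = np.array([0.5, 0.5, 0.5])                                # transition probability per candidate
b = np.array([0.4, 0.8, 0.4])                                # emission probability of the new observation
positions = np.array([[3.0, 4.0], [10.0, 2.0], [7.0, 9.0]])  # PDR position of each candidate

probs = update_candidates(probs, a, b)
print("online estimate:", online_position(probs, positions))
if probs.max() >= C * np.partition(probs, -2)[-2]:           # Eq. (4.7) ratio test
    print("accepted NodeChain:", int(probs.argmax()))
```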

3. Evaluation

1) Activity detection performance proof of concept

A pilot study is carried out to evaluate the activity detection technique. All
data, including accelerometer, gyroscope, magnetometer, and barometer data, were
obtained using a Galaxy S III smartphone running Android 4.1.1. The sampling
frequency during data collection was 100 Hz. It was requested that four partici-
pants (two females and two males) complete five activities. Each participant held
the smartphone in front of their body with one hand. Twenty traces comprised the
sample size for each activity. For Turn and U-Turn, participants walked approxi-
mately ten steps before turning (U-Turn) and walked an additional ten steps. For
the elevator, stairs, and escalator, data collection commenced and concluded at two
distinct endpoints. We collected elevator data for different floors (the 1st to the 14th)
because elevators may be stopped at intermediate floors by other building occupants.

The following equation is used to calculate the accuracy of activity detection:

$$\text{Accuracy} = \frac{T_i}{N_i} \times 100\% \qquad (4.9)$$

where T_i is the number of correctly detected activities of the ith type, and N_i is the total number of activities of the ith type.
The activity detection result is shown in Table 4.1. Using a smartphone and the
proposed activity detection method, the outcome demonstrates that activities can be
detected reliably.
2) Activity sequence-based localization

(1) Experimental setup. To evaluate the overall system performance, we conducted


trials in two buildings: an office building with a 52.5 m × 52.5 m floor plan and a
shopping mall with an 80 m × 60 m floor plan, as shown in Fig. 4.7. The proposed
system was built on the Android platform utilizing the Galaxy S III smartphone,
which is equipped with an accelerometer, gyroscope, magnetometer, and barom-
eter. The participants were instructed to walk along six representative routes at
a constant pace while holding a smartphone. Four participants (two males and two females) repeated each route ten times. Routes #1, #2, #3, and #4
are located in the office building; Routes #5 and #6 are located in the shopping
center.

➀ Route #1 begins at an arbitrary location in the corridor, crosses many corners,


and ends at a single office seat. In this instance, the beginning point is unknown.
The conventional PDR methodology cannot be utilized in this case.
➁ Route #2 begins on a staircase and passes an open space surrounding the elevator.
The open area along Route #2 is characterized by a lack of constraints.
➂ Route #3 begins in an elevator and includes a U-turn. It is used to verify the
influence of an activity that is unconnected to a specific location.
➃ Route #4 begins by taking an elevator to the office floor, walking to the office,
sitting for a while, and then walking to the wash basin.
➄ Route #5 and Route #6 are both lengthy routes in the shopping center, with Route
#5 beginning at an elevator and Route #6 beginning at an escalator.
Along the paths, various markers with known coordinates were placed to obtain
ground truth data. Between two markers (distance of approximately 10 m), the ground
truth is interpolated using step count.

Table 4.1 Confusion matrix of activity detection


Activity type Turn U-Turn Elevator Stair Escalator
Turn 100% 0 0 0 0
U-Turn 0 100% 0 0 0
Elevator 0 0 100% 0 0
Stair 0 0 0 100% 0
Escalator 0 0 0 0 100%

Fig. 4.7 Experimental environments

By computing the Euclidean distance between the estimated position and the
actual position, the online localization error can be determined. The error is estimated
for offline localization using Eq. (4.10),

$$\text{Error} = \frac{\sum_{i=1}^{N} \left| p_{e,i},\, p_{g,i} \right|}{N} \qquad (4.10)$$

where N is the number of ground-truth points, p_{e,i} is the ith estimated position, p_{g,i} is the ith ground-truth position, and |p_{e,i}, p_{g,i}| is the Euclidean distance between p_{e,i} and p_{g,i}.
The standard deviation of step length estimation σd is set to 0.1. The standard
deviation of the measured heading σϕ is set to 10°.
(2) Online localization performance. The online localization results for all routes
are displayed in Fig. 4.8 for the proposed approach, the proposed approach with a known initial point, and the proposed approach with a known initial activity. Without an initial activity, the average initial error is large. Because the initial
location is unknown, a uniform distribution is assumed for the initial position.
As the number of steps increases, the localization error gradually decreases.
After a number of steps have been taken, the NodeChain of passed nodes is determined by the proposed approach (except for Route #3), and the location is also determined as the number of encountered activities increases.
initial activity is unknown, the trace cannot be identified due to the insufficient
number of activities; as shown in Fig. 4.8, Route #3 has only three turns.
As shown in Fig. 4.8, if the starting activity is known, the localization error
decreases more quickly (there is no initial activity on Route #1). For Route #2, if
the initial activity is unknown, the localization error decreases after approximately
40 steps; if the initial activity is known, the error drops after approximately ten
Fig. 4.8 Online localization error results for each route

steps. This is because the initial activity is unique: in Route #2, the initial activity is
walking the stairs; in Routes #3, #4, and #5, the initial activity is taking the elevator;
and in Route #6, the initial activity is taking the escalator. These three activity-related
nodes are significantly fewer in number than the turn. Based on Fig. 4.8f, when the
initial activity is observed, the position is immediately established. This is because
Fig. 4.9 Distance traveled before converging to a unique activity chain

the shopping mall has only one upward escalator. Utilizing unique activity-related
nodes would aid in accelerating convergence: the fewer the activity-related nodes of a given type, the faster the convergence. If the initial point is known
based on the outcome of the activity detection, the accumulative error can be avoided
by matching the estimated position of the PDR to the corresponding activity point.
The routes contain several exceptional instances. There is an open section on
Route #2 where turns cannot be detected. On Route #3, there is a U-turn activity
that is unconnected to its position. There is a period of sitting idle on Route #4. The
results in Fig. 4.8 demonstrate that these cases were resolved.
(3) Convergence. The convergence speed is related to the distance traveled prior
to convergence on a distinct activity chain. The slower the convergence speed,
the larger the distance traveled. Figure 4.9 shows the distance traveled by the
different routes until the algorithm converges with and without starting activity.
Generally, with starting activity, the distance traveled is significantly less than
without starting activity. Figure 4.9 demonstrates that utilizing activity detection
information would accelerate the convergence speed to the true position. Route
#1 does not include initial activity, Route #3 cannot converge, and Route #6 can
immediately converge when the initial activity is detected.

(4) Performance vs. Activity detection accuracy. Route #2 is used as an example


for analyzing the impact of activity detection accuracy and inertial sensor error
on the activity sequence matching result. There are seven activities, including
walking upward and six turns on Route #2. We assume that stair walking would
not be accurately identified. In fact, without the barometer (some smartphones
lack an integrated barometer), it would be difficult to discern stair climbing from
ordinary walking. Figure 4.10 shows the activity sequence matching result as a
function of activity (upstairs walking) detection accuracy with various inertial
sensor errors, expressed as the standard deviation of step length and heading.
After the completion of four actions, the matching accuracy is computed.
Figure 4.10a shows that when the step length estimate error is small, the activity
detection accuracy has little effect on the matching accuracy. When σd = 0.1, the matching accuracy is close to 100% even when the detection accuracy is 0. With increases in
step length estimation error, the impact of activity detection accuracy on the matching
outcome grows dramatically. The same pattern is shown in Fig. 4.10b, which indicates
the influence of heading error. If the sensor error is small, only turning activity can
effectively match the target point. If the sensor error is considerable, the matching
accuracy is low without stair-climbing activity. Figure 4.10 shows that activities with
a high degree of uniqueness are advantageous to activity sequence matching.
Based on Fig. 4.10, the proposed method is tolerant of inertial sensor and activity
detection error to a certain degree. If the activity detection accuracy is 100%, the
matching accuracy is greater than 60% when the standard deviation of the step
length estimation varies from 0.1 to 0.5, as shown in Fig. 4.10a. When σd = 0.1, the matching accuracy remains close to 100% despite activity detection inaccuracy, even if the activity detection accuracy is 0. As shown in Fig. 4.10b, the influence of heading
error and activity detection accuracy on matching accuracy is comparable to that of
step length error.
(5) Offline localization (tracking) performance. The offline localization result is
acquired by matching the position of the activity to the NodeChain. If the initial

Fig. 4.10 Matching accuracy as a function of activity detection accuracy


position is unknown (Route #1), the trace prior to the first activity is recon-
structed using the position of the first activity. The tracking trajectory is depicted
in Fig. 4.11. The proposed method accurately tracked the path of pedestrians
in experimental situations. The results of the studies are reported in Table 4.2
(tracking error is the mean of 10 trials), and the mean location error of the offline
localization is approximately 1.3 m.

Fig. 4.11 Offline localization results

Table 4.2 Evaluation results


Route no. | Route length/m | Activities detected | Activities undetected | Location-unrelated activities | Tracking error/m
1 | 124.50 | 6 | 0 | 0 | 0.932
2 | 106.70 | 6 | 2 | 0 | 1.123
3 | 73.25 | 5 | 0 | 1 | 1.012
4 | 84.18 | 13 | 0 | 1 | 1.235
5 | 161.40 | 7 | 0 | 0 | 1.897
6 | 104.50 | 6 | 0 | 0 | 1.581
© [2022] IEEE. Reprinted, with permission, from (2015). Activity sequence-based indoor pedestrian
localization using smartphones. IEEE Transactions on Human–Machine Systems, 45(5): 562–574

4.2.2 Positioning Based on a Precision INS

Inertial navigation is a self-contained positioning technique that measures the angular rate and acceleration of the platform in inertial space to estimate its attitude, speed, and position. An INS is a navigation device that continuously estimates the pose of a moving object without the need for any external references, so it is suitable for indoor and underground space measurement applications. However, measurement sessions are generally long, and even the most sophisticated civil-grade inertial navigation cannot meet the accuracy requirement on its own because of accumulated errors. To eliminate the accumulation error of inertial navi-
gation, the multi-sensor technique is generally used to correct the navigation error by
introducing other types of sensors or prior information. Since the INS can calculate
all the navigation states (attitude, speed, and position), it is generally used as the core
of the integrated system to construct the fusion framework based on the Kalman filter
algorithm, and other positioning sources can be supplementary. Therefore, the posi-
tion and orientation system using inertial navigation can also be called an aided INS
positioning system. This book introduces two common aided INS positioning methods:
odometer-assisted inertial positioning and LiDAR-assisted inertial positioning.
1. Aided INS positioning method

The Kalman filter framework requires defining the state vector, state transition equa-
tion, and measurement equation of the system. Generally, the state vector consists of
the navigation state, inertial sensor error parameters, and aided sensor model parame-
ters. The state transition equation is constructed with the inertial measurement value,
and the measurement equation is constructed with the measurement of the aided
sensor. Here is a brief introduction to the construction of the system state and state
transition equations of the Kalman filter for the aided inertial positioning system.

1) The system state vector

The system state vector describes the state of the measurement system at a certain
moment, which needs to be solved for the positioning solution. Generally, the esti-
mated system state includes the attitude, velocity, and position of the carrier and the
aided sensor-related parameters. However, since the change in the navigation state
is nonlinear, it cannot be directly estimated in the linear Kalman filter algorithm.
The error of the navigation state is generally used as the state vector; therefore, the
system state vector usually has the following form:

$$\boldsymbol{x} = \left[\, \boldsymbol{x}_{\text{nav}} \;\; \boldsymbol{x}_{\text{imu}} \;\; \boldsymbol{x}_{\text{sensor}} \,\right] \qquad (4.11)$$

where x_nav = [φ δv^n δr^n]^T is the navigation state error, whose components are the attitude, velocity, and position errors of the system; x_imu = [b_g b_a s_g s_a]^T contains the uncompensated residuals of the IMU model parameters, where b_g and b_a represent the bias errors of the gyroscope and accelerometer, respectively, and s_g and s_a are the scale
factor errors of the gyroscope and the accelerometer. x_sensor contains the parameters of the aiding sensors, for example, the lever-arm value of the GNSS antenna, and the installation error and scale factor of the odometer.

2) The state transition equation

The state transition equation describes the changes in the system state over time and
is usually represented by a differential equation. Its continuity form can be expressed
as:

$$\dot{\boldsymbol{x}}(t) = \boldsymbol{F}\boldsymbol{x}(t) + \boldsymbol{G}(t)\boldsymbol{w}(t) \qquad (4.12)$$

The discrete form is:

$$\boldsymbol{x}_k = \boldsymbol{F}_{k,k-1}\,\boldsymbol{x}_{k-1} + \boldsymbol{G}_{k-1}\,\boldsymbol{w}_k \qquad (4.13)$$

where F is the state transition matrix, G is the design matrix of the system noise, and
w is the system noise vector, which is determined by the performance of the inertial
system.
$$\boldsymbol{F} = \begin{bmatrix} \boldsymbol{F}_{\text{nav}} & \boldsymbol{F}_{\text{nav,imu}} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{F}_{\text{imu}} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{F}_{\text{sensor}} \end{bmatrix} \qquad (4.14)$$

Since inertial navigation has the characteristics of high dynamics and short-term
high-accuracy measurement, the INS error equations are derived by applying a perturbation approach to the INS mechanization equations.
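As a numerical illustration of Eqs. (4.13) and (4.14), the sketch below assembles a block state-transition matrix and discretizes it over one IMU interval with the first-order approximation F_{k,k−1} ≈ I + F·Δt. The block dimensions (9 navigation-error states, 12 IMU error states, 2 aiding-sensor states) and the placeholder sub-blocks are assumptions, not the system's actual values.

```python
# Block assembly of F per Eq. (4.14) and first-order discretization per Eq. (4.13).
import numpy as np

def build_F(F_nav, F_nav_imu, F_imu, F_sensor):
    n1, n2, n3 = F_nav.shape[0], F_imu.shape[0], F_sensor.shape[0]
    F = np.zeros((n1 + n2 + n3, n1 + n2 + n3))
    F[:n1, :n1] = F_nav                       # navigation-error dynamics
    F[:n1, n1:n1 + n2] = F_nav_imu            # coupling of IMU errors into navigation errors
    F[n1:n1 + n2, n1:n1 + n2] = F_imu         # IMU error dynamics (e.g. Gauss-Markov)
    F[n1 + n2:, n1 + n2:] = F_sensor          # aiding-sensor parameter dynamics
    return F

dt = 0.004                                    # 250 Hz IMU interval
F = build_F(np.zeros((9, 9)),
            1e-3 * np.ones((9, 12)),
            -np.eye(12) / 3600.0,             # placeholder correlation time of 3600 s
            np.zeros((2, 2)))
Phi = np.eye(F.shape[0]) + F * dt             # discrete transition matrix F_{k,k-1}
print(Phi.shape)
```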
(1) Inertial navigation state error equation. The system adopts a linear Kalman filter,
and the error of the system is used as the state at this time. The error equations
of position, velocity, and attitude are:

$$\begin{cases} \dot{\boldsymbol{\varphi}} = -\boldsymbol{C}_b^n\,\delta\boldsymbol{\omega}_{ib}^b - \boldsymbol{\omega}_{in}^n \times \boldsymbol{\varphi} + \delta\boldsymbol{\omega}_{in}^n \\ \delta\dot{\boldsymbol{v}}^n = \boldsymbol{C}_b^n\,\delta\boldsymbol{f}^b + \left(\boldsymbol{C}_b^n \boldsymbol{f}^b\right) \times \boldsymbol{\varphi} - \left(\boldsymbol{\omega}_{ie}^n + \boldsymbol{\omega}_{in}^n\right) \times \delta\boldsymbol{v}^n - \left(\delta\boldsymbol{\omega}_{ie}^n + \delta\boldsymbol{\omega}_{in}^n\right) \times \boldsymbol{v}^n \\ \delta\dot{\boldsymbol{r}}^n = \delta\boldsymbol{v}^n \end{cases} \qquad (4.15)$$

(2) Inertial sensor model. The measurement values of inertial sensors can be simply
modeled with bias, scale factor, nonorthogonal coefficient, and white noise error.

$$\begin{cases} \hat{\boldsymbol{\omega}}^b = \boldsymbol{K}_g\,\boldsymbol{\omega}^b + \boldsymbol{b}_g + \boldsymbol{\varepsilon}_g \\ \hat{\boldsymbol{f}}^b = \boldsymbol{K}_a\,\boldsymbol{f}^b + \boldsymbol{b}_a + \boldsymbol{\varepsilon}_a \end{cases} \qquad (4.16)$$

where b_a and b_g are the bias values, and ε_g and ε_a are the measurement noise of the gyroscope and the accelerometer, respectively.
$$\boldsymbol{K}_a = \begin{bmatrix} s_{a,x} & \gamma_{a,xy} & \gamma_{a,xz} \\ \gamma_{a,yx} & s_{a,y} & \gamma_{a,yz} \\ \gamma_{a,zx} & \gamma_{a,zy} & s_{a,z} \end{bmatrix}, \qquad \boldsymbol{K}_g = \begin{bmatrix} s_{g,x} & \gamma_{g,xy} & \gamma_{g,xz} \\ \gamma_{g,yx} & s_{g,y} & \gamma_{g,yz} \\ \gamma_{g,zx} & \gamma_{g,zy} & s_{g,z} \end{bmatrix} \qquad (4.17)$$

where s_{a,x}, s_{a,y}, s_{a,z}, s_{g,x}, s_{g,y}, s_{g,z} are the scale factors, and γ_{a,xy}, γ_{a,xz}, γ_{a,yx}, γ_{a,yz}, γ_{a,zx}, γ_{a,zy}, γ_{g,xy}, γ_{g,xz}, γ_{g,yx}, γ_{g,yz}, γ_{g,zx}, γ_{g,zy} are the nonorthogonality coefficients between the sensor axes. To simplify the analysis, the nonorthogonality coefficients are not considered in the inertial sensor modeling.
The device errors of the INS, such as the bias and scale factor errors, are generally
changing slowly and can be modeled as a first-order Gauss-Markov process, namely:

$$\begin{cases} \dot{\boldsymbol{b}}_g = -\dfrac{1}{t_{b,g}}\,\boldsymbol{b}_g + \boldsymbol{\varepsilon}_{b,g} \\[6pt] \dot{\boldsymbol{b}}_a = -\dfrac{1}{t_{b,a}}\,\boldsymbol{b}_a + \boldsymbol{\varepsilon}_{b,a} \\[6pt] \dot{\boldsymbol{s}}_g = -\dfrac{1}{t_{s,g}}\,\boldsymbol{s}_g + \boldsymbol{\varepsilon}_{s,g} \\[6pt] \dot{\boldsymbol{s}}_a = -\dfrac{1}{t_{s,a}}\,\boldsymbol{s}_a + \boldsymbol{\varepsilon}_{s,a} \end{cases} \qquad (4.18)$$

where t is the correlation time and ε is the random noise. For medium- and high-end
INS, the correlation time is relatively long, and the drift noise is small, so the bias
can be considered a random constant in a short time.
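For reference, Eq. (4.18) can be simulated in discrete time as b_{k+1} = exp(−Δt/t)·b_k + w_k. The sketch below does this for a single bias channel; the correlation time, noise level, sampling interval, and duration are illustrative assumptions rather than specifications of any particular IMU.

```python
# Discrete-time simulation of a first-order Gauss-Markov bias, cf. Eq. (4.18).
import numpy as np

def gauss_markov(tau=3600.0, sigma=0.01, dt=0.004, n=50_000, seed=0):
    rng = np.random.default_rng(seed)
    phi = np.exp(-dt / tau)                 # per-step decay
    q = sigma * np.sqrt(1.0 - phi**2)       # driving noise keeping the steady-state std at sigma
    b = np.zeros(n)
    for k in range(1, n):
        b[k] = phi * b[k - 1] + q * rng.standard_normal()
    return b

bias = gauss_markov()
print(bias.std())   # grows toward sigma as the simulated span approaches several tau
```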

2. The odometer-aided INS positioning

In dynamic surveying, a wheeled mobile platform is usually used to improve the accu-
racy of positioning and attitude determination. The odometer is used to accurately
measure the speed of the carrier, thereby correcting the drift of inertial navigation.
Additionally, for contact measurement along a track, the motion of the carrier is constrained to the track; that is, the carrier moves along the designed track, and the properties of the track itself can be used to help improve the positioning accuracy, as shown in Fig. 4.12.

Fig. 4.12 The odometer-aided INS



Fig. 4.13 Data processing workflow of IMU/Odometer/Control point aided positioning

Taking the Kalman filter as the data fusing framework, the relative positioning
accuracy is improved by using velocity measurement models such as odometer
velocity measurement and nonholonomic constraints. The control point measure-
ment model is used to correct the inertial navigation positioning error and improve
the absolute positioning accuracy. Finally, all data are smoothed and optimized by
the Rauch-Tung-Striebel (RTS) smoothing algorithm to obtain the optimal estimated
position and attitude. Its data fusion workflow chart is shown in Fig. 4.13.

1) Odometer support measurement model

As a low-cost and stable speed sensor, odometers are widely utilized in the positioning
of land mobile platforms. The odometer speed and inertial navigation data are fused
by a commonly used speed measurement model. Since the reference system in which
the speed is measured by the odometer (vwheel ) is different from that of the inertial
measurements (vimu ), it must be transformed into a unified reference system before
data fusion. To better take advantage of the Gaussian distribution of the odometer
measurement error, the measurement equation is constructed in the vehicle reference system. According to the φ angle error model of inertial navigation, the calculated velocity in the vehicle frame can be expressed as:
$$\hat{\boldsymbol{v}}^v_{\text{wheel}} = \boldsymbol{C}_b^v\,\hat{\boldsymbol{C}}_n^b\,\hat{\boldsymbol{v}}^n_{\text{imu}} + \boldsymbol{C}_b^v\left(\hat{\boldsymbol{\omega}}^b_{nb}\times\right)\boldsymbol{l}^b_{\text{wheel}} \approx \boldsymbol{v}^v_{\text{wheel}} + \boldsymbol{C}_b^v\boldsymbol{C}_n^b\,\delta\boldsymbol{v}^n_{\text{imu}} - \boldsymbol{C}_b^v\boldsymbol{C}_n^b\left(\boldsymbol{v}^n_{\text{imu}}\times\right)\boldsymbol{\varphi} - \boldsymbol{C}_b^v\left(\boldsymbol{l}^b_{\text{wheel}}\times\right)\delta\boldsymbol{\omega}^b_{ib} \qquad (4.19)$$

The measurements in the vehicle frame can be described as:

$$\tilde{\boldsymbol{v}}^v_{\text{wheel}} = \boldsymbol{v}^v_{\text{wheel}} + \boldsymbol{\varepsilon}_v \qquad (4.20)$$

The measurement equation can be as follows:

$$\boldsymbol{z}_v = \hat{\boldsymbol{v}}^v_{\text{wheel}} - \tilde{\boldsymbol{v}}^v_{\text{wheel}} \qquad (4.21)$$



In actual measurement, the odometer counts discrete pulses generated by wheel rotation, and the count often contains sampling noise of several pulses. To estimate the speed at t_k, the odometer speed is smoothed as the average speed between t_{k−n} and t_{k+n}. When fusing the INS and odometer, the measurement error parameter of
speed is critical. According to the error source of the odometer, its speed measurement
accuracy can be calculated using the following equation.
$$\varepsilon_v = \frac{c_{\text{wheel}}}{t_s\,N_{\text{tick}}} \qquad (4.22)$$

where c_wheel is the wheel circumference, N_tick is the number of encoder ticks per revolution, and t_s is the sampling time interval of the encoder. For example, if an encoder with 1000 ticks per revolution is used, the wheel circumference is 0.2 m, the sampling interval is 5 ms, and the encoder noise is set to 1 tick, then the speed measurement accuracy of the encoder is 4 cm/s. The finer the sampling scale of the odometer, the more accurate the measured speed.
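The worked example above can be reproduced directly from Eq. (4.22), as in the short check below.

```python
# Numerical check of the encoder example under Eq. (4.22).
c_wheel = 0.2    # wheel circumference in metres
n_tick = 1000    # encoder ticks per revolution
t_s = 0.005      # sampling interval in seconds (5 ms)
eps_v = c_wheel / (t_s * n_tick)
print(f"speed measurement accuracy: {eps_v * 100:.0f} cm/s")  # 4 cm/s
```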
2) Nonholonomic constraint measurement model
A nonholonomic constraint refers to the phenomenon that when the carrier is moving,
only the forward velocity in the carrier coordinate system is nonzero; the lateral and vertical velocities of the vehicle are approximately equal to zero, that is:

$$\tilde{\boldsymbol{v}}^v_{\text{nhc}} = \left[\,0 \;\; v^v_y \;\; 0\,\right] + \boldsymbol{\varepsilon}_{\text{nhc}} \qquad (4.23)$$

This motion feature is more common in land mobile navigation. Because the
velocity is the first derivative of the position, the constraint on the velocity can be
considered as the constraint on the local smoothness of the trajectory, which is also
an important reason why the nonholonomic constraint can improve the accuracy of
the INS. The nonholonomic constraint measurement equation can be expressed as:

$$\boldsymbol{z}_{\text{nhc}} = \hat{\boldsymbol{v}}^v_{\text{wheel}} - \tilde{\boldsymbol{v}}^v_{\text{nhc}} \qquad (4.24)$$

Since the speed in the forward direction is unknown, the variance of the forward component of ε_nhc is set to a very large value, and only the lateral and vertical components are assigned meaningful values. These values can be adjusted according to how well the carrier's motion fits the nonholonomic constraint.
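The sketch below shows how the nonholonomic pseudo-measurement of Eqs. (4.23) and (4.24) could be formed: the lateral and vertical body-frame velocities are observed as zero, and the unknown forward component is effectively removed by giving it a huge variance. The velocity values and variances are illustrative assumptions.

```python
# Nonholonomic-constraint pseudo-measurement sketch, cf. Eqs. (4.23)-(4.24).
import numpy as np

v_wheel_pred = np.array([0.03, 1.52, -0.02])    # predicted body-frame velocity (lateral, forward, vertical)
v_nhc = np.array([0.0, v_wheel_pred[1], 0.0])   # Eq. (4.23): only the forward component is non-zero
z_nhc = v_wheel_pred - v_nhc                    # Eq. (4.24): measurement innovation
R_nhc = np.diag([0.05**2, 1.0e6, 0.05**2])      # huge variance on the unconstrained forward axis
print(z_nhc, np.diag(R_nhc))
```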
3. LiDAR-aided INS positioning
Dynamic surveying generally establishes a dynamic position and attitude reference
by mobile navigation and then observes the surrounding environment by non-contact
sensors (such as LiDAR and cameras) (Fig. 4.14). Due to the continuity of movement,
it is always possible for environmental sensors to make repeated observations in the
same environment if the measurement frequency is sufficient. A critical path to improving the positioning accuracy is to make the best use of the redundant information of

Fig. 4.14 Non-contact sensors for dynamic observation of the environment

Fig. 4.15 Data processing workflow of IMU/odometer/control point assisted positioning

this repeated observation. Moreover, when an accurately modeled sensor observes feature points in the environment, performing control measurements (Fig. 4.15) on a few clearly identifiable sparse feature points can also effectively improve the absolute positioning accuracy. The proposed LiDAR-assisted inertial navigation positioning method is introduced in this book.
1) LiDAR registration aided positioning

To solve the divergence of pure inertial positioning error during GNSS outages,
a LiDAR registration-assisted inertial positioning method is proposed. The basic
idea is to utilize the trajectory obtained by inertial recursion to stitch LiDAR point
clouds. Due to the error of the INS, misalignment of the point clouds occurs. This
misalignment can be accurately measured by the LiDAR registration algorithm, and
a model (Eq. 4.25) between the LiDAR registration error and the inertial navigation
error can be constructed by analyzing the inertial error changes in a short time. By
using this model, the inertial navigation error can be estimated, and the error can be
adjusted by the Kalman filter. The method can eliminate the inertial navigation error
in a short time and improve the relative positioning accuracy.

$$\boldsymbol{p}^p - \boldsymbol{q}^p = \left(\hat{\boldsymbol{p}}^p - \hat{\boldsymbol{q}}^p\right) - \left(\delta\boldsymbol{r}^p(t_p) - \delta\boldsymbol{r}^p(t_q)\right) = \left(\hat{\boldsymbol{p}}^p - \hat{\boldsymbol{q}}^p\right) - \left(\Delta t_p - \Delta t_q\right)\delta\boldsymbol{v}^p(t_0) - \frac{1}{2}\left(\Delta t_p^2 - \Delta t_q^2\right)\delta\boldsymbol{g}^p + \left(\boldsymbol{C}_2(t_p) - \boldsymbol{C}_2(t_q)\right)\delta\boldsymbol{b}_a \qquad (4.25)$$
where p^p and q^p are the matched LiDAR points, Δt_p and Δt_q are the time differences between the scanning times of the respective LiDAR points and the reference time t_0, δg^p is the gravity decomposition error introduced by the attitude angle error, δb_a is the accelerometer bias, and C_2(t_p) and C_2(t_q) are the double integrals of the rotation matrix with respect to the relative rotation of the inertial navigation over the short time window.
2) LiDAR control point aided positioning

LiDAR can scan control points in the surrounding environment while moving and
associate the control point information in the space with the inertial navigation carrier.
Specifically, the observation model of the LiDAR control points is given in Eq. (4.26). By perturbing the observation model, the error model of the control point coordinates can be obtained:
$$\boldsymbol{r}^n_{\text{ctrl}} = \boldsymbol{r}^n_{\text{ins}} + \boldsymbol{C}_b^n\left(\boldsymbol{C}_l^b\,\boldsymbol{r}^l_{\text{ctrl}} + \boldsymbol{l}^b_{l,b}\right) = \hat{\boldsymbol{r}}^n_{\text{ins}} + \delta\boldsymbol{r}^n + \left(\boldsymbol{I} - (\boldsymbol{\varphi}\times)\right)\hat{\boldsymbol{C}}_b^n\left(\boldsymbol{C}_l^b\,\boldsymbol{r}^l_{\text{ctrl}} + \boldsymbol{l}^b_{l,b}\right) \qquad (4.26)$$

where r nctrl are the coordinates of the control point and r lctrl are the LiDAR point
coordinates corresponding to the control point coordinates, that is, the control point
coordinates calculated in the point cloud. l bl,b is the lever arm value between the
LiDAR scanner and the coordinate system of the inertial navigation carrier, C bl is the
rotation matrix between the LiDAR coordinate system and the coordinate system of
the inertial navigation carrier, and l bl,b and C bl are pre-calibrated.
Furthermore, taking the coordinate difference between the observed control point
coordinates and the actual control point coordinates as the observed value, the
measurement model of the control point aiding can be obtained:

$$\boldsymbol{z}_{\text{LiDAR}} = \hat{\boldsymbol{r}}^n_{\text{las}} - \boldsymbol{r}^n_{\text{ctrl}} = \delta\boldsymbol{r}^n + \hat{\boldsymbol{C}}_b^n\left(\left(\boldsymbol{C}_l^b\,\boldsymbol{r}^l_{\text{ctrl}} + \boldsymbol{l}^b_{l,b}\right)\times\right)\boldsymbol{\varphi} + \boldsymbol{\varepsilon}_{\text{ctrl}} \approx \delta\boldsymbol{r}^n + \boldsymbol{\varepsilon}_{\text{ctrl}} \qquad (4.27)$$

Since the attitude angle error of the navigation system is relatively small, the error term Ĉ_b^n((C_l^b r^l_ctrl + l^b_{l,b})×)φ caused by it is generally smaller than the control point measurement noise ε_ctrl by an order of magnitude, so it can be disregarded. Notably,
when the control point is farther away from the carrier, the influence of the calibration
error and the attitude angle error of the inertial navigation itself is greater.

4.3 Indoor 3D Mapping

In recent years, the development of technologies such as mobile Internet and intelli-
gent terminals has promoted applications related to location-based services. People
spend most of their time indoors, so indoor location services have become a research
hotspot. The indoor 3D map is one of the critical elements of indoor location services,
and it is a necessary condition for implementing indoor location services. In tradi-
tional indoor 3D mapping, building drawings are usually a vital data source. However,
indoor buildings often belong to different owners, and it is difficult to obtain detailed
building construction drawings due to privacy concerns. On the other hand, due to
interior decoration and furniture arrangement, the topology of the indoor 3D map
often changes, and the 3D map from the architectural design drawing cannot accu-
rately reflect the real indoor environment. Although the existing indoor 3D map
can be updated by manual drafting, this method is very time-consuming and labor-intensive, which is detrimental to promoting indoor 3D map applications.

4.3.1 Indoor Mobile 3D Mapping

Large-area and high-precision 3D mapping is usually carried out by 3D laser scan-


ning, which can be categorized into static and mobile scanning according to the
operation mode. In static scanning, the laser scanner is generally stationed statically,
and a 3D point cloud with a common coordinate system is obtained by stitching all
scanning results from multiple static stations in different positions based on corre-
spondence points. The key to static scanning is how to quickly and accurately stitch
point cloud data from different scanning stations. In the general process, stitching
is performed on the correspondences between adjacent stations, which are easily
identifiable, and measured marks such as targets or target balls. The accuracy of
this method can be guaranteed, but it results in low efficiency due to data collection
from multiple stations, manual measurement of marker points, and other complicated
tasks. In recent years, much research has been done on the stitching of static scan-
ning point clouds without an artificial target, such as direct registration using ICP or
feature point registration. However, accurate initial values and high overlap of point
clouds are required in these methods. For ground-based static scanning point clouds, the overlap between stations is small, so accurate initial values are difficult to estimate before the stitching is completed. Therefore, matching failures often occur in practice and require manual correction. In general, static 3D scanning is inefficient for 3D mapping over large areas or in areas with poor intervisibility.
1. LiDAR-aided inertial indoor 3D mapping method
1) Basic principle
To improve the relative mapping accuracy, a 3D mapping solution based on LiDAR-
assisted inertial navigation was proposed. First, the notation is defined. The LiDAR scanning data stream output by the hardware data acquisition system is denoted as S, and the IMU data stream as I. Within the time window [t_i, t_i+1], the LiDAR scanning data stream is denoted as S_i and the IMU data stream as I_i. The LiDAR scanning data stream is composed of sequential LiDAR scanning frames, namely, S_i = [S_i1 S_i2 . . . S_iw], where w is the number of LiDAR scanning frames within the time window. The IMU data stream consists of inertial measurement frames, that is, I_i = [I_i1 I_i2 . . . I_in], where n is the number of inertial measurement frames. The 3D trajectory output by the data processing system is denoted as T = [x_1 x_2 . . . x_K], and the 3D point cloud map is denoted as M = [M_1 M_2 . . . M_K].
The framework of the indoor high-precision mapping method based on laser-
assisted inertial navigation is shown in Fig. 4.16, which can be divided into three
modules. The first is the commonly used Kalman filter module in INS. The second
module is the mobile laser registration-assisted inertial navigation module, which can
estimate the error of the INS while generating a local consistency map at the same
time and input back to the first module for feedback correction. The third module is a
global map generation module, and the local map generated from the second module
is optimized to obtain a globally consistent map.
Specifically, an initial trajectory T0 is obtained by performing pure inertial navi-
gation on the initial state of the system and the inertial measurement value I i . Then,
a pre-registered point cloud map can be generated by using the initial trajectory and
the laser scan data Si during this period. Due to the existence of inertial integra-
tion error, the different scan frames in the pre-registered point cloud map are not
aligned, as shown in Fig. 4.17a. By using the mobile point cloud ICP registration
algorithm, the point cloud registration error in the pre-registration map is minimized,
the trajectory correction parameters are estimated, and the original trajectory T0 is
corrected. Furthermore, the relationship between the estimated error parameters and
the error of the INS is established, and the divergence of the inertial navigation

Fig. 4.16 Indoor 3D mapping framework based on LiDAR-aided INS



Fig. 4.17 Before-and-after comparison of moving point clouds

error is corrected by the feedback from the extended Kalman filter measurement
update. The filtered 3D trajectory and the local consistent map by time segment can
be created after the point cloud is optimized by a sliding window of a certain width
and the inertial navigation is corrected, as shown in Fig. 4.17b. Since the trajectory
obtained by the Kalman filter is not smooth, and a slight divergence exists, the corre-
sponding trajectory points [x_1 x_2 . . . x_K] are extracted from all the obtained sub-
graphs [M1 M2 . . . M K ]. The constraint relationship between points is constructed
by the sequence registration constraints and closed-loop registration constraints of
the sub-graphs. Finally, the precise position and attitude and the high-precision 3D
map generated by the optimized sub-graphs can be obtained after a unified graph
optimization adjustment.

2) Local sub-graph generation

The key to the accurate registration algorithm of the moving point cloud is to construct
correct matching pairs [3]. For the construction of point cloud matching pairs, there
are two main categories of methods: nearest neighbor registration and feature regis-
tration. The advantage of nearest neighbor registration is that the registration accuracy
is high, but the computation is complex, and it is sensitive to the initial value. The
advantage of the feature registration method is that it is insensitive to the initial value
and has highly efficient computation, but it has high requirements for environmental
characteristics. For short-time window point cloud registration, the overlap is very
high, and the requirements for initial values are not high. On the other hand, the point
cloud features of some indoor areas are relatively poor, and feature registration often
fails. Since the nearest neighbor registration has the characteristics of simplicity and
high precision in the case of a relatively accurate initial value, the most classic nearest
neighbor registration method is selected.

The key to point cloud registration based on the nearest neighbor search lies in
the selection of the initial value and overlap, which directly determines the efficiency
and robustness of the algorithm. Before registration, to ensure a small initial value
error, inertial navigation is used to recurse the short-time trajectory, and then the point
cloud of the carrier coordinate system is converted into a unified coordinate system.
In this way, the hypothesis that the nearest neighbor points are correspondences
can be satisfied to the greatest extent, and the number of iterations of parameter
estimation can be effectively reduced. When matching, the scan frames are divided
into two sets, P and Q, and then the nearest neighbor point cloud in the two sets of
point clouds is used as the conjugate point for matching. Using the cross-registration
method, as shown in Fig. 4.18, the scan frame of the sequence point cloud in the
window is divided into two sets of point clouds following the time interval. This
method can ensure that the overlap of the point clouds to be registered is large
enough in any scene and at any time. Therefore, the robustness of the matching can
be guaranteed. Since the scanned scenes of P and Q are similar, the overlap can
be set to 90%, and the 10% of points with the largest matching distances are regarded
as wrong matching points and deleted. Furthermore, the matching points whose
matching distance exceeds a certain threshold (set to 2 m) are eliminated. Due to the
limitation of computing costs, the point cloud in a window with a certain width is
selected for registration and estimation for the parameters of the trajectory correction
model. Then, it is used as an observation to correct the INS state, and the local high-
precision trajectory and map sequence can be obtained at the same time. After the
measurement update of the Kalman filter is completed, the window is slid by a certain
distance. Generally, the sliding distance can be set to half of the window, and the new
model correction parameters and point cloud subgraph can be calculated. As shown
in Fig. 4.18, ti and ti+1 are the current short-time window ranges and tk and tk+n
are the corresponding point cloud window ranges. tk+n/2 is the starting point of the
next window after sliding. In turn, the sequence of the map sections and trajectory
sections of the overlapping point cloud can be obtained by Kalman filtering. The point
cloud map section is denoted as Mi , hereinafter referred to as the submap, and the
trajectory section is denoted as Ti . The point cloud and trajectory in a single window
are considered the best estimated trajectory, which is subsequently optimized as a
whole. Therefore, the trajectory and submap can be represented by its first position,
denoted as ( pi , R(φi , θi , ψi )), where (φi , θi , ψi ) are the pitch, roll, and azimuth of
the mobile platform, respectively, and pi is the position of the current submap.
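The fragment below sketches only the matching step of this procedure: point set Q is indexed with a k-d tree, every point of P is paired with its nearest neighbour, pairs farther than 2 m apart are discarded, and the worst 10% of the remaining pairs are trimmed to emulate the 90% overlap setting. The rigid-transform estimation and iteration of the full ICP loop, as well as the cross splitting of scan frames into the sets P and Q, are omitted, and the synthetic point clouds are assumptions.

```python
# Nearest-neighbour matching with outlier rejection (2 m gate, 90% trimmed overlap).
import numpy as np
from scipy.spatial import cKDTree

def match_pairs(P, Q, max_dist=2.0, keep_ratio=0.9):
    tree = cKDTree(Q)
    d, idx = tree.query(P)                              # nearest neighbour in Q for each point of P
    keep = d < max_dist                                 # remove gross outliers beyond 2 m
    P, idx, d = P[keep], idx[keep], d[keep]
    order = np.argsort(d)[: int(keep_ratio * len(d))]   # trim the 10% worst matches
    return P[order], Q[idx[order]]

rng = np.random.default_rng(1)
Q = rng.uniform(0, 10, size=(500, 3))                   # synthetic reference scan
P = Q + rng.normal(0, 0.05, size=Q.shape)               # slightly perturbed copy
p, q = match_pairs(P, Q)
print(len(p), np.linalg.norm(p - q, axis=1).mean())
```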
3) Global point cloud map generation
A loop closure constraint can be formed if the carrier returns to the same location after
a long period of time and observes the same environment. This constraint can be used
to eliminate accumulated position errors. Because loop closures occur at irregular times, the Kalman filter cannot model the loop closure constraint well. In this case, a global optimization method is used to perform the overall optimization
calculation on the measurements. The relative trajectory of the INS in a short time
window can be accurately estimated by inertial integration, and all measurements in
that time window can be unified to the coordinate system at the starting point by the
Fig. 4.18 Point cloud matching in a short time window sequence

relative trajectory, which changes the scanned data in a period to stationary scanned
data. At this time, mobile surveying becomes an issue of stationary data stitching by
motion compensation, so that global bundle adjustment can be performed by means
of graph optimization.
In the graph optimization method, the bundle adjustment problem to be solved is represented as a graph, which can be denoted as {v, e}. The nodes in the graph represent the variables to be estimated, and the edges represent the constraints between the nodes. For positioning only, the nodes contain only the pose variables, whereas for positioning and mapping, the nodes include both the poses and the map. The locally generated map is used as descriptive information of the trajectory points to construct the position constraint relationships between the pose points.
The laser registration of the short-time window sequence cannot estimate the
absolute position of the mobile platform or the rotation about the gravity direction. Therefore, the Kalman filter result drifts in 3D position and azimuth over time, and the error continues to increase. For
repeated measurement scenarios, Kalman filtering cannot establish spatial constraints
over long time intervals. Using graph optimization, all point clouds can be globally
optimized according to all relative spatial constraints, and a global geometrically
consistent 3D point cloud map can be obtained.
In the graph optimization method, the nodes and edges need to be concretized, which can be expressed as:

$$\boldsymbol{e}_{i,j}(v_i, v_j) = \begin{bmatrix} \boldsymbol{R}(\hat{\varphi}_i, \hat{\theta}_i, \psi_i)^{-1}\left(\boldsymbol{p}_j - \boldsymbol{p}_i\right) - \hat{\boldsymbol{p}}_{i,j} \\ \psi_j - \psi_i - \hat{\psi}_{i,j} \end{bmatrix} \qquad (4.28)$$

where p̂i, j and ψ̂i, j are observations:

$$\hat{\boldsymbol{p}}_{i,j} = \hat{\boldsymbol{R}}_i\left(\hat{\boldsymbol{p}}_j - \hat{\boldsymbol{p}}_i\right) \qquad (4.29)$$

$$\hat{\psi}_{i,j} = \hat{\psi}_j - \hat{\psi}_i \qquad (4.30)$$



There are two kinds of edges in the graph optimization of the subgraphs generated by Kalman filtering. The first is the sequence edge, whose nodes are adjacent in time; the second is the loop closure edge, which is formed by corresponding nodes whose observed scenes are repeated after a period of time. For sequence edges, the subgraph results of the local point cloud registration by the Kalman filter can be used for registration, and the spatial constraint relationship between the position points can be calculated. For loop closure edges, loop closure detection is required to determine whether two subgraphs belong to the same scene; if they do, they can be registered according to the initial relative position relationship. Finally, the set of sequence edges e(i, j) ∈ S and the set of loop closure edges e(i, j) ∈ L can be obtained, where i and j are vertex nodes in the graph optimization.
After the nodes and edges for graph optimization are constructed, common graph
optimization can be used to solve the problem, and the globally adjusted position
and azimuth of each subgraph can be obtained.

$$\left[\,\boldsymbol{p} \;\; \boldsymbol{\psi}\,\right]^{\text{T}} = \mathop{\arg\min}_{\boldsymbol{p},\,\boldsymbol{\psi}} \left( \sum_{k=1}^{N_S} \boldsymbol{e}_{S_k}^2 + \sum_{k=1}^{N_L} \boldsymbol{e}_{L_k}^2 \right) \qquad (4.31)$$

where N_S and N_L are the numbers of edges in sets S and L, and e_{S_k} and e_{L_k} are the kth edge constraints in sets S and L, respectively.
Finally, all the submaps are stitched together to obtain the final optimized point
cloud map by the optimized position and attitude.
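To make Eq. (4.28) concrete, the sketch below evaluates the residual of a single edge between two submap nodes; a full solution of Eq. (4.31) would minimize the sum of squared residuals over all sequence and loop closure edges with, for example, a Gauss-Newton solver, which is omitted here. The node poses and the registration-derived observations are invented example values.

```python
# Residual of one pose-graph edge, cf. Eq. (4.28).
import numpy as np
from scipy.spatial.transform import Rotation

def edge_residual(p_i, rpy_i, p_j, psi_j, p_ij_obs, psi_ij_obs):
    R_i = Rotation.from_euler("xyz", rpy_i).as_matrix()   # R(phi_i, theta_i, psi_i)
    r_pos = R_i.T @ (p_j - p_i) - p_ij_obs                # position part of Eq. (4.28)
    r_psi = psi_j - rpy_i[2] - psi_ij_obs                 # azimuth part of Eq. (4.28)
    return np.append(r_pos, r_psi)

p_i, rpy_i = np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, np.deg2rad(30.0)])
p_j, psi_j = np.array([2.0, 1.0, 0.1]), np.deg2rad(45.0)
p_ij_obs = np.array([2.2, -0.1, 0.1])                     # relative position from submap registration
psi_ij_obs = np.deg2rad(15.0)
print(edge_residual(p_i, rpy_i, p_j, psi_j, p_ij_obs, psi_ij_obs))
```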
2. Indoor 3D mapping system
For indoor 3D mapping applications in a GNSS-denied environment, a multi-sensor
integrated 3D mobile LiDAR mapping system is developed. The hardware system
can collect high-precision time-synchronized inertial measurement data, LiDAR, and
image data. The inertial measurement and LiDAR data files or data streams are input
into the software system, and the high-precision 3D trajectory and 3D point cloud
map can be obtained by the laser-assisted inertial navigation method.
The 3D mapping system is a customized multi-sensor integrated system (Fig. 4.19)
that includes a KVH DSP-1750 fiber optic IMU and a VLP-16 Velodyne multi-line
LiDAR. Various sensors are integrated through the acquisition board and ensure
that the data collected by different sensors are in a unified time reference system.
Only the inertial navigation and LiDAR sensors in the system are introduced here.
The KVH DSP-1750 IMU consists of three fiber optic gyroscopes and three MEMS
accelerometers. The sampling frequency of the inertial navigation is 250 Hz, which
can effectively capture the high-frequency motion characteristics of the carrier plat-
form. To ensure that the scanned point cloud has enough overlap for constructing
motion constraints, the VLP-16 multi-line LiDAR is selected. VLP-16 scans the
surrounding 360° environment at a frequency of approximately 10 Hz, and the point
cloud obtained by a complete 360° scan is called a scan frame.

Fig. 4.19 Indoor 3D mapping system

3. Application

1) Indoor high-precision 3D mapping


Three scenarios on the campus of Shenzhen University are taken as examples: indoor
scene, outdoor scene, and indoor-outdoor integrated scene. The experiment is carried
out using a loop closure route, and the starting and ending positions of each data
collection are at the same location. Specifically, the indoor scene is an office building,
the walking trajectory is from the 16th floor to the 14th floor to the 16th floor, and
the indoor scene has stairs and glass. The walking trajectory of the outdoor scene is
a circle around an office building. During outdoor mapping, people pass through a
lychee orchard. In the indoor-outdoor integration scene, the trajectory starts from the
ground, passes through an underground parking lot, and then returns to the starting
point, as shown in Fig. 4.20.
Before data collection, the mapping system is kept still for 1–5 min for IMU attitude initialization. After the collection is completed, the static alignment algorithm is first used to initialize the INS, and then LiDAR point cloud registration in a short time window is performed according to the method described above. The INS is then corrected by using the trajectory correction model to build a measurement model. Finally, loop closure detection and the overall adjustment of graph optimization are conducted, and eventually, the trajectory and 3D point cloud
map in the entire time frame are obtained. To verify the mapping accuracy, a typical
indoor parking lot scene is selected, which is approximately 100 m × 50 m, and it is
the indoor portion in the data of the indoor and outdoor integrated mapping. A high-
precision Z+F5010 scanner is used to scan the ground truth. Its ranging resolution is
0.1 mm, and the ranging accuracy at 50 m is better than 2.2 mm. Precise stitching is
performed after multiple station scanning, and the mosaicked point cloud is used as
the ground truth, as shown in Fig. 4.21.

Fig. 4.20 Experimental scenarios and data processing results



Fig. 4.21 Absolute accuracy verification by indoor 3D LiDAR mapping

The point cloud produced by the 3D mapping method introduced above and the ground truth point cloud of the same scene are registered by ICP to eliminate the overall translation and rotation between the two point clouds. A set of distances on the walls was manually selected for planimetric accuracy comparison, and a set of ground and ceiling feature points was selected for elevation accuracy comparison.
This comparison method can represent the accuracy of the subsequent point cloud
vectorization. The accuracy comparison is shown in Table 4.3. The results show that
for a 100 m × 50 m indoor parking lot 3D mapping application, the plane accuracy
can reach 3 cm (1σ), and the elevation accuracy is approximately 2 cm (1σ), which meets the accuracy requirements of indoor 3D mapping.
2) Indoor 3D map applications

An indoor 3D map can be generated after indoor 3D mapping data modeling, and it
can be used in applications such as indoor positioning and navigation, as shown in
Fig. 4.22.
On the basis of indoor 3D mapping, positioning and navigation services for large
venues have been developed. The system supports the functions of the visualization
of large complex venues, booths, and product brands of exhibitors and the function
of classifying and searching for exhibitor information, which significantly improves
the effect of the exhibition. In the meantime, relying on the multi-source integrated
positioning technology, service functions in the venue area have been implemented,
such as real-time location navigation, simulated navigation services, exhibition route
guidance services, and real-time location sharing services. It improves the user’s

Table 4.3 Absolute accuracy verification by 3D mapping


Verification distance | d_lins/m | d_Z+F/m | e_d/m | Verification point | h_lins/m | h_Z+F/m | e_h/m
d1 | 46.05 | 46.08 | −0.03 | p1 | −1.95 | −1.92 | −0.03
d2 | 14.70 | 14.65 | 0.05 | p2 | −1.93 | −1.92 | −0.01
d3 | 67.96 | 67.99 | −0.03 | p3 | −1.95 | −1.93 | −0.02
d4 | 22.52 | 22.54 | −0.02 | p4 | −1.91 | −1.95 | 0.04
d5 | 48.33 | 48.34 | −0.01 | p5 | −1.93 | −1.93 | 0.00
d6 | 90.86 | 90.89 | −0.03 | p6 | −1.94 | −1.95 | 0.01
d7 | 36.35 | 36.36 | −0.01 | p7 | −1.93 | −1.94 | 0.01
RMSE | | | 0.029 | | | | 0.021

Fig. 4.22 Indoor 3D mapping and application



Fig. 4.23 Indoor 3D map platform in a large exhibition hall

exhibition experience. An indoor 3D map platform based on a large exhibition hall


is shown in Fig. 4.23, which can support a multi-terminal display.
With the continuous development of rapid transit, public safety at subway hubs has become an increasingly prominent issue: ridership is large, personnel are difficult to supervise and manage, and traffic congestion and abnormal crowding occur from time to time. Although there are many cameras in a station, the efficiency of exceptional event detection and emergency response leaves room for improvement due to the limitations of spot patrolling and policing. Moreover, the monitoring system, ticketing system, personnel management system, and other data sources are independent of each other, so these systems cannot be utilized efficiently. Therefore, an intelligent monitoring system for subway stations based on a unified space and time reference framework is required.
A high-precision indoor 3D visualization map can be produced by utilizing the CAD as-built drawings of subway stations and on-site collected data. All structures and equipment are modeled in the scene, and their appearance and shape are simulated with high fineness and fidelity to the real-world scene, yielding a digital twin platform of the subway station. In the meantime, the platform provides visualization functions such as information displays, alarm reminders, and ridership heatmaps, offering better visual support for subway management and control. The
developed intelligent monitoring system for subway stations is shown in Fig. 4.24.

4.3.2 Indoor Map Update Based on Crowdsourcing Data

A significant component of an indoor navigation system is the indoor map, which is


frequently required by the indoor localization system to locate the user. Some indoor
localization systems that use accurate indoor maps to control the drift of inertial

Fig. 4.24 Intelligent monitoring system for subway stations

sensors also require an indoor map [4]. Moreover, several new techniques for the
generation of autonomous radio maps require knowledge of real maps [5].
Indoor maps are typically inaccessible because they may be owned by multiple
parties who are frequently unwilling to share them with the general public. In addi-
tion, a building’s internal structures and functionalities frequently evolve with time,
rendering the original maps obsolete. Even if it is feasible to manually create an
indoor map or update an existing map, the process is extremely laborious and
time-consuming.
Crowdsourcing is a low-cost and effective way to collect such data and extract meaningful information from them. Crowdsourcing has been successfully applied in OpenStreetMap (OSM) for the compilation of outdoor maps, which uses GPS for localization. However, because GPS cannot function indoors, the production of indoor maps must rely on additional devices to collect location data.
ALIMC is an activity landmark-based indoor mapping approach using crowd-
sourcing. ALIMC is based on two important observations. The first observation is
that the indoor map can be described using a link-node model [6] in which pathways
are the links and intersections of pathways are the nodes, as depicted in Fig. 4.25. The
link-node model is sufficient for indoor navigation since it provides a natural structure
for locating people and points of interest (POIs). In addition, people typically move
along walkways and indoor POIs are connected by pathways. The second observa-
tion is that the nodes of the link-node model are typically corners, elevators, and
staircases. Typically, when individuals traverse the nodes, they engage in activities
other than walking. The built-in sensors of a smartphone can detect these activi-
ties. Using the inertial sensors of a smartphone, PDR can also measure the relative
distance between these nodes.

Fig. 4.25 Link-node model of an indoor map

The main concept of ALIMC is merging crowdsourcing trajectories using the


activity-related location as a landmark. The activity-related location landmark is termed the "activity landmark" for simplicity. An activity landmark comprises two characteristics: the activity type and the Wi-Fi fingerprint recorded at the activity. ALIMC first extracts activity landmarks for indoor map construction using activity detection algorithms. ALIMC then groups all activity landmarks into distinct clusters, each of which is regarded as a map node. After clustering, ALIMC calculates the distance between each pair of nodes to produce a distance matrix. The multi-dimensional scaling (MDS) approach can then be used to create an indoor map based on the distance matrix. For the construction of indoor maps, ALIMC faces the following challenges [7].
(1) Clustering of activity landmarks. The crowdsourcing data contain numerous
activity landmarks, some of which were collected at the same node. Typically,
an indoor setting contains multiple activity landmarks with the same activity
attribute. A building, for instance, typically contains numerous corners. Because
of this, it is difficult to group activity landmarks based only on the activity type.
Therefore, Wi-Fi fingerprints are utilized to distinguish these landmarks. There
may be a larger spatial density of activity landmarks than the spatial resolution
provided by the Wi-Fi fingerprint in certain circumstances. The first challenge
is how to automatically group activity landmarks collected at the same node
into a single cluster.
(2) Calculation of relative distances. The relative distances between all nodes are
one of the fundamental components of ALIMC. PDR allows the calculation of
the distance between adjacent nodes. Nonetheless, if two nodes are not adjacent,
the angles between the links are necessary for calculating distance. The errors
of the angle recorded by the compass and gyroscope are considerable, resulting in inaccurate distance measurements. The second challenge is compensating for the angle estimation error when calculating distance.
(3) Crowdsourcing data. Crowdsourcing provides a potentially scalable alternative
for autonomous map construction. The primary obstacle is how to combine
crowdsourced data collected by various users.

To address these issues, ALIMC comprises the following elements:


(1) Activity landmark clustering algorithm. This approach is used to cluster the
activity landmarks obtained at the same node into a single cluster. The Wi-Fi
fingerprint is initially used to cluster these activity landmarks with comparable
Wi-Fi features. However, some nodes are so close together that their Wi-Fi features cannot be distinguished, which leaves some ambiguous activity landmarks. To resolve these ambiguous activity landmarks, a spatial-information-based grouping technique is utilized.
(2) Relative-distance estimation method. The intersection angle between two links
is required for relative distance estimation. For gyroscope-based angle calcu-
lation, a heuristic method is utilized to determine the accuracy of angle infor-
mation. In addition, the inner angles of a closed polygon are employed as prior
knowledge to deduce the gyroscope’s bias drift.
(3) Indoor map construction approach. For indoor map generation, the relative
distances between all activity landmarks are first determined based on the results
of distance and angle estimation. The MDS method is then utilized to generate a
relative map. Based on several reference points, the relative map is transformed
into an absolute map in the final step.
The suggested method uses activity-based context information as a landmark for
the construction of an indoor map. In addition to topology information, the ALIMC-
created indoor map also includes node attributes, such as the position of elevators
and stairs. These attributes are essential for pedestrian navigation, as elevators and
stairs are regularly utilized.

1. System overview

An overview of the ALIMC system is depicted in Fig. 4.26. ALIMC uses a smart-
phone’s built-in sensors to collect motion data and Wi-Fi fingerprints. Direction,
angular velocity, pressure, and acceleration are included in the motion data. The Wi-
Fi fingerprint consists of the MAC of the AP and the received signal strength (RSS)
value.
The most difficult aspect of ALIMC is combining crowdsourced data to create
an indoor map. The purpose of ALIMC is to extract the links and nodes from the
collected data, as the indoor map can be characterized by the link-node model.
ALIMC includes three modules. The first module, activity landmark clustering, is
used to group together activity landmarks collected at the same node. This module
accepts as input the result of activity detection, the associated Wi-Fi fingerprint, and
motion data. The output is the clustering result, and each cluster is represented by a
map node. The second module, relative distance estimation, computes the distance
between each node and generates a distance matrix. This module receives as input
the activity landmark clustering result and motion data. The output is a matrix of
distances. The third module is indoor mapping, which is utilized to create an indoor
map based on the distance matrix. This module accepts as input the distance matrix
and multiple reference points. The deliverable is the indoor map [7].

Fig. 4.26 System overview

2. Activity landmark clustering


The data gathered through crowdsourcing include numerous activity landmarks,
some of which are collected at the same node. Therefore, activity landmarks collected
at the same node should be clustered together, and each cluster is treated as a map
node. This section describes how activity landmarks should be clustered.
1) Activity landmark definition
The indoor map can be described as a link-node model, in which the pathways are
the links and their intersections are the nodes. Activity landmark is the term used to
describe nodes in context form. The following six-tuple identifies a major activity
landmark.

AL = {ID, type, F, ΔH, (D_l, D_n), (ID_l, ID_n)}    (4.32)

where ID represents the identifier of the activity landmark; type is the activity type,
including turning, taking the elevator, and walking up/down the stairs; F is the Wi-Fi
fingerprint collected when the activity is detected; ΔH is the heading change value
when the activity occurs, which is the angle between two connected links; Dl is the

distance between the current activity landmark and the last one; Dn is the distance
between the current activity landmark and the next one; and IDl and IDn are the ID
of the last and next activity landmarks, respectively.
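For illustration only, the six-tuple of Eq. (4.32) could be held in a small data structure such as the following Python sketch (the class and field names are assumptions, not taken from the source):

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class ActivityLandmark:
    """Six-tuple of Eq. (4.32) describing one activity landmark."""
    landmark_id: int                         # ID of the activity landmark
    activity_type: str                       # type: "turn", "elevator", "stairs_up", "stairs_down"
    wifi_fingerprint: Dict[str, float]       # F: AP MAC address -> RSS value (dBm)
    heading_change: float                    # ΔH: angle between the two connected links (deg)
    dist_last_next: Tuple[float, float]      # (D_l, D_n): distances to the last/next landmark (m)
    id_last_next: Tuple[Optional[int], Optional[int]]  # (ID_l, ID_n): IDs of the last/next landmark
```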

(1) Activity detection

The first step in obtaining the activity landmarks is to detect activity. There are
typically three types of indoor activities: turning, taking the elevator, and walking up
and down the stairs.

(2) Activity landmark clustering

The generation of multiple activity landmarks, some of which are collected at the
same node, follows the detection of activity. Activity landmark clustering is used to
group activity landmarks taken at the same node into the same cluster and assign
them a unique identification number, i.e., each cluster is treated as a map node.

2) Wi-Fi fingerprint-based activity landmark clustering

Wi-Fi features are utilized to cluster the activity landmarks because Wi-Fi networks
are widely available. Existing approaches use RSS differences to differentiate between locations, which works well if users' smartphones are similar. However, if the data are
collected through crowdsourcing, the smartphones of the participants are typically
very diverse. Different devices’ Wi-Fi chipsets and antennas result in different RSS
values for the same location. Therefore, the RSS order is adopted as the distinguishing characteristic between distinct activity landmark clusters. Theoretically, the order of RSSs from a set of APs
measured by different smartphones in the same location is identical. To determine
the efficacy of the RSS order feature, we collected the RSSs of neighboring APs at
the same node using three distinct smartphones: Nexus S, Nexus 5, and Galaxy 3.
Figure 4.27 shows the results.

Fig. 4.27 RSSs measured at the same location by different smartphones

Table 4.4 Correlation coefficient of different smartphones

Model      Nexus S   Nexus 5   Galaxy 3
Nexus S    1         0.84      0.93
Nexus 5    0.84      1         0.92
Galaxy 3   0.93      0.92      1

As shown in Fig. 4.27, the RSS values measured by various smartphones are
different. However, the order in RSS is nearly identical. In most cases, the correla-
tion coefficient can be used to assess the similarity of the RSS order. The Pearson
correlation coefficient for these three distinct smartphones is shown in Table 4.4, and
it can be seen that it is extremely high. We use the RSS correlation coefficient as the
Wi-Fi characteristic to distinguish between various activity landmarks.
If the correlation coefficient is sufficiently high, we can conclude that the two
fingerprints were collected at the same location. Nevertheless, if the number of APs
is insufficient, the correlation coefficient may be high even if the two fingerprints
were collected from different locations. To resolve this issue, we employ the Jaccard
similarity coefficient as an additional metric for clustering activity landmarks. The
Jaccard coefficient is a statistical measure used to evaluate the similarity and diversity
of sample sets. The methods for calculating the correlation coefficient and Jaccard
coefficient are outlined in the following sections.
Without sacrificing generality, we assume that f i and f j are the Wi-Fi fingerprints
of ALi and ALj , respectively.

f_i = [(mac_i1, rss_i1), (mac_i2, rss_i2), ..., (mac_im, rss_im)]

f_j = [(mac_j1, rss_j1), (mac_j2, rss_j2), ..., (mac_jn, rss_jn)]

where mac is the MAC of AP, which is unique, and rss is the RSS value of the AP.
First, we determine the intersection and union of the MACs of f i and f j .

MAC_int = MAC_i ∩ MAC_j

MAC_uni = MAC_i ∪ MAC_j

where MAC_i = [mac_i1 mac_i2 ... mac_im] and MAC_j = [mac_j1 mac_j2 ... mac_jn].
The Jaccard coefficient of f i and f j is calculated using the following equation:

Jac_ij = num(MAC_int) / num(MAC_uni)    (4.33)

where num(·) represents the number of MACs in the set.


To calculate the correlation coefficient of f_i and f_j, the RSSs of the same APs must be extracted first, namely, the RSS of each AP in the MAC_int set of f_i and f_j (MAC_int = [mac_1 mac_2 ... mac_k]).

f_i' = [(mac_1, rss_i1), (mac_2, rss_i2), ..., (mac_k, rss_ik)]

f_j' = [(mac_1, rss_j1), (mac_2, rss_j2), ..., (mac_k, rss_jk)]

According to the RSSs of f_i', the order indexes ix_sort can be obtained by sorting the RSSs in descending order.

[RSS_i, ix_sort] = sort([rss_i1 rss_i2 ... rss_ik])

Then, based on ix_sort, the RSSs of f_j' are reordered, and RSS_j is obtained. The correlation coefficient of f_i and f_j is calculated using the following equation:

Corr_ij = cov(RSS_i, RSS_j) / (σ_RSS_i · σ_RSS_j)    (4.34)

where cov(RSS_i, RSS_j) is the covariance of RSS_i and RSS_j, and σ_RSS_i and σ_RSS_j are the standard deviations of RSS_i and RSS_j.
On the basis of Jac_ij and Corr_ij, it is possible to determine whether AL_i and AL_j were collected at the same node. If the conditions Jac_ij ≥ jac_th and Corr_ij ≥ corr_th are both met, AL_i and AL_j are considered to have been collected at the same node; otherwise, they are not. In our experiments, the parameters jac_th and corr_th are both set to 0.6 [7].
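To make the clustering criterion concrete, the following minimal Python sketch (hypothetical helper names; jac_th and corr_th set to 0.6 as above) computes Jac_ij and Corr_ij for two fingerprints stored as MAC-to-RSS dictionaries and applies the two thresholds:

```python
import numpy as np

def same_node(fp_i, fp_j, jac_th=0.6, corr_th=0.6):
    """Decide whether two Wi-Fi fingerprints (dicts mapping AP MAC -> RSS in dBm)
    were collected at the same node, following Eqs. (4.33) and (4.34)."""
    macs_i, macs_j = set(fp_i), set(fp_j)
    mac_int = macs_i & macs_j                      # common APs
    mac_uni = macs_i | macs_j                      # all observed APs
    if len(mac_int) < 2:                           # too few common APs to correlate
        return False
    jac = len(mac_int) / len(mac_uni)              # Jaccard coefficient, Eq. (4.33)

    common = sorted(mac_int)
    rss_i = np.array([fp_i[m] for m in common], dtype=float)
    rss_j = np.array([fp_j[m] for m in common], dtype=float)
    # Reorder both RSS vectors by the descending RSS order of fingerprint i,
    # then correlate the reordered sequences (the RSS-order feature).
    order = np.argsort(-rss_i)
    corr = np.corrcoef(rss_i[order], rss_j[order])[0, 1]   # Pearson correlation, Eq. (4.34)

    return jac >= jac_th and corr >= corr_th

# Example: the same location seen by two different phones (hypothetical RSS values)
fp_a = {"AP1": -40, "AP2": -55, "AP3": -70, "AP4": -80}
fp_b = {"AP1": -45, "AP2": -58, "AP3": -72, "AP5": -85}
print(same_node(fp_a, fp_b))
```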

3) Spatial Information-based activity landmark clustering

Using the Wi-Fi fingerprint-based clustering algorithm, it is possible to correctly


cluster a number of activity landmarks. However, certain activity landmarks are so
close together that the Wi-Fi feature is unable to distinguish between them. There
may be some mismatched pairs following the Wi-Fi-based clustering of activity
landmarks. One new activity landmark may be matched to multiple nodes in the
database of nodes. The spatial relationship is used to eliminate mismatched pairs of
landmarks.
When a new trajectory is updated, activity landmarks are detected using an
approach for activity detection that extracts the Wi-Fi features of the activity land-
marks. Then, we receive a sequence of activity landmarks indexed in chronological order. For instance, [NAL_1 NAL_2 ... NAL_m] is the activity sequence of a newly updated trajectory consisting of m activity landmarks (NAL denotes a new activity landmark extracted from the updated trajectory). Using the Wi-Fi-based clustering algorithm, there are n (n < m) NALs whose Wi-Fi features are similar to those of nodes in the node database; thus, one activity landmark may be matched to multiple NODEs (NODE denotes a node in the database). Then, we obtain n similar-NODE sets for the n NALs, [NODEs,1 NODEs,2 ... NODEs,n], where NODEs,i = [NODE_1 NODE_2 ... NODE_k], which means that for NAL_i, there are k NODEs in the node database with similar Wi-Fi features (note that for different NALs, the number of similar NODEs may be different). Since each node in the node database is unique, there is only one matching NODE, or potentially none, for each NAL.

3. Relative distance estimation

All activity landmarks are clustered into distinct activity landmark clusters, and
each cluster is assigned a unique identification number. The activity landmark clus-
ters serve as the indoor map’s nodes. To construct an indoor map, the distance
between every two nodes is an additional consideration. This section describes how
to determine the distance between each node.

1) Distance between adjacent nodes

Using the PDR method, the distance between two adjacent nodes is calculated. In
PDR, the displacement is derived from the number of steps and the direction of
movement.
If the previous location is (x, y), the next location is calculated as

(x + sl · sc · cos h, y + sl · sc · sin h)    (4.35)

where sl denotes the step length, sc is the step count, and h is the heading.
Step count is obtained using the peak detection algorithm. The step detection
result is shown in Fig. 4.28. The step length is estimated using the frequency-based
model: stride_len = a·f + b, where f is the step frequency, and a and b are parameters
that can be trained offline. Since we assume that the indoor map can be described by
the link-node model, a pedestrian’s heading does not change when walking between
two nodes. Consequently, h is set to zero during the distance calculation between
adjacent nodes.

Fig. 4.28 Step detection result
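A minimal sketch of the PDR update of Eq. (4.35), with the step count obtained by peak detection on the acceleration magnitude and the frequency-based step-length model; the parameters a and b, the peak-detection thresholds, and the synthetic signal are illustrative assumptions, not values from the source:

```python
import numpy as np
from scipy.signal import find_peaks

def pdr_update(x, y, acc_norm, fs, heading, a=0.3, b=0.35):
    """Advance a 2D position by pedestrian dead reckoning (Eq. 4.35).

    acc_norm : acceleration magnitude samples (m/s^2) for one walking segment
    fs       : sampling rate (Hz); heading h in radians
    a, b     : frequency-based step-length model stride_len = a*f + b (illustrative values)
    """
    # Step counting by peak detection on the acceleration magnitude.
    peaks, _ = find_peaks(acc_norm, height=10.5, distance=int(0.3 * fs))
    sc = len(peaks)
    if sc == 0:
        return x, y
    duration = len(acc_norm) / fs
    f = sc / duration                    # step frequency (steps/s)
    sl = a * f + b                       # step length from the frequency-based model
    # Eq. (4.35): displacement along the (constant) heading between two nodes.
    return x + sl * sc * np.cos(heading), y + sl * sc * np.sin(heading)

# Example with synthetic accelerometer data (two seconds at 50 Hz, ~2 steps/s)
fs = 50
t = np.arange(0, 2, 1 / fs)
acc = 9.8 + 1.5 * np.sin(2 * np.pi * 2 * t)    # hypothetical walking signal
print(pdr_update(0.0, 0.0, acc, fs, heading=0.0))
```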

2) Intersection angle estimation


In an indoor map, the majority of nodes are not adjacent; therefore, the intersection
angle between every two links must be known to calculate the distance between these
nodes. When a pedestrian passes through an intersection, the intersection angle repre-
sents the change in direction. Using the smartphone’s internal compass, one can deter-
mine the heading value. However, in indoor environments, the compass is susceptible
to magnetic interference. In addition to the compass, the gyroscope can be used to
calculate the heading change value because it can report the smartphone’s relative
angular velocity. The gyroscope provides the relative angular displacement, which
can be used to estimate the heading change when integrated over time. However,
the integration is affected by noise, resulting in gyroscope drift. In our system, we
use the heuristic drift reduction method to estimate the heading changes. Moreover,
due to the drift error ε, the output of a gyroscope can be modeled as Eq. (4.36) for
each interval.

w_i = w̃_i + ε    (4.36)

where w_i represents the output of the gyroscope, w̃_i represents the true value, and ε
represents the bias drift. The heading change is then computed using the following
equation


Ψ = Σ_{i=t_begin}^{t_end} w_i · T_i    (4.37)

where Ψ is the heading change value, T_i is the time interval, and t_begin and t_end are the beginning and ending times of the turn activity, which are derived from the turning time obtained by the turn detection algorithm: t_begin = t − t_win and t_end = t + t_win, where t is the turning time and t_win is the time window, which in our experiment was set to 0.4 s.
The bias drift of a gyroscope can be calculated using the sum of the inner angles
of a closed polygon, which is equal to 180(N − 2) (N is the number of edges). For
instance, a quadrilateral consists of four nodes, and the sum of the inner angles is
360° (ground truth). By integrating the gyroscope data, the sum of the inner angles
can be estimated (with the bias included). Then, we can infer the bias using the
ground truth. The inferred bias can then be used to correct subsequent heading-change estimates.
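The following sketch illustrates this idea under simplifying assumptions (a constant bias accumulated only during the turn windows; the numbers are hypothetical): Eq. (4.37) is integrated over a turn window, and the bias is inferred from the closed-polygon angle sum.

```python
import numpy as np

def heading_change(gyro_z, dt, t_turn, t_win=0.4, bias=0.0):
    """Integrate gyroscope yaw rate over the turn window (Eq. 4.37), bias removed."""
    i0 = max(0, int((t_turn - t_win) / dt))
    i1 = min(len(gyro_z), int((t_turn + t_win) / dt) + 1)
    return np.sum((gyro_z[i0:i1] - bias) * dt)         # Ψ = Σ w_i · T_i

def infer_bias(turn_angles_deg, turn_durations, n_edges):
    """Infer a constant gyro bias from a closed polygon whose inner angles
    must sum to 180(N - 2) degrees (ground truth)."""
    truth = 180.0 * (n_edges - 2)
    measured = np.sum(turn_angles_deg)
    # The residual angle is attributed to bias accumulated over the total turning time.
    return np.deg2rad(measured - truth) / np.sum(turn_durations)   # rad/s

# Hypothetical example: a rectangular loop, four turns of ~92 deg each measured over 0.8 s
angles = np.array([92.0, 91.5, 92.3, 91.8])
bias = infer_bias(angles, np.full(4, 0.8), n_edges=4)
print(f"estimated gyro bias: {bias:.4f} rad/s")
```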
Another method for calculating the angle of intersection between two lines is to
approximate it as 90°. The estimation is based on the observation that the majority
of office building corridors are orthogonal.
The relative distance between all nodes can be calculated using the distances
between adjacent nodes and the intersection angles. If more than one path exists
between two nodes, the average distance of these paths is used to calculate the
relative distance. If sufficient trajectories are uploaded, it is possible to determine
the distance between all nodes [7].

4. Indoor mapping

Using the relative distance estimation method described in the preceding section,
we can determine the relative distance matrix of all the nodes. On the basis of the
relative distance matrix, an indoor map can be created. This section explains how
to create an indoor map. Two steps comprise the indoor map construction method:
relative indoor map construction and absolute indoor map construction.

1) Relative indoor map construction

ALIMC generates the relative indoor map using the MDS technique based on the
relative distance matrix. MDS is frequently used to determine spatial relationships
between objects based on information about dissimilarity. The information regarding
dissimilarity is the relative distance between nodes. Using the relative distance matrix
as the input, MDS can generate the relative indoor map, from which we can determine
the relative spatial relationship between all nodes. The relative indoor map can be
used for indoor navigation because it depicts the relative spatial relationship between
nodes based on the local coordinate system.
2) Converting to the absolute indoor map

Commonly, Procrustes analysis is used to determine the scaling, translation, and rotation that fit one configuration as closely as possible to another. Let Y_m be the vector of estimated positions of the m map nodes in the relative map, and assume that the real positions of n noncollinear reference points (located by some other indoor localization method) are known, where n is larger than three, and X_n is the vector of these positions. Based on X_n, a subset vector Y_n composed of the n corresponding nodes can be extracted from Y_m. With X_n and Y_n, a standard Procrustes analysis can proceed. As a result, the scaling S_n, rotation R_n, and translation T_n parameters for mapping Y_n to X_n can be obtained. Then, Y_m can be transformed to the absolute indoor map (Z_m) with these parameters by the following equation:

Z_m = S_n · Y_m · R_n + T_n    (4.38)

Based on Eq. (4.38), the absolute indoor map can be constructed. The reference
points can be obtained by other methods, e.g., the last reported GPS location.
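A compact numpy sketch of the two steps, classical MDS on the distance matrix followed by a similarity-transform (Procrustes) fit to a few reference points as in Eq. (4.38), is shown below; the geometry is a toy example and the implementation is illustrative rather than the authors' code:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: recover relative 2D coordinates from a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                    # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]                # keep the largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

def procrustes_params(X_ref, Y_ref):
    """Similarity transform (scale S, rotation R, translation T) mapping Y_ref to X_ref."""
    mx, my = X_ref.mean(0), Y_ref.mean(0)
    Xc, Yc = X_ref - mx, Y_ref - my
    U, s, Vt = np.linalg.svd(Yc.T @ Xc)
    R = U @ Vt                                     # orthogonal matrix (reflections allowed, which MDS output may need)
    S = s.sum() / np.sum(Yc ** 2)                  # isotropic scale
    T = mx - S * my @ R
    return S, R, T

# Toy example: four nodes with hypothetical geometry
X_true = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 6.0], [4.0, 6.0]])
D = np.linalg.norm(X_true[:, None, :] - X_true[None, :, :], axis=-1)   # distance matrix
Y_rel = classical_mds(D)                             # relative map (arbitrary frame)
S, R, T = procrustes_params(X_true[:3], Y_rel[:3])   # three known reference points
Z = S * Y_rel @ R + T                                # Eq. (4.38): absolute map
print(np.round(Z, 2))
```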
5. Experiment

1) Experimental setup

To evaluate the performance of the proposed system, ALIMC was implemented on


Nexus S, Nexus 5, and Galaxy 3 Android smartphones. As shown in Fig. 4.29,
the experiments were conducted on two floors of an office building with a 52.5 m
× 52.5 m floor plan. During the experiment, participants walked normally in the
accessible areas of the building while holding a smartphone in front of them. For the
purpose of simulating crowdsourcing users, participants began in varying positions.

Fig. 4.29 The experimental environments

For the purpose of evaluating ALIMC’s performance with incremental data, each
trace is repeated ten times. Three participants collected 300 user trajectories using
three distinct smartphone models. These trajectories correspond to 220 min of data
collection in terms of time. The collected data include acceleration data, compass
data, gyroscope data, barometer data, and Wi-Fi fingerprints.
2) Visual results
Before presenting the quantitative evaluation results, the visual result of ALIMC is
presented. Figure 4.30 shows the result of the mapping procedure. A portion of the
trajectories inferred by PDR based on the crowdsourcing data is shown in Fig. 4.30a.
ALIMC can generate an indoor map of the environment based on crowdsourcing
data. First, the activities contained in the trajectories must be identified. The results
of activity detection are shown in Fig. 4.30a. Activity landmarks are generated by
associating detected activities with the Wi-Fi fingerprints. Then, ALIMC groups all
activity landmarks into distinct clusters, and each cluster functions as a node. After
clustering, the relative distance between all nodes can be calculated. Using the MDS
technique, which is depicted in Fig. 4.30b, the relative map can be generated using
the distance information. Finally, the relative map is converted to an absolute map.
The results of the mapping are depicted in Fig. 4.30c (2D) and Fig. 4.30d (3D).
Figure 4.31 shows the collected user trajectories and the inferred indoor maps
(the 14th floor) for varying numbers of collected user trajectories. As anticipated,
the quality of the map improves as the amount of data increases.
3) Metric
The following metrics were used to assess the quality of the inferred pathway map.
Graph discrepancy metric (GDM): This metric reflects the differences between
the nodes of the constructed map and those of the actual map. Euclidean distance is
used as the difference metric.

Fig. 4.30 Outcome of the mapping process

Shape discrepancy metric (SDM): This metric measures the differences between
the shapes of the inferred and real paths. To calculate the SDM, the link segments
between nodes were uniformly sampled to obtain a series of sample points. The
metric is defined as the distance between corresponding sampling points.
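Assuming node and link correspondences between the constructed map and the real map have already been established, the two metrics could be computed as in the following sketch:

```python
import numpy as np

def gdm(nodes_est, nodes_true):
    """Graph discrepancy metric: Euclidean distances between corresponding
    nodes of the constructed map and the actual map."""
    return np.linalg.norm(nodes_est - nodes_true, axis=1)

def sdm(links_est, links_true, n_samples=20):
    """Shape discrepancy metric: uniformly sample each corresponding link
    segment and return the distances between corresponding sample points."""
    errs = []
    for (a_e, b_e), (a_t, b_t) in zip(links_est, links_true):
        t = np.linspace(0.0, 1.0, n_samples)[:, None]
        p_e = (1 - t) * a_e + t * b_e             # samples along the estimated link
        p_t = (1 - t) * a_t + t * b_t             # samples along the true link
        errs.append(np.linalg.norm(p_e - p_t, axis=1))
    return np.concatenate(errs)

# Hypothetical two-node example
est = np.array([[0.0, 0.0], [10.2, 0.3]])
true = np.array([[0.0, 0.0], [10.0, 0.0]])
print(gdm(est, true).max(), sdm([(est[0], est[1])], [(true[0], true[1])]).max())
```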

4) Performance

(1) Performance of different angle estimation methods. Four methods are used to
estimate the intersection angle between two connected links: the first method
uses readings from the smartphone’s compass; the second uses the heuristic
method; the third method uses the interior angles of the closed polygon to
compensate for the bias drift of the gyroscope; and the fourth method is based
on the observation that all intersections of the office building are right-angle, so
we assume that the intersection angle between the two connected links is also
right-angle.

Fig. 4.31 User trajectories and the inferred indoor maps under different numbers of user trajectories

Figure 4.32a, b depict the cumulative distribution function (CDF) of the GDM and
SDM of the constructed map using all of the collected data; the angle is computed
using four angle estimation methods. It can be seen that the compass method has a
significantly larger error than the other methods. Using the heuristic gyroscope
method significantly reduces map construction errors. For GDM, the maximum error
is approximately 4 m, whereas the 80-percentile error is roughly 2 m; for SDM,
the maximum error is around 4 m, whereas the 80-percentile error is about 1.8 m.
By compensating for the interior angles of the closed polygon, the error in map
construction can be reduced. The maximum GDM and SDM errors are approximately
2.5 m and 3.5 m, respectively. The 80-percentile error for the GDM is approximately
1.6 m, and for the SDM, it is 1.3 m. If we assume that all intersections are right
angles, the system for map construction can achieve extremely high accuracy. The
GDM and SDM have an error margin of approximately 1.6 m.
(2) Performance with incremental data. In terms of time, Figs. 4.33 and 4.34 depict
the GDM and SDM with incremental crowdsourcing data, respectively. We
can see that as the amount of crowdsourced data increases, the quality of
the constructed indoor map improves. Compared to the compass method, the
changing trend of the gyroscope (with compensation) and right-angle method is
more evident. This is because compass error is independent of location, making
crowdsourcing difficult to compensate for. The error introduced by the gyro-
scope and right-angle methods can be compensated for through crowdsourcing.
As shown in Figs. 4.33 and 4.34, the maximum error of the GDM and SDM
decreases dramatically as the amount of data increases from 15 to 75 min. For

the gyroscope method, the maximum error of the GDM and SDM is reduced
from approximately 8 m to about 3 m; for the right-angle method, it is reduced
from about 4.5 m to around 2 m [7].

Fig. 4.32 Map construction performance of the angle estimation method

Fig. 4.33 CDF of GDM with incremental data

Fig. 4.34 CDF of SDM with incremental data.


© [2022] IEEE. Reprinted, with permission, from (2015). ALIMC: activity landmark-based indoor
mapping via crowdsourcing. IEEE Transactions on Intelligent Transportation Systems, 16(5): 2774–
2785

4.4 Flatness Detection of Super-Large Concrete Floor

Floors are often used in large and medium-sized parking lots, commercial plazas,
station wharves, sports fields, logistics warehouses, hospitals, research and develop-
ment sites, etc. Concrete floors are the most common type of floor used. Floor flatness
is a parameter that describes the degree of surface fluctuation in a certain area and
is an important indicator for evaluating the quality of the floor. The flatness of the
concrete floor not only affects its appearance but also affects its functionality. For
example, high flatness requirements have been raised for the stable movement of large forklifts in warehouses and for high-level ice sports competitions. In the meantime, the area of indoor cement floors is increasing with the construction of large warehouses and large sports venues. For example, the area of a single piece of concrete floor in the newly built National Speed Skating Oval for the Olympic Winter Games Beijing 2022 has reached several thousand square meters, and the floor flatness is required to be within ± 3 mm over any 5 m span. The
flatness needs to be evaluated during the construction, completion, operation, and
maintenance of the floor. To ensure the quality of the project, the construction quality
can be evaluated by the flatness measurement periodically during the construction
period before the concrete is completely set, and it can guide the construction of
the floor. During the completion, operation, and maintenance period, the measure-
ment of flatness is mainly used to evaluate whether the floor can meet the functional
requirements. Therefore, quick, accurate, and precise flatness measurements are the
foundation for achieving high-quality floor construction and maintenance.
There are several methods for inspecting the flatness of a floor. In accordance with
the relationship between the measuring instrument and the floor, these methods can
be divided into two distinct categories: the contact type and the non-contact type. The
contact type indicates that the instrument makes direct contact with the floor to collect
relevant data, such as the lean ruler method, the leveling method, and the profiler
method [8]. The gap between the lean ruler and the floor is measured with a feeler
gauge using the lean ruler method. Due to its low efficiency, it is only appropriate for
sparse point sampling inspection. In addition, there are no guidelines for positioning
the lean ruler on the surface, resulting in poor repeatability in practice. The leveling
method measures the relative height of the floor using a leveling gauge. It is simple
to conduct, but the degree of automation is low, making it unsuitable for extremely
large floors. The American Society for Testing and Materials (ASTM) introduced
the industry standard ASTM E1155 for floor shape measurement and proposed a
quantitative measurement method based on the profiler to improve the repeatability
of the flatness measurement method [8]. The profiler is a measuring instrument with
two feet at its base. It follows the predetermined route. The instrument’s horizontal
angle is measured by an inclinometer. Since the distance between the feet is known,
it is possible to calculate the height between the feet. In accordance with the mean
and standard deviation of the height difference, the F-number standard algorithm is
then used to calculate the corresponding index. It benefits from high repeatability
and is highly automated [8]. However, only the overall flatness of the floor can be

evaluated; the local flatness of any given location cannot be determined. Kangs et al.
analyzed and evaluated the flatness of uncured concrete surfaces by combining a
concrete vibrating trowel with various sensors, such as displacement, speed, and
angle sensors [9]. However, this method employs a mechanical boom support and
vibrating trowel and sensor drives. The measurement area is limited by the length
of the boom; therefore, it cannot be used to measure the flatness of a large floor.
Consequently, the current contact-based methods suffer from sparse measurement
points and low efficiency. In addition, the lean ruler method, the leveling method, and
the profiler method are unable to determine the location of the inspection data, so positions where the flatness exceeds the tolerance cannot be located for subsequent repairs, which is essential for the construction and maintenance of large-scale floors.
Non-contact flatness inspection methods include photogrammetry [10], 3D laser
scanning [11], and other technologies. Photogrammetry uses a camera to capture
the target from multiple perspectives and calculates the 3D coordinates of the target
with high accuracy. The vertical and horizontal accuracies can reach up to 0.12 mm
and 0.13 mm, respectively [10]. However, this method requires the placement of
ground-based cooperation markers. This step is time-consuming and will have an
effect on the construction of a super-large floor. LiDAR is a common instrument for
intensive and precise scanning of 3D shapes. Methods exist for evaluating the flatness quality of cast-in-place ground and prefabricated components with LiDAR. It can be
integrated with a building information model (BIM) with a large data volume and high
precision. However, the point cloud becomes sparser as the distance increases. The
measurement range is limited because the density and accuracy of the point cloud
at a great distance cannot meet the requirements for flatness quality inspection.
Due to the low scan density and high incident angle of laser beams, the effective
measurement distance of LiDAR is approximately 20 m from the scanner. It is more
appropriate for smaller spaces and rooms. In addition, the non-contact method is
easily obstructed. Therefore, it is necessary to repeatedly change the position of the
instrument, which introduces station transfer errors in the flatness measurement. On
the whole, the existing methods have numerous flaws and cannot accommodate the
rapid and precise measurement of the flatness of super-large floors [12].
In short, the existing methods cannot meet the rapid and accurate inspection
requirements for super-large floors simultaneously. Since floor flatness data are
important for guiding floor construction and evaluating floor quality, it is neces-
sary to develop fast and precise measurement technology for super-large floors.
For the issues in the current ground flatness measurement, we propose a flatness
measurement method based on aided-INS.
The measurement instrument is moved along the planned route, the inertial navi-
gation error is corrected by using supplementary sensors such as the total station
and odometer, and the 3D motion trajectory of the measurement instrument can be
obtained to calculate the flatness of the floor. For the requirements of flatness inspec-
tion for concrete floors during construction, a flat platform measuring instrument and
related data processing approach were developed [13]. According to the acceptance
requirements of concrete flatness during the completion period, a wheeled measuring
instrument and related data processing methods were also developed. We performed

performance tests on different measuring methods and successfully applied this tech-
nology to the flatness measurement of the concrete base under the indoor ice surface
during construction and the acceptance measurement upon completion in the National
Speed Skating Oval.

4.4.1 A Rapid Method of Aided-INS Floor Flatness Detection

1. Measurement principle

The essence of floor flatness detection is to measure the relative height difference
in a certain area. Inspired by the principle of measuring flatness by a lean ruler, we
propose to construct a virtual baseline by measuring the ground undulation within a certain length range and then calculating the flatness. The measurement principle is shown in Fig. 4.35.
For any point Pi on the elevation profile of the floor, a local window of length
L is taken from the floor profile with this point as the midpoint, and the difference
between the maximum elevation Hmax , the minimum elevation Hmin , and the mean
elevation Hmean is calculated. The maximum bump Hup and the maximum depression
Hdown are obtained, which represent the maximum upward and downward fluctuations of the elevation of the trajectory points in the local window relative to the average elevation in the window.
The largest absolute value of Hup and Hdown is used as the flatness index Fi at the
center point Pi , and the equation for calculating Fi is:
F_i = H_max − H_mean,  if |H_up| ≥ |H_down|
      H_min − H_mean,  if |H_up| < |H_down|        (4.39)
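A minimal numpy sketch of this sliding-window flatness index follows; it assumes the elevation profile has already been resampled at a constant spacing, and the window length, spacing, and test data are illustrative values:

```python
import numpy as np

def flatness_index(elevation, spacing, window_len):
    """Flatness index F_i of Eq. (4.39) for every point of a resampled elevation profile.

    elevation  : relative elevations of the profile points (m)
    spacing    : constant distance between consecutive points (m)
    window_len : baseline window length L (m), centred on each point
    """
    half = int(round(window_len / (2 * spacing)))
    F = np.full_like(elevation, np.nan, dtype=float)
    for i in range(half, len(elevation) - half):
        win = elevation[i - half:i + half + 1]          # local window of length L
        h_mean = win.mean()
        h_up = win.max() - h_mean                       # maximum bump H_up
        h_down = win.min() - h_mean                     # maximum depression H_down (negative)
        F[i] = h_up if abs(h_up) >= abs(h_down) else h_down
    return F

# Hypothetical 5 m window at 0.1 m spacing, limit ± 3 mm
F = flatness_index(np.random.normal(0, 0.001, 500), spacing=0.1, window_len=5.0)
print(np.nanmax(np.abs(F)))
```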

Therefore, based on this consideration, the key to the flatness detection method
is to accurately obtain the undulating profile shape of the floor. We propose a rapid
measurement method for floor flatness based on the odometer-aided-INS positioning
technology described in Sect. 4.2.2 of this book. The technical workflow is shown in

Fig. 4.35 Flatness measurement principle



Fig. 4.36 Flatness measurement based on aided-INS positioning

Fig. 4.36 and is divided into three parts: data collection, trajectory estimation, and
flatness calculation. First, in the data collection stage, INS can precisely sense the
ground fluctuation with customized equipment for flatness measurement. Then, the INS data are fused with multi-source data, such as total station and odometer observations, in the trajectory calculation using the Kalman filter and RTS smoother, and the motion trajectory (including attitude, speed, and position) of the detection equipment is solved. Finally, according to the speed and attitude, the mileage-elevation curve, flatness index, and
horizontal coordinates of the points exceeding the limits of flatness are calculated.
2. Measurement method of flatness during construction

1) Flatbed plate measuring equipment during construction

During the construction period, the floor is in the initial setting stage and still has
a certain fluidity. Therefore, the pressure of the INS carrier on the ground should
not be too large to avoid obvious deformation of the setting cement floor. In the
meantime, inertial navigation errors accumulate during long-term measurement and must be suppressed, and wheels cannot be used on cement ground that has not yet solidified. Therefore, a total station with an automatic tracking function is used to
track and measure the surveying robot. The measuring device is equipped with a
prism matching the total station, which records its own 3D coordinates with the total
station.
We designed a flatbed inertial measurement vehicle based on the above discus-
sion. The flatbed is used as the component in contact with the ground to reduce
the pressure and is rigidly connected with the prism to meet the requirement of the
measurement method. The structure of the flatbed inertial measurement vehicle is
shown in Fig. 4.37. Its main components include the backplane, main control circuit
module, inertial measurement unit, battery, and prism. The core part of the vehicle is
the IMU, which is used to collect acceleration and angular velocity data while moving
and send the data to the main control circuit module. The main control circuit module
controls the data acquisition, data storage, and export of the inertial measurement
module. The carrier plate includes a base plate and warped surface to reduce pressure
and friction. The prism installed on the steel pipe reflects the laser emitted by the
total station and records its real-time 3D coordinates with the total station.

Fig. 4.37 Flatbed vehicle structure of inertial measuring

2) Data acquisition and solution


The data collection method in the field is shown in Fig. 4.38. When using the flatbed
inertial measuring vehicle for flatness detection, drag the vehicle on the cement
floor during construction, use the total station to track the prism in the meantime,
and measure the real-time 3D coordinates of the device. The inertial measurement
module is used to record its own motion information, such as acceleration and angular
velocity, to accurately measure the shape of the ground. Therefore, the flatness status
information in each position on the floor can be obtained according to the attitude
and horizontal position information of the IMU.
The data fusion method of the total station and INS is shown in Fig. 4.39. The
coordinates of each trajectory point r^n_INS on the measurement route are calculated by

Fig. 4.38 Data collection method by flatbed INS measuring vehicle



Fig. 4.39 Data fusion method based on the Kalman filter

performing inertial recursion on the angular velocity and acceleration obtained by the
IMU. The 3D coordinates r^n_TS of the carrier observed by the total station are used to correct the inertial recursion within the Kalman filter framework. The position, attitude, speed, and intermediate filtering information of each trajectory point on the
measurement route are obtained by performing smoothing on all data by the RTS
smoothing algorithm.
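As a highly simplified illustration of this loosely coupled correction (not the full INS error-state filter used by the system): a toy one-dimensional position/velocity state is propagated with accelerometer data and updated by sparse total-station position fixes; all noise values are hypothetical.

```python
import numpy as np

def kf_fuse(acc, dt, z_ts, ts_every, q=1e-3, r=1e-4):
    """Toy 1D loosely coupled fusion: INS mechanization (double integration of
    acceleration) corrected by sparse total-station position fixes."""
    x = np.zeros(2)                          # state: [position, velocity]
    P = np.eye(2) * 1e-2
    F = np.array([[1.0, dt], [0.0, 1.0]])
    Q = q * np.array([[dt**4 / 4, dt**3 / 2], [dt**3 / 2, dt**2]])
    H = np.array([[1.0, 0.0]])
    out = []
    for k, a in enumerate(acc):
        # Prediction: propagate the state with the IMU measurement
        x = F @ x + np.array([0.5 * a * dt**2, a * dt])
        P = F @ P @ F.T + Q
        if k % ts_every == 0:                # total-station position update
            y = z_ts[k // ts_every] - H @ x
            S = H @ P @ H.T + r
            K = P @ H.T / S
            x = x + (K * y).ravel()
            P = (np.eye(2) - K @ H) @ P
        out.append(x.copy())
    return np.array(out)

# Hypothetical use: 10 s at 200 Hz, carrier moving at ~0.5 m/s, one fix per second
dt = 1 / 200
acc = np.random.normal(0.0, 0.05, 2000)      # noisy accelerometer samples
true_pos = 0.5 * np.arange(0, 10.5, 1.0)     # total-station fixes along the route
traj = kf_fuse(acc, dt, z_ts=true_pos, ts_every=200)
print(f"final fused position: {traj[-1, 0]:.3f} m")
```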
After acquiring the motion trajectory, the mileage-elevation curve of the measure-
ment route is calculated. Let the mileage of the starting point of the measurement
route be 0. The mileage dn and elevation h n of each point along the measurement
route to the starting point can be recursively deduced according to the velocity vn of
the trajectory point and pitch angle θn in the attitude and the sampling time ∆t of
the inertial measurement module. The calculation equation is:
[d_{n+1}, h_{n+1}]^T = [d_n, h_n]^T + [1, sin θ_n]^T · v_n · Δt        (4.40)

The data sampling frequency of the inertial measurement module is as high as


200 Hz, and the amount of data is very large. Therefore, it is necessary to resample
the trajectory points of the measurement route. The equidistant resampling method
is used in this book, and the resampling interval is set to 0.1 m, which can greatly
reduce redundant points and speed up the data computation [12].
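A short numpy sketch of the mileage-elevation recursion of Eq. (4.40) and the 0.1 m equidistant resampling; the speed and pitch series are synthetic, for illustration only:

```python
import numpy as np

def mileage_elevation(v, pitch, dt):
    """Recursive mileage-elevation curve of Eq. (4.40).
    v: speeds (m/s), pitch: pitch angles (rad), dt: IMU sampling interval (s)."""
    step = v * dt
    d = np.concatenate(([0.0], np.cumsum(step)))                  # mileage along route
    h = np.concatenate(([0.0], np.cumsum(step * np.sin(pitch))))  # relative elevation
    return d, h

def resample_equidistant(d, h, interval=0.1):
    """Resample the profile at a constant mileage spacing (0.1 m is used in the text)."""
    d_new = np.arange(0.0, d[-1], interval)
    return d_new, np.interp(d_new, d, h)

# Hypothetical 10 s of data at 200 Hz, constant 0.5 m/s, gentle undulation
n, dt = 2000, 1.0 / 200
v = np.full(n, 0.5)
pitch = 0.002 * np.sin(np.linspace(0, 6 * np.pi, n))
d, h = mileage_elevation(v, pitch, dt)
d_rs, h_rs = resample_equidistant(d, h, 0.1)
print(len(d), "->", len(d_rs), "points")
```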
For any evaluation point P_i, if the absolute value of the flatness index F_i at P_i is greater than the limit ε, the plane coordinates of the point are extracted and plotted together, so the areas where the floor flatness does not meet the requirements can be seen intuitively, the flatness quality of the entire site can be assessed, and repairs can be arranged. Here, the window
length L and the limit ε are assigned according to the floor materials and usage or
the requirements of the constructor.

3. Measurement method of flatness during completion

1) Flatness measurement equipment during the completion phase

At the completion stage, the concrete floor is completely solidified, and people can
walk freely on its surface. The inertial measurement errors can be corrected by

Fig. 4.40 Structure diagram of a wheeled inertial navigation measuring vehicle

an odometer and total station at the same time. Therefore, a flatness measurement
system for superlarge floors during the completion phase is proposed, which inte-
grates IMU, odometer, and total station data. For the measurement requirements
and actual conditions for the superlarge floor after completion, a wheeled inertial
navigation measurement vehicle is designed, as shown in Fig. 4.40. Specifically, the
vehicle structure includes a carrier device, a type-1 roller, two type-2 rollers, a T-
type pushrod, and an odometer fixed on the type-2 roller; a data measurement device
includes a main control module and the IMU and odometer, which are connected to the main control module. In this device, the main control module and the IMU are fixed on the carrier
device, and the odometer is fixed on the type-2 roller. The surface morphology data
of the floor are collected by pushing the carrier device on the finished concrete floor.
2) Data acquisition and processing
The data acquisition method of the wheeled inertial navigation measurement vehicle
is similar to that of the flatbed vehicle. When using the wheeled vehicle for flatness
inspection, the inspector will turn on the system and let it stand for 5 min, then
push the whole device to move on the concrete floor. The inspector will also use
the total station to track the prism in real time to measure the 3D coordinates of
the device. The inertial navigation measurement module is used to record its own
attitude information, such as acceleration and angular velocity, to accurately measure
the shape of the ground. Therefore, the flatness status information of each position
on the floor can be obtained according to the attitude information and plane position
information of the IMU.
The trajectory calculation method is similar to that described in the flatbed vehicle
measurement method. The difference is that an odometer is added for

Fig. 4.41 Flatness measuring based on the INS, odometer and total station

motion constraint. Therefore, the optimal position and attitude are estimated by
adding the position coordinates of the control point observed by the total station and
the speed measured by the odometer to the Kalman filter as auxiliary data to constrain
the inertial navigation recursion. The process is shown in Fig. 4.41.
After the coordinates of each point are calculated, the flatness is evaluated. First,
resampling is performed; note that it is different from the construction phase. After
resampling, the original elevation of the sampling point must be filtered and smoothed
by the median filter, so the outliers caused by running over stones and so on can be
removed. Before calculating the flatness index, it is also necessary to calculate the
trend of each point by the mean filter. The relative elevation can be calculated by
subtracting the trend from the original elevation of each point. The length of the
mean filter window depends on the number of sampling points included in the length
of the relative elevation base window.
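A minimal sketch of this completion-phase preprocessing, median filtering for outlier removal followed by mean filtering for the local trend; the window sizes and the test profile are assumptions for illustration:

```python
import numpy as np
from scipy.signal import medfilt

def relative_elevation(h, spacing=0.1, base_window=5.0, median_kernel=5):
    """Preprocess a resampled elevation profile for the completion-phase flatness index.

    h            : resampled elevations (m) at constant spacing (m)
    base_window  : relative-elevation baseline window length L (m)
    median_kernel: odd kernel size for outlier removal (e.g., running over stones)
    """
    h_med = medfilt(h, kernel_size=median_kernel)        # remove isolated outliers
    n_win = int(round(base_window / spacing)) | 1        # odd number of points covering L
    kernel = np.ones(n_win) / n_win
    trend = np.convolve(h_med, kernel, mode="same")      # local trend by mean filter
    return h_med - trend                                 # relative elevation

# Hypothetical profile: smooth undulation plus one 5 mm outlier
h = 0.002 * np.sin(np.linspace(0, 4 * np.pi, 400))
h[200] += 0.005
rel = relative_elevation(h)
print(f"max |relative elevation|: {np.abs(rel).max() * 1000:.2f} mm")
```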

4.4.2 Testing and Application

The real shot of the flatbed measurement equipment and pushrod measurement are
shown in Fig. 4.42. To test the function and suitability of the device, the key indicators
and performance of the two devices are tested, and corresponding experiments are
designed to test their usability and accuracy.

1. Test of the flatness measurement of the flatbed device

A 4 m × 50 m rectangular test site was selected on the ice-surface concrete base


of the National Speed Skating Oval to conduct the accuracy verification experiment
on the flatbed flatness measuring device. The plane route of the test is shown in
Fig. 4.43. Three north–south measurement routes are designed on the west, central
and east sides with a length of about 45 m, and each route is measured three times.

Fig. 4.42 Measuring device for flatness measurement

Fig. 4.43 Road map of the accuracy test experiment

The coordinates and elevation at the starting point are set to zero to establish
an east-north-up coordinate system. The mileage-elevation curves corresponding to the three routes on the west, central, and east sides are calculated, as shown in Fig. 4.44.
After removing the bends at both ends of the curve, the length of the curve in the
figure is 30 m, and the number of sampling points is 300. It can be seen from the
figure that the relative elevation measured multiple times on the same route has good
repeatability.
A statistical analysis was performed on the relative elevation data of the above three measurement routes. For each measurement route, the correlation coefficient between the three measurement results and the elevation differences at the same positions were calculated. The correlation coefficient, the percentage of the difference within
the interval [− 1 mm, 1 mm], and the maximum value of the difference are listed,
as shown in Table 4.5. It can be seen from this table that the correlation coefficients
of the different measurements are all above 0.89 for routes with the same length of
30 m, and the absolute value of the relative elevation difference between any two
results is at least 92% within 1 mm. The maximum difference at the same position
between different measurements is not more than 2 mm, which indicates that the

Fig. 4.44 Three relative elevation measurement results for the west, central, and east side routes

measurement value of the relative elevation has high repeatability, and the relative
accuracy can reach 2 mm/30 m or 1/15,000. The accuracy of this method is verified.
A level is used to verify the accuracy of the measurement results. Ten control points, CTL1–CTL10, are set along the route of the moving vehicle, as shown in (b), to form a closed leveling route: starting from CTL1, CTL2, CTL3, and so on are measured in turn, closing back on CTL1 after CTL10. The Trimble DiNi03 level and its
leveling rod were used to measure the height difference between two adjacent points
on the leveling route. In the meantime, the height difference between them can be
measured by the inertial measurement, and this height difference is compared with
the height difference measured by the level. The results are shown in Table 4.6. It

Table 4.5 Relative elevation repeat surveying statistics

Measurement sector   Measuring sequence number   Correlation coefficient ρ   ε_max/mm
The west side        1 and 2                     0.9265                      1.964
                     1 and 3                     0.9629                      1.211
                     2 and 3                     0.9134                      1.490
The central          1 and 2                     0.9517                      0.775
                     1 and 3                     0.9706                      1.064
                     2 and 3                     0.9572                      1.018
The east side        1 and 2                     0.8950                      1.591
                     1 and 3                     0.9595                      0.766
                     2 and 3                     0.9378                      1.103

Table 4.6 Elevation difference results comparison

Backsight point   Foresight point   Leveling line distance/m   Elevation difference by level/mm   Elevation difference by INS/mm   Comparison of elevation difference/mm
CTL1              CTL2              22.002                     −1.43                              −2.27                            −0.84
CTL2              CTL3              22.796                     2.00                               2.75                             0.75
CTL3              CTL4              22.332                     −2.34                              −2.36                            −0.02
CTL4              CTL5              22.639                     7.29                               7.79                             0.50
CTL5              CTL6              22.58                      −6.13                              −7.01                            −0.88
CTL6              CTL7              25.874                     1.01                               0.11                             −0.90
CTL7              CTL8              22.833                     −0.85                              −0.40                            0.45
CTL8              CTL9              22.37                      −2.67                              −3.62                            −0.95
CTL9              CTL10             21.77                      6.31                               6.94                             0.63
CTL10             CTL1              22.191                     −2.86                              −2.01                            0.85

can be seen from the table that the total length of the leveling route is 227.387 m, and
the closure error is 0.33 mm. The difference in the control points measured by the
leveling instrument and the method in this book is within ± 1 mm, which shows that
the total station/INS method proposed in this book has high external accuracy, which
can meet the requirements of rapid measurement of flatness for super large indoor
floors during the construction phase.

2. Test of wheeled flatness measurement device

The accuracy of the flatness measuring robot mainly includes the repeatability and
accuracy of the relative elevation. A total of 15 waypoints are set at the site, and the
layout is a grid with three rows and five columns. The five points in each row are
separated from each other by an interval of 3 m, and the three points in each column
are separated from each other by an interval of 2 m, as shown in Fig. 4.45.

Fig. 4.45 Repeatability of relative elevation surveying

1) The repeatability of the relative elevation measurement


First, the plane coordinates and height difference of the waypoints are measured.
A total station is used to measure the plane position of each waypoint. Then, the
leveling route is designed, and a leveling instrument and leveling rod are used to
measure the height difference and relative elevation between the waypoints as the true
value. Finally, the flatness measurement robot is used to take multiple measurements
along the same route in the same field, as shown in Fig. 4.45a. When measuring,
the robot traverses all waypoints on the edge in clockwise and counterclockwise
directions. Two sets of data were measured, and the measurement of each set of data
was taken three times along a circle clockwise or counterclockwise, for a total of
6 times. The repeatability statistics on the deviation of each group of measurement
curves in the vertical direction were performed. Finally, the repeatability statistics
on the mileage-elevation curves of the six sets of measurement data are performed,
and the measurement results of the elevation repeatability accuracy of the flatness
measurement robot are obtained, as shown in Fig. 4.45b. The results show that the
total length of the measurement route is approximately 193 m, and the average
elevation repeatability of multiple measurements is 1.9 mm [12].
2) The verification of relative elevation measurement accuracy
The flatness measuring vehicle is used to conduct multiple measurements on the same
location, as shown in Fig. 4.46. The measurement path of the robot must traverse all
the waypoints, and each measurement route consists of two mutually perpendicular
zigzag lines that form a grid. A total of 2 sets of data were measured, and each
set of data contained a grid of measurement routes with a total of 2 circles. The
RMSEs of the calculated differences between the two groups of data are 1.2 mm
and 0.8 mm, respectively. The lengths of the two sets of leveling routes are 101 m
and 74 m, respectively, and the relative RMSE is calculated to be 1/84,167 and
1/92,500, approximately 1/85,000. Therefore, the method using the flatness measure-
ment robot proposed in this book is capable of achieving a high level of external accuracy
(Table 4.7).

Fig. 4.46 The moving path of the relative elevation accuracy test

Table 4.7 The comparison analysis of two relative elevation measuring results

Point name   Elevation by level/m   Elevation by the measuring vehicle/m   Difference/mm
                                    First group      Second group          First group   Second group
CTL01        0.0000                 0.0000           0.0000                0.0           0.0
CTL02        −0.0162                −0.0169          −0.0165               −0.7          −0.3
CTL03        −0.0204                −0.0216          −0.0198               −1.2          0.6
CTL04        −0.0182                −0.0174          −0.0179               0.8           0.3
CTL05        −0.0049                −0.0058          −0.0067               −0.9          −1.8
CTL06        −0.0240                −0.0231          −0.0234               0.9           0.6
CTL07        −0.0346                −0.0330          −0.0329               1.6           1.7
CTL08        −0.0363                −0.0380          −0.0357               −1.7          0.6
CTL09        −0.0360                −0.0379          −0.0358               −1.9          0.2
CTL10        −0.0266                −0.0281          −0.0259               −1.5          0.7
CTL11        −0.0388                −0.0400          −0.0379               −1.2          0.9
CTL12        −0.0444                −0.0464          −0.0428               −2.0          1.6
CTL13        −0.0485                −0.0499          −0.0477               −1.4          0.8
CTL14        −0.0417                −0.0439          −0.0409               −2.2          0.8
CTL15        −0.0325                −0.0334          −0.0318               −0.9          0.7

3. The flatness measurement of the concrete base of an ice floor in the National
Speed Skating Oval

The National Speed Skating Oval is located at Olympic Park in Chaoyang District,
Beijing. This is the new ice venue for the Olympic Winter Games Beijing 2022. It has
the largest all-ice design in Asia, with 12,000 m² of ice. Because the ice sheet rests
on a concrete base that covers the refrigeration pipes and forms the last structural
layer beneath the ice, a flat concrete foundation is essential to minimize
the temperature difference across the ice surface. According to the requirements of
the International Skating Union, the temperature difference on the ice surface of the

speed skating field cannot exceed 1.5 °C. The smaller the temperature difference,
the more uniform the ice surface and its hardness, which is advantageous for
athlete performance. Therefore, it is necessary to inspect the flatness of the concrete
foundation following its construction to make any necessary adjustments before its
completion.
According to the relevant official documents and the project requirements of the
National Speed Skating Oval, the quality standard for the concrete base is that the
height deviation of the ice surface must be within ± 3 mm over any 5 m range. The
results of the flatness inspection of the concrete base lay the foundation for the
subsequent refrigeration of the ice slab. We conducted a flatness inspection of the
accepted concrete base during the construction stage in October 2021 and another
during the completion stage in December 2021 (Fig. 4.47).
The inspection robot measures the relative height of the profile along its path, so
it can only reflect the flatness of each point on its route. For wide rectangular floors,
the measurement route is planned in a grid. For long and narrow sports tracks, the
measurement route is planned along the track. The specific measurement line plan
of the National Speed Skating Oval is shown in Fig. 4.48 [12].

1) Measurement results during the construction stage

The survey during the construction period was mainly divided into four areas: the north
inner rink, the south inner rink, the practice track, and the racetrack. We measured the
flatness of the north inner rink on October 1, 2020; because the equipment was newly
developed and this was the engineers' first time working on this construction site, the
first measurement, covering 30 m × 10 m, was completed under various complex
conditions, and a digital surface model of the cement-based surface was obtained. The
measurement results show that the elevation of the cement-based surface fluctuates
within ± 4 mm of the average elevation, and more than 90% of the measured points
meet the processing requirement of ± 3 mm within a range of 5 m.
During the construction period, the north inner rink and some areas in the south
of the racetrack were not measured owing to limitations of the total station setup
conditions, the lighting conditions, and other on-site factors. The overall flatness map
and the proportions of the indicators across the field are shown in Fig. 4.49. The green
areas in the figure pass the criteria, and the other colored areas are locations where
the flatness indicator exceeds the limit. The efficiency parameters, such as the total
length of the survey line, the total number of sampling points, and the accumulated
time, are listed in Table 4.8. The table shows that the method used in this measurement
offers a clear improvement in speed and efficiency over the traditional flatness
measurement method.

Fig. 4.47 The National Speed Skating Oval

Fig. 4.48 Measurement routes for different floors

2) Measurement results after completion

After the completion of the project, the wheeled flatness measurement device for
concrete floors was used to perform an as-built flatness measurement of the entire
venue. The implementation steps are similar to those used during the construction
period. However, to make the measurement results better reflect the actual condition
of the floor surface, the density of the survey lines was increased during the
completion-stage survey, and the spacing between survey lines was reduced to 2 m.
By the time of completion, thanks to the timely repair and polishing of the spots where
the flatness exceeded the limit and the addition of an odometer constraint to the inertial
navigation measurement vehicle, the flatness index of the entire measurement area
had improved greatly. The results are shown in Fig. 4.50.
To obtain good visibility conditions, a total station was set up in the auditorium
during the measurement. The testing area is divided into four areas: the race track,
the practice track, the north inner skating rink, and the south inner skating rink.

Fig. 4.49 Flatness measurement results during construction on the entire site

Table 4.8 The measurement efficiency parameters during construction

Total length of measuring mileage | Total number of sampling points | Total accumulated time of data collection
9946.68 m | 100,423 | 21,313.56 s ≈ 5.92 h

Fig. 4.50 Overview of surveying lines in the National Speed Skating Oval during the completion
of the construction period

The distribution of control points and inspection routes for the National Speed Skating
Oval is shown in Fig. 4.50, and the measurements for the four areas are shown in
Fig. 4.51. For most areas, we used grid lines with a spacing of 2–3 m for data collection.
The total length of the measurement route is about 10.261 km, and the total measure-
ment time is approximately 4.88 h (Table 4.9). The measurement speed reaches about
2 km/h, and the final number of flatness inspection points is 102,000. Traditionally,
it takes 2–3 days to measure such a super large floor with leveling. Our method is as
much as 4 times more efficient than the traditional method.
The measurements show (Fig. 4.51) that 99.4% of the sampling points of measured
flatness on the standard speed skating track are within ± 3 mm in the range of 5 m.
This number is 99.3%, 99.7%, and 99.3% on the practice track and the northern and
southern inner rinks, respectively. It should be noted that there are some sampling
points that are over the limit in a very small area to the southwest of the standard
speed track and the practice track. Overall, the floor flatness of the National Speed
Skating Oval meets the design requirements [12].
4. Comparison of results during the construction and completion period
According to the statistics on the flatness of the entire site during the construction
period, 96.19% of the inspection points met the acceptance criterion of ± 3 mm/5 m
and the design requirements in the early stage of construction. During the construction
period, the quality of the floor flatness was further improved by timely polishing and
adjusting of the remaining areas where the flatness exceeded the limit. After the
completion of the National Speed Skating Oval, we once again conducted a
comprehensive inspection measurement of the flatness. By comparing the data from
the construction and completion periods, the degree of improvement can be assessed.
The distributions of the flatness measurement results in the southern area of the site
during the construction and completion periods are shown in Fig. 4.52a, b; the quality
of flatness improved greatly. The comparison of the overall flatness between the
construction and completion periods is shown in Fig. 4.52c. The proportion of
inspection points meeting the 5 m ± 3 mm requirement increased by about 3 percentage
points, and statistical analysis of the data shows that 99.15% of the inspection points
of the entire site meet the requirements. During a later test on the site, the temperature
difference on the ice reached 0.5 °C, which fully meets the 1.5 °C limit set by the
International Skating Union and indirectly verifies the good flatness of the concrete
base. Rapid flatness inspection technology, used to assist the construction of the
concrete base beneath the ice, thus laid a key foundation for building "the fastest ice"
of the Olympic Winter Games Beijing 2022.

Fig. 4.51 Flatness measurement result analysis in different sections of the National Speed Skating Oval

Table 4.9 Flatness measuring efficiency during the completion period

Total length of measuring mileage | Total number of sampling points | Accumulated time of data collection
10,261.40 m | 102,621 | 17,574.41 s ≈ 4.88 h

Fig. 4.52 Flatness comparison between the construction period and the completion period

4.5 Defect Inspection of Drainage Pipelines

As an important channel for urban rainwater and sewage discharge, the drainage
network is a lifeline for the safe operation of a city. In recent years, urban disasters
such as inland inundation, road subsidence, and black, odorous water have occurred
frequently, severely affecting the safety of the lives and property of urban residents.
Behind these phenomena are defects of the urban drainage network: inadequate
drainage capacity and blockages lead to inland inundation, while broken drainage
pipelines allow groundwater to wash out the roadbed and cause road subsidence. With
the rapid expansion of cities in China, factors such as loads far exceeding the design
criteria, the degradation of pipelines and facilities, the impact of new major projects
such as subways, the construction quality of concealed works, and insufficient
underground detection means in the operation and management of municipal water
conservancy departments are constantly exposing an increasing number of pipeline
network issues, including plugs and breaks, the cross connection of rainwater and
sewage pipes, and aging and damage. Timely detection of risks through periodic and
comprehensive inspection of the underground pipeline networks is the key to ensuring
the safe operation and maintenance of the drainage pipeline network system.
To repair defects in the underground drainage network in time, regular inspections of
the drainage pipeline network are necessary. However, because the drainage network
covers a large area and its internal environment is complex, large-scale inspection is
very difficult. At present, the common internal inspection technologies for drainage
pipelines include closed-circuit television (CCTV) inspection robots, pipe sonar, pipe
periscopes, and pipe endoscopes, as shown in Fig. 4.53. The CCTV inspection robot is
currently the mainstream inspection device. The robot is generally operated through
cables, and its disadvantage is that a single operation covers a limited range and is
complicated. The pipe periscope is suitable for pipeline detection scenarios with a
diameter of less than 2 m and a detection range of no more than 80 m. Its most
prominent advantages are that it is intuitive, easy to operate, and portable; its
disadvantages are that it cannot detect the structure of the pipeline, cannot detect
continuously, and covers a relatively short distance in each deployment. Pipeline sonar
detection technology works well on pipelines that are full of water or waterlogged and
on pipeline deformation. However, sonar detection systems are expensive and difficult
to operate, and they can only detect pipeline conditions below the liquid level. The
pipe endoscope mainly consists of an integrated controller, a cable reel with a flexible
push rod, and a camera. The front-end camera is pushed into the pipe by the flexible
push rod cable, and integrated LED lighting allows the internal image of the pipe to be
previewed and recorded. Pipe endoscopes are generally used as a supplement to other
inspection equipment and are suitable for slender, narrow, and curved pipelines that
other equipment cannot enter. In general, the existing methods are inefficient and too
costly to be suitable for general inspection of large-area drainage pipeline networks.
Therefore, it is urgent to develop fast and low-cost drainage network detection
technology to provide an effective means of drainage network inspection and to
improve the level of intelligent operation and maintenance of the pipeline network.

4.5.1 Drainage Pipeline Detection Method Based on a Floating Capsule Robot

A visual drainage pipeline inspection method based on fluid-driven capsules is
proposed to address the difficulties of detecting large-scale urban pipeline networks
carrying water [14]. The principle of visual inspection of the drainage pipeline by the
fluid-driven capsule is shown in Fig. 4.54. A new type of low-cost drifting pipeline
capsule is released upstream; photos of the internal pipe wall are taken while the
capsule drifts through the pipeline, and the capsule is retrieved downstream. The visual
detection method of drainage pipelines based on fluid-driven capsules must solve two
problems: one is the positioning of the drifting pipeline capsule, and the other is the
automatic identification of visual defects. To meet the requirements of capsule
positioning, the capsule device integrates a variety of vision and motion sensors, and
the positioning information from visual, inertial, and optical flow measurements and
the pipeline network map is optimally fused to achieve continuous positioning. A
9-axis AHRS in the pipeline capsule is used to solve the attitude, and inertial-aided
image matching is used to address the poor registration reliability of texture-less
images. Furthermore, the position error of the inertial/visual combination is updated
and corrected by the extracted coordinates of the manholes. The Kalman filter method
is used to achieve the optimal fusion of all the data, and the position of the drifting
robot in the pipeline is obtained. Unsupervised and supervised pipeline network defect
inspection and evaluation methods, based on computer vision and deep learning
algorithms, are designed to handle the complexity and diversity of defects in the
pipeline network and to reduce the cost.

Fig. 4.53 Pipeline inspection method

Fig. 4.54 The principle of visual drainage pipe inspection by a fluid-driven capsule

1. Drifting drainage pipeline inspection capsule

Inspired by the capsule endoscope used for intestinal examination, we designed a
fluid-driven drainage pipeline detection capsule with high operating efficiency,
autonomous positioning, and automatic inspection of pipeline defects. In this section,
the exterior structure, sensor composition, and technical parameters of the capsule are
introduced; furthermore, the features of the capsule are summarized and analyzed.

1) Structure design of the inspection capsule

Because of the complexity of the operation scenario of the drainage pipeline detection
capsule and the restrictions imposed by the operating conditions, the structural design
is one of the keys to the capsule: it directly impacts the stability and accessibility of
the capsule during drifting or towing and strongly affects the quality of the video data
and the validity of the IMU measurement data. Considering these factors, a boat-hull
design is chosen for the exterior structure of the drainage pipe detection capsule. The
boat-hull structure not only has good stability and accessibility but, with a reasonable
counterweight, also effectively guarantees that the capsule will not capsize during
drifting and towing, so that the lens always faces upward above the water surface and
a clear image of the interior of the drainage pipeline can be captured. The exterior
structure of the drainage pipeline detection capsule is shown in Fig. 4.55.
The shell can be divided into upper and lower parts: the upper part carries the operation
panel, the lens, and the light, and the lower part is the wading part. After systematic
testing, the waterproof rating of the capsule reaches IP67. The inside of the shell
mainly consists of an electronic compartment, a battery compartment, and a
counterweight compartment. The electronic compartment contains the CMOS sensor,
the camera system on chip (SOC), and the auxiliary circuit system, and the battery
compartment is used to hold and fix the lithium battery. The counterweight compartment
is used to load lead bricks to adjust the center of gravity and draft of the capsule so
that the center of gravity moves as close to the geometric center as possible, ensuring
that the capsule will not easily capsize during drifting or towing. A cross-sectional
view of the pipeline detection capsule is shown in Fig. 4.56.

Fig. 4.55 Three-view drawing of the inspection capsule

Fig. 4.56 Interior structure drawing of the drainage pipe inspection capsule

2) Sensors and circuit design


The electronic measurement module is mainly responsible for the real-time acquisition,
time synchronization, and storage of image and inertial data, and the design of the
circuit system is key to the measurement system. The circuit of the electronic
measurement system mainly includes the CMOS image sensor, the camera SOC
circuit, the power management unit, the IMU circuit, and related interface circuits.
The overall structure is shown in Fig. 4.57. The whole circuit system takes the camera
SOC as the core control device. After the system is powered on, the capsule
automatically connects via Wi-Fi to the hotspot established by a tablet computer, and
the operation information can then be entered on the tablet through a customized
application; at the same time, data collection by the capsule is triggered. After the
operation is completed, data collection is switched off, the measurement data are
downloaded locally, and the data processing is completed.

Fig. 4.57 Hardware framework of the inspection capsule
3) Technical parameters

The exterior structure design and the electronic measurement system together form
the drainage pipeline inspection capsule. When each part functions stably, clear
images of the inside of the pipeline and inertial data during movement can be obtained.
The physical drainage pipeline detection capsule is shown in Fig. 4.58. The hardware
of the drainage pipeline detection capsule is a composite of a waterproof shell, a
wide-angle camera monitoring module, an inertial measurement unit, an optical flow
module, an LED fill light, a photosensitive adjustment system, a main control circuit
system, a power module, a counterweight block, a storage unit, and a human–computer
interaction unit. The LED illumination is connected to the photosensitive adjustment
system, is installed on the surface of the waterproof shell, and is connected to the
power module, which automatically adjusts the lighting required by the camera. The
other units, including the wide-angle camera monitoring module, the inertial auxiliary
measurement unit, the optical flow module, the power module, the main control
circuit, and the human–computer interaction unit, are placed inside the waterproof
shell (Table 4.10).

Fig. 4.58 The drainage pipeline inspection capsule

Table 4.10 Device technical parameters sheet

Item | Key parameters
Video resolution | 1920 × 1080
Low illumination | Low-light camera, low illumination less than 0.001 lx
View field angle | Not less than 180°
Camera switched on | Camera function turns on automatically after power-on
Continuous recording time | Greater than 90 min
IMU | IMU data collection at 200 Hz
Optical flow | Optical flow data collection at 100 Hz
Camera illumination | Multilevel adjustable light sources
SD memory card | Supports SD storage, up to 32 GB
Automatic storage | Supports automatic video storage; video and IMU data are written to files according to time stamp
Capsule counterweight | Counterweight is adjustable
2. Inspection data processing method

1) Pipeline capsule positioning


Due to the harsh internal environment and poor texture of the drainage pipeline, a
low-cost visual-only localization method suffers from the inability to track continuously
for a long time, error accumulation, and scale ambiguity. Although the IMU can
continuously collect motion data and perform dead reckoning regardless of
environmental constraints, it also suffers from error accumulation. To achieve
continuous positioning in the pipeline network environment, the pipeline detection
capsule device not only includes a wide-angle fisheye lens but is also equipped with
an IMU motion sensor and an optical flow module. Continuous positioning in the
pipeline network environment is therefore achieved by fusing the various visual and
motion sensor data with the prior information on absolute position and shape from
the pipeline network map.
2) Inertial-aided fisheye image matching
The movement information can be measured from both the fisheye images and the
inertial data, but visual-only or inertial-only motion estimation has low reliability.
Therefore, we employ inertial-aided fisheye image matching as the foundation for
reliable motion recursion. The specific process of inertial-aided fisheye image
matching is shown in Fig. 4.59. After the fisheye images and motion data are collected,
pre-processing steps such as image distortion correction are first performed. After
image pre-processing is completed, the affine-invariant detector Hessian-Affine is
used to extract feature points. The inertial-aided matching method is then used for
image matching, and motion inference is finally achieved.
Fig. 4.59 Inertia-aided pipeline texture-less image registration

An inertia-aided fisheye RANSAC gross-error elimination method is proposed to
handle the large number of wrong matches among the extracted feature points caused
by sparse texture and geometric distortion during fisheye image matching [15]. The
matching points are converted into fisheye spherical coordinates, and the IMU
provides the relative rotation angle to assist the fisheye epipolar constraint, improving
the accuracy of pose estimation and gross-error elimination. The method can be
divided into three stages:
(1) Fisheye imaging model construction: the camera interior parameters and fisheye
camera distortion parameters are determined by camera calibration, and the
camera imaging model is retrieved from the fisheye image.
(2) Image feature extraction and coarse matching: the feature points are extracted
by affine invariant detectors.
(3) Gross error elimination: the inertial-aided fisheye RANSAC method is proposed
to construct the fisheye spherical coordinates, and then the inertial-aided 4-point
RANSAC method is used to remove gross errors. This method not only uses
IMU data to improve the gross error elimination effect but also does not require
camera-IMU calibration, which is convenient for application. More importantly,
the reliability of the epipolar constraint is improved by constructing the fisheye
spherical coordinate model, which further improves the accuracy of the gross
error elimination.
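The following is a minimal sketch of the known-rotation idea behind stage (3), assuming unit bearing vectors on the fisheye sphere are already available from the calibrated imaging model of stage (1). It uses a simplified two-point RANSAC over the translation direction rather than the 4-point RANSAC of the actual method, so it should be read as an illustration of how the IMU rotation strengthens the epipolar constraint, not as the book's implementation; all names are illustrative.

```python
import numpy as np

def epipolar_residuals(t, bearings1_rot, bearings2):
    # Residual of the known-rotation epipolar constraint t . ((R x1) x x2) = 0
    normals = np.cross(bearings1_rot, bearings2)                       # (N, 3)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True) + 1e-12
    return np.abs(normals @ t)

def ransac_translation_known_rotation(bearings1, bearings2, R_imu,
                                      threshold=1e-3, iterations=500, rng=None):
    """Reject gross matching errors when the relative rotation is known from the IMU.

    bearings1, bearings2: (N, 3) unit bearing vectors on the fisheye sphere.
    R_imu: 3x3 relative rotation (camera 1 -> camera 2) derived from the IMU.
    Returns the estimated translation direction and a boolean inlier mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    x1r = bearings1 @ R_imu.T                 # R * x1 for every correspondence
    best_inliers = np.zeros(len(bearings1), dtype=bool)
    best_t = None
    for _ in range(iterations):
        i, j = rng.choice(len(bearings1), size=2, replace=False)
        n_i = np.cross(x1r[i], bearings2[i])  # each match constrains t to a plane
        n_j = np.cross(x1r[j], bearings2[j])
        t = np.cross(n_i, n_j)                # direction orthogonal to both planes
        norm = np.linalg.norm(t)
        if norm < 1e-9:                       # degenerate sample, skip
            continue
        t /= norm
        inliers = epipolar_residuals(t, x1r, bearings2) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_t = inliers, t
    return best_t, best_inliers
```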
3) Manhole cover landmark feature extraction and map matching
The main purpose of manhole cover landmark feature extraction and map matching
is to extract the timestamp or serial number of the images containing manhole covers.
This sequence of serial numbers or timestamps can then be matched against the
existing sewage pipeline network CAD map, so the absolute geographic coordinates
of each manhole cover node in the video sequence and their corresponding timestamps
can be obtained. To reduce the workload of manually reviewing videos, an automatic
extraction method for manhole covers in the sewage pipeline network needs to be
established. Compared with the damage patterns of the pipeline network and the
diversity and complexity of the environment, the target shape of the manhole cover
is relatively simple. To improve the detection accuracy, a separate image binary
classification model based on deep learning can be constructed. The manhole cover
extraction process is shown in Fig. 4.60.
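A binary classifier of this kind can be kept very small. The sketch below, written in PyTorch, is only an illustration of the idea of a dedicated manhole-cover/no-manhole-cover frame classifier; the network layout, names, and hyperparameters are assumptions rather than the model actually used in the book.

```python
import torch
import torch.nn as nn

class ManholeCoverClassifier(nn.Module):
    """Small binary CNN: frame contains a manhole cover (1) or not (0)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, 1)   # single logit for binary output

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)

def train_step(model, frames, labels, optimizer):
    """One training step on a batch of video frames and 0/1 manhole labels."""
    criterion = nn.BCEWithLogitsLoss()
    optimizer.zero_grad()
    logits = model(frames).squeeze(1)
    loss = criterion(logits, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage: frames is a (B, 3, H, W) tensor of extracted video frames.
model = ManholeCoverClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```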
Fig. 4.60 Manhole cover feature extraction workflow

4) Inertial/visual/optical flow/landmark feature multi-source data fusion positioning
The multi-source data fusion positioning uses the 9-axis motion data (angular velocity,
acceleration, and geomagnetic field) collected by the IMU, the horizontal pixel
displacement of the capsule device obtained by the optical flow module, the continuous
image frames captured by the main camera, and the pose estimates obtained by
multi-view geometry. Prior data, including the pipeline network map, the absolute
position information along the drifting path of the capsule, and the marker points
(manhole cover locations), can also be obtained and used for absolute corrections in
the fused position. The multi-source data fusion positioning framework is shown in
Fig. 4.61. Visual and inertial measurements are fused to obtain the camera pose from
the video captured by the capsule and the collected inertial data. The absolute distances
between manhole covers are used to apply absolute error corrections and determine
the position of images containing defects. Tightly coupled fusion of visual and inertial
measurements for odometric positioning is performed over multiple pipeline segments
and is combined with optical flow data in a loosely coupled fusion. The visual-inertial
fusion mainly includes image information processing, IMU pre-integration, and
back-end nonlinear optimization [16]. The loosely coupled fusion combines the result
of the tightly coupled fusion with the optical flow data and the pipeline network map,
achieving continuity of positioning over the entire pipeline.
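To make the absolute-correction step concrete, the following one-dimensional sketch shows how a detected manhole cover with a known chainage on the pipeline map can correct the along-pipe position accumulated by the visual-inertial/optical-flow odometry. It is a deliberately simplified scalar Kalman filter, not the tightly/loosely coupled fusion described above; all names and noise values are illustrative.

```python
def predict(state, P, delta_s, q_var):
    """Propagate the along-pipe chainage by the odometric increment delta_s (meters)."""
    state = state + delta_s
    P = P + q_var                      # process noise grows with each increment
    return state, P

def manhole_update(state, P, chainage_from_map, r_var):
    """Absolute correction when a manhole cover (known chainage on the map) is detected."""
    innovation = chainage_from_map - state
    S = P + r_var
    K = P / S                          # scalar Kalman gain
    state = state + K * innovation
    P = (1.0 - K) * P
    return state, P

# Hypothetical usage: drift 12.4 m, then correct against a mapped manhole at 12.0 m.
state, P = 0.0, 0.01
state, P = predict(state, P, delta_s=12.4, q_var=0.05)
state, P = manhole_update(state, P, chainage_from_map=12.0, r_var=0.02)
print(state, P)
```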

5) Image recognition of drainage pipeline defects

The main purpose of the pipeline capsule robot is to inspect all kinds of defects in the
drainage pipeline, and there are many types of drainage pipeline defects with various
characteristics. For example, shedding appears as a planar texture, cracks appear as a
linear texture, tree roots can be either linear or planar, and scum and sediment both
appear as planar textures yet look very different. According to the relevant regulations
of the "Technical Specification for Inspection and Evaluation of Urban Sewer
(CJJ181-2012)", the defects of drainage pipes are roughly divided into two categories,
namely, functional defects and structural defects. Structural defects comprise ten
categories: leakage, breakage, deformation, corrosion, misalignment, undulation,
joint offset, interface material falling off, hidden connection of branch pipes, and
penetration of foreign objects. The functional defects include sedimentation, scaling,
obstacles, remnant walls and dams, tree roots, and scum. Typical defects in drainage
pipes are shown in Fig. 4.62.

Fig. 4.62 Some typical pipe defects

The traditional method is to visually interpret images and videos, including those
from CCTV field operations. Inspectors achieve accurate defect identification through
the apparent features of the pictures, experience, and repeated manual attempts.
Affected by factors such as lighting, video clarity, and operator fatigue, the accuracy
and efficiency of manual identification are low. Using deep learning methods to
identify pipeline defects can effectively improve work efficiency [17].
The technical process of pipeline defect identification based on a deep learning model
is shown in Fig. 4.63. Keyframes with defects are extracted from the internal pipeline
video collected by the pipeline capsule, classified and labeled according to the defect
classification specification, and assembled into a pipeline defect data set. A deep
neural network is then trained and evaluated on this data set to obtain the optimal
weight model.
Fig. 4.63 The technology road map of pipeline defect identification based on deep learning

Accuracy, precision, and recall are common evaluation metrics in information retrieval,
artificial intelligence, and search engine design. When evaluating the effectiveness of
different neural networks in identifying pipeline defects, the deep learning algorithms
must be assessed with these metrics, which are defined as follows:

\[
\text{accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \qquad (4.41)
\]

\[
\text{precision} = \frac{TP}{TP + FP} \qquad (4.42)
\]

\[
\text{recall} = \frac{TP}{TP + FN} \qquad (4.43)
\]
where TP (true positive) denotes a prediction that is positive and actually positive, FP
(false positive) denotes a prediction that is positive but actually negative, TN (true
negative) denotes a prediction that is negative and actually negative, and FN (false
negative) denotes a prediction that is negative but actually positive. When both the
identification accuracy and the recall meet the requirements, the selected neural
network has good applicability to the automatic identification of pipeline defects.
The results of the automatic identification of pipeline defects using the deep learning
model are shown in Fig. 4.64.
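For reference, the three metrics of Eqs. (4.41)-(4.43) can be computed directly from the confusion-matrix counts; the counts in the example call below are made up for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and recall as defined in Eqs. (4.41)-(4.43)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Example with illustrative counts from a defect/no-defect confusion matrix.
print(classification_metrics(tp=85, fp=10, tn=90, fn=15))
```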

Fig. 4.64 Automatic identification of pipeline defects

4.5.2 The Test and Application of Drainage Pipe Network Detection

The floating pipeline inspection capsule can complete the inspection of internal
defects of various drainage pipelines, and the continuous position information of the
pipeline capsule can be obtained by fusing multi-source positioning data. The
categorized pipeline defects can be automatically detected and assessed using
computer vision and deep learning algorithms. The results on the status and location
of pipeline network defects can be integrated into the urban pipeline network
management platform. Combined with geographic information system technology,
visual simulations of underground pipelines, covering the defect location, buried
depth, material, shape, direction, manhole structure, and surrounding areas, can be
presented, providing an accurate, intuitive, and efficient reference for the maintenance
of water supply and drainage networks.
1. Test for drainage pipeline inspection capsule

1) IMU-aided fisheye image matching

The fisheye image matching of the pipeline capsule is key to positioning. Comparison
experiments on fisheye image data sets of urban drainage pipelines show that the
IMU-aided fisheye registration method can effectively improve the matching accuracy.
In different scenarios, the accuracy of the method in this book is better than that of
common fisheye image gross-error elimination methods, which demonstrates the
reliability of this method for fisheye image feature matching. The overall precision,
recall, and F-score of matching in the experiments are shown in Table 4.11. Analysis
of the overall data shows that the precision of locality preserving matching (LPM) is
66.49%, but its recall is 96.82%, the highest among the five methods. Vector field
consensus (VFC) is sensitive to distortion and has the lowest precision, 65.57%. The
spherically optimized RANSAC (So-RANSAC) performs best, with a precision of
92.57%, a recall of 88.41%, and an F-score of 90.44%; both its precision and F-score
are the highest.

Table 4.11 Performance comparison

Method | Precision | Recall | F-score
RANSAC | 0.832693 | 0.757375 | 0.793250
LPM | 0.664963 | 0.968233 | 0.788440
4-Point RANSAC | 0.888411 | 0.824689 | 0.855364
VFC | 0.655793 | 0.957326 | 0.778377
So-RANSAC | 0.925717 | 0.884100 | 0.904430

The performance of the 4-point RANSAC is the next best, and the precision and recall
after adding IMU assistance are better than those of the RANSAC method, with an
F-score of 85.53%. The precision of the RANSAC method is 83.26%, which is better
than that of the LPM and VFC, but the recall is the lowest at 75.73%. In general,
the So-RANSAC method has small accuracy fluctuations in real and experimental
scenarios and is robust to complex environments [18–21].

2) Comparison experiment of drainage pipeline capsule and CCTV inspection robot

Following the provisions of the inspection items and the relevant industry technical
standards, underground drainage pipelines in operation are inspected. Based on the
most widely used pipeline inspection method, CCTV inspection, a typical underground
drainage pipeline is selected, and its pipeline defects are marked. The CCTV inspection
robot then performs crawling detection in the pipeline, and the drainage pipeline
inspection capsule performs floating inspection. The defect inspection results of the
two methods are compared and analyzed.

(1) CCTV inspection robot

The distance from the starting or ending well, as measured by the pipeline CCTV
inspection robot, is used as the benchmark. First, the cable of the crawling robot is
marked at the well entrance and at the defect location, and the cable length between
the two marks is then measured with a standard measuring tape. This measurement is
used as the reference for the location of the defect. The X5-series pipeline CCTV
inspection robot is used in the experiments.
(2) The result of the drainage pipeline inspection capsule

The capsule floating inspection site is shown in Fig. 4.65. The continuous position
information of the drainage pipe capsule (Fig. 4.66) and the mosaicked image of the
pipeline interior (Fig. 4.67) are obtained by post-processing the data collected by the
capsule: image matching and stitching, and pose estimation from the 9-axis MEMS
sensors and the visual system. This information is then combined with the location
information provided by the pipeline map and the optical flow module to redundantly
supplement the visual-inertial fusion results and constrain the global location. From
the timestamp corresponding to a defect, the distance of the defect from the starting
position can be calculated.

Fig. 4.65 Inspection site

Fig. 4.66 Some track of pipeline capsule in testing pipe sector Y001-Y002-Y003

Fig. 4.67 Fisheye image stitching in the testing pipeline sector

The comparison with the CCTV robot is shown in Table 4.12; the pipeline capsule
inspection device can meet the evaluation requirements of pipeline network facilities.
Because of its low cost, light weight, simple operation, and high efficiency, the pipeline
capsule can be used as a means of general rapid inspection of large-scale urban
pipeline networks.
2. Application

In the actual application of drainage pipeline inspection, the workflow is as follows:
➀ on-site survey, ➁ preparation of the inspection plan, ➂ on-site inspection and image
data collection, ➃ multi-source data analysis, and ➄ submission of the pipeline network
inspection report. The main tasks of the on-site survey include inspecting the
surrounding geography, landform, traffic, and distribution of the pipeline; visually
checking the water level, mud depth, and water flow in the wells; and checking the
recorded pipe position, diameter, and material.
Table 4.12 The result comparison of pipeline defect identification

The defect image identified by the CCTV inspection robot | The defect image identified by the drainage inspection capsule

The planning for the inspection is as follows: first, clarify the testing purpose, scope,
and deadline, and then make a testing plan based on the existing materials. For
different pipeline water depths, the capsule can use different detection modes. When
the water depth is between 15 and 65%, the capsule can be put directly into the target
pipe section for the drifting operation. When the water depth is less than 15%, a
floating plate can be bound under the capsule, which is then moved through the section
by manual dragging to complete the pipeline defect inspection; alternatively, the water
depth in the pipeline can be increased by manually releasing water, and the capsules
are then put in for the drifting operation. When the water depth is greater than 65%,
the target pipe section must be properly blocked and pumped to reduce the water
depth, and the capsules are then put in for the drifting operation. Before the test starts,
dredging, cleaning, ventilation, and toxic and harmful gas detection must be carried
out in the pipeline.
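The water-depth rules above can be summarized as a simple decision, sketched below; the 15% and 65% thresholds come from the text, while the function name and returned labels are only illustrative.

```python
def choose_deployment_mode(water_depth_ratio):
    """Pick the capsule deployment mode from the relative water depth (0-1).

    Thresholds follow the 15%/65% rules described above; the function name and
    return labels are illustrative, not part of the system described in the book.
    """
    if water_depth_ratio < 0.15:
        # Too shallow to drift: drag the capsule on a floating plate,
        # or raise the level by releasing water upstream.
        return "drag on floating plate (or raise water level, then drift)"
    if water_depth_ratio <= 0.65:
        return "release capsule directly for drifting"
    # Too deep: block and pump the section down before drifting.
    return "block and pump the section, then drift"

print(choose_deployment_mode(0.40))
```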

1) Residential area pipeline network inspection case

To meet the demands of the rainwater and sewage mixed-flow treatment project in
Xinzhan district, Hefei city, the drainage pipeline network of about 350 km in
90 communities in Xinzhan district was sorted and screened. The quality problems of
the pipeline network and the mixed flow of rainwater and sewage were to be inspected,
designed, and rectified according to the requirements. The work mainly includes
upgrading the rainwater and sewage pipeline networks, cross connections, and balcony
drainage in new residential communities, solving the issue of the mixed connection
of rainwater and sewage pipeline networks. The inspection team demonstrated the
capsule application project in Hefei city. The municipal main sewage and drainage
pipeline network was inspected in more than 30 old residential communities in Hefei
city, and the inspected pipe length reaches 120 km. The inspection site is shown in
Fig. 4.68. The investigation found that the pipelines in many communities were
broken and deformed, and the repair index was high. Some pipe sections have not
implemented rainwater and sewage diversion, and a mixed flow of rainwater and
sewage exists. It was also found that some drainage pipelines are mixed with other
types of pipelines. Some of the inspection results are reported in Tables 4.13 and 4.14.

Fig. 4.68 Drainage pipeline inspection site scene

Table 4.13 Pipeline capsule inspection report

Video file: 20,219,506,190,908 | Starting well no.: Y003 | Ending well no.: Y004
Inspector: YPP | Start point buried depth: 2 m | Ending point buried depth: 2 m
Type of pipe section: Rainwater pipeline | Material of pipe section: HDPE double-wall corrugated pipe | Diameter of pipe section: 300 mm
Inspection direction: Downstream | Length of pipe section: 35 m | Inspection length: 35 m
Inspection location: CNSG Anhui Hongsifang Co., Ltd | Inspection date: 2019-06-19

No. | Inspection time | Time taken to inspect | Description of inside pipeline | Photos
1 | 09:08:51 | 1 min 42 s | Break, deformation | 2 (Photo 1, Photo 2)

Table 4.14 Drainage pipeline cross-connection report

Cross connection point no.: NWNY33 | Sketch of cross connection point: (image)
Cross connection location: CNSG Anhui Hongsifang Co., Ltd
Description of cross connection: The sewage is connected to Y1136; the diameter of the connected pipe is DN100; mild cross connection
Reason of cross connection: Sewage pipe connected to rainwater pipe

2) The case of drainage pipeline network inspection

After more than 30 years of construction and development in the Pingshan district,
Shenzhen city, a very complex drainage system has been established. However, an
increasing number of pipeline network problems are being exposed in the operation
and management of the municipal water conservancy department owing to factors
such as loads far exceeding the design capacity, aging pipelines and facilities, the
negative impact of new major projects such as subways, the construction quality of
concealed works, and the lack of underground detection methods. These defects not
only seriously affect the daily discharge of urban rainwater and sewage but may also
cause secondary problems such as inland flooding and environmental pollution, and
even urban disasters such as subsidence under extreme conditions. In addition, the
lack of drainage pipeline network information causes problems in connecting newly
laid pipelines to the old pipeline network, which seriously affects the future
development planning of the Pingshan district. It is therefore necessary to conduct a
comprehensive inspection of the entire underground drainage network system in the
Pingshan district.
The pipeline inspection capsule was used to inspect the relevant drainage network
in the Pingshan district, and the connected region surrounded by Tiyu 2nd Road in
the east, Xinhe Road in the south, Jinlong Avenue in the west, and Pingshan Avenue
in the north was selected. The rainwater and sewage pipelines were inspected for all
selected roads, and the total length of the inspection is approximately 10 km, of which
the rainwater pipes are all reinforced concrete pipes with two diameters, DN800/
DN1000. The sewage pipes are reinforced concrete and double-wall corrugated pipes
with two diameters, DN400/DN600. In the face of different water depths in the
pipeline, the drifting operation method is applied when the water flow in the pipeline
is satisfactory, and the dragging operation method is applied when there is no water
in the pipeline or the water flow is not satisfactory. After inspection, a total of 138
defects were found, including 45 defects in rainwater pipes, 93 in sewage pipes,
24 functional defects, and 114 structural defects. Some of the defects are shown in
Tables 4.15, 4.16, and 4.17.

Table 4.15 Pipe erosion report

Defect no.: J2
Road section: Zhenhuan Road
Pipeline type: Rainwater pipe
Defect location: K0+280
Pipe material: Reinforced concrete
Defect name: Erosion
Defect level: Level 1
Defect type: Structural defect
Defect location description: three clear erosion places on the drainage pipe wall
Table 4.16 Pipe breakage report

Defect no.: J5
Road section: Tiyu 2nd Road
Pipeline type: Sewage pipe
Defect location: K0+102
Pipe material: Double-wall corrugated pipe
Defect name: Broken, deformation
Defect level: Level 4
Defect type: Structural defect
Defect location description: drainage pipe wall severely damaged, with deformation and erosion

Table 4.17 Pipe crack report

Defect no.: J9
Road section: Tongfu Road
Pipeline type: Sewage pipe
Defect location: K0+421
Pipe material: Double-wall corrugated pipe
Defect name: Crack
Defect level: Level 2
Defect type: Structural defect
Defect location description: a clear crack on the drainage pipe that extends over a long distance

3) The case of reservoir water supply tunnel inspection

In addition to being used in the underground pipe network environment, the drainage
pipe capsule can also be used as a general pipe network inspection device for the
detection of reservoir water supply tunnels. Take the Huangcun Reservoir water
supply tunnel in Zhejiang Province as an example: it was built in 2002 and has been
in operation for many years without a comprehensive safety inspection and evaluation.
Since the water supply tunnel of the Huangcun Reservoir is one of the main water
supply sources of Lishui city, it is impossible to shut off the water for investigation
and inspection, and it is difficult to access the inside of the tunnel with manual methods
or other mechanical equipment, such as drones, driverless cars, or autonomous boats
equipped with sensing devices. Under such circumstances, the capsule device can be
used as one of the detection methods to conduct the whole process of internal video
detection in the pressure-free tunnel. The main inspection targets include collapsed
bodies, cracks, dissolution, erosion, leakage, and other defects that may exist along
the tunnel. The tunnel entrance environment is shown in Fig. 4.69a, and the main
detection video images are shown in Fig. 4.69b.

Fig. 4.69 Inspection of the water supply tunnel

4.6 Internal Deformation Measurement of Earth-Rockfill Dam

There are two components to an earth-rockfill dam: the impervious system and
the dam body. This type of dam is typically regarded as secure, cost-effective,
and adaptable. Hence, it is the dam type of choice for China's water conservancy
and hydropower projects. As a safety indicator, internal deformation is an essential
parameter of the dam. For instance, if the deflection of the concrete face is
excessive, the panel may crack or a cavity accident may occur. Therefore, moni-
toring the internal deformation continuously and precisely throughout the construc-
tion and operation of these types of dams is crucial. According to earth-rockfill dam
monitoring safety specifications, internal deformation consists of vertical settlement,
horizontal displacement, and concrete face deflection. Hydraulic overflow settle-
ment gauges and extension line displacement gauges are, respectively, the most
commonly used monitoring devices for vertical settlement and horizontal displace-
ment. The concrete face deflection is the compressive bending of an earth-rockfill
dam’s upstream panel caused by water storage and the deformation of the rockfill
body. In practice, it is typically measured using either fixed or mobile inclinometers.
Existing monitoring methods lack the ability to acquire measurements reliably for
the following three reasons:
(1) Point-type measurement sensors can only monitor a single parameter at a time,
    necessitating the installation of numerous monitoring sensors, which results in
    sparse monitoring points and discrete monitoring data.
(2) The structures of hydraulic overflow settlement gauges and extension line
displacement gauges are intricate, and the associated monitoring systems have
low impact resistance and high failure rates, which can lead to a lack of
monitoring data.
(3) When the height of the dam exceeds 200 m, the performance limits of the liquid
    supply tube and steel wire are nearly reached. As a result, the measurement
    accuracy and range of the sensors are severely compromised. Now that China has
    launched dam engineering programs to build 300-m-tall earth-rockfill dams, it is
    urgent to develop a highly precise and reliable, continuously distributed internal
    deformation monitoring technique for earth-rockfill dams.
In an effort to address the shortcomings of existing monitoring methods for
measuring internal deformation, researchers have proposed a variety of methods
employing inertial sensors. Some researchers have proposed a pipeline-based
distributed deformation measurement method. In summary, the primary measure-
ment principles are comparable in that the pipeline’s curve is determined by
measuring the increment in coordinates between two adjacent measuring points. In
the literature [22], a measurement method employing a four-wheeled robot equipped
with a laser gyroscope and magnetometer is proposed. The robot is pulled with
uniform speed by an electric winch. The distance between two adjacent points is
calculated by multiplying the moving speed by the measurement time. On the pipeline

robot, a ring-laser gyroscope and magnetometer are installed to estimate the pitch
angle and azimuth angle. A four-wheeled robot equipped with an inclinometer and
a digital camera is proposed in other research [23] to measure a pipeline comprised
of multiple steel pipe sections. Due to the rigidity of an individual pipe, this method
assumes that no deformation occurs. Consequently, the inclinometer measures the
pipe’s inclination, which is then multiplied by the length of the single pipe to esti-
mate the vertical curve. The camera is utilized to measure the joint angles between
adjacent pipes, which are then multiplied by the length of a single pipe to determine
the horizontal curve. In these previous works, the accuracy of the final measure-
ment is not high because the four-wheeled robot has trouble ensuring that the robot
measurement center always coincides with the pipeline axis, the steel pipeline cannot
fully reflect the internal deformation of the dam, the sensor’s measurement accuracy
is insufficient, and the raw measurement error is not handled carefully. According
to published results, the repeatability or accuracy of measurements in different tests
ranges from millimeters to centimeters, meaning that this method cannot meet the
millimeter-level requirements at distances greater than 100 m.
We propose a new method for monitoring the internal deformation of earth-rockfill
dams based on the inertial measurement of flexible pipelines [24]. The fundamental
concept is to install pressure-resistant flexible pipelines within the rockfill dam and
to use an integrated IMU and multiple odometers on a pipeline robot to measure the
3D curves of the deformation-monitoring pipelines. In this manner, the parameters
of deformation can be determined by comparing the measured curves at different
times. This pipeline-based monitoring method only requires pipeline installations
to be monitored during dam construction. This system is more impact-resistant and
reliable than conventional settlement and displacement gauges. In addition, the results
are transformed from discrete point-like data to continuous 3D curve data, enabling
the simultaneous measurement of multiple deformation parameters [25].

4.6.1 Internal Deformation Monitoring for Earth-Rockfill Dam via High-Precision Flexible Pipeline Measurements

A new method based on precise inertial measurements of embedded flexible pipelines
is proposed to achieve continuous and dependable internal deformation monitoring
for high earth-rockfill dams. The overall structure is depicted in Fig. 4.70. Pressure-
resistant and flexible polyethylene pipelines are installed on-site and monitored
throughout the dam's construction, constituting an integral part of the dam. The
pipelines can, therefore, deform synchronously with the dam, allowing for pipeline
measurements. In the latter stages of monitoring, precise measurements of the
monitored pipelines are taken using a pipeline robot with an integrated IMU and
multiple odometers, and the pipelines' curves at a particular monitoring time are
obtained. On the basis of a comparison of the multiphase pipeline curves, the vertical
deformation, horizontal deformation, and deflection deformation of earth-rockfill
dams along the normal direction of the pipelines can be further estimated. This section
describes the deformation measurement system used within earth-rockfill dams, the
high-precision measurement method for pipeline curves, and the deformation
parameter calculation method.
1. Pipeline deformation measurement system

1) Pipeline robot
Fig. 4.70 Framework of the earth-rockfill dam internal deformation measurement process based on high-precision flexible pipeline measurements using a pipeline robot

Fig. 4.71 Pipeline measurement robot system

As shown in Fig. 4.71, a pipeline robot system is developed to meet the measurement
requirements of millimeter-scale pipeline curves. The mechanical structure consists
primarily of three modules: a synchronous spring-suspended walking wheel module,
an independent spring-suspended odometer wheel module, and a measurement
module. The three modules are connected precisely by concave and convex flanges
as well as screws. One end of the measurement module is connected to both the
walking wheels and odometer wheels, while the other end is only connected to the
walking wheels.
A synchronous spring-suspended walking wheel module with three evenly
distributed wheels is installed at both ends of the measuring robot. The wheel bracket
is connected to a slide bar by linear bearings, with springs at both ends. The
three wheels are simultaneously pressed against the pipe wall to suspend the frame
in the pipeline and ensure that the frame’s axis is parallel to the pipeline’s axis.
Adjusting the spring’s elasticity on the slide bar ensures that the walking wheels
are positioned close to the pipeline wall. The wheels are comprised of abrasion-
resistant resin materials with the appropriate level of hardness to prevent pipeline
deterioration.
Independent spring-suspended odometer wheels, each with a measuring wheel
equipped with a high-precision rotary encoder, are utilized to precisely measure the
body frame velocity. To ensure that the circumferences of the odometer wheels are
unaffected by the vehicle’s weight, an independent spring-suspended wheel frame
is utilized, as opposed to the low-precision common pipeline measuring robots
currently in use. Consequently, regardless of how the vehicle rotates, the stress
on the wheel is approximately the same, ensuring that its circumference does not
change, thereby ensuring the accuracy of the distance measurement. In addition, the
spring-suspended, independent odometer wheels can measure the inner diameter of
the pipeline through the cantilever, modify the 3D trajectory, and compensate for the
self-generated deformation.
The measurement module comprises multiple sensors (an IMU and multiple
encoders, as shown in Tables 4.18 and 4.19), a customized integrated control board,
batteries, and other auxiliary accessories. The IMU, which consists of three laser
gyroscopes and three quartz accelerometers, can acquire accurate acceleration and
angular velocity measurements, and the multiple encoders can acquire accurate rota-
tions of the odometer wheels and cantilever. The integrated control board is the central
component of the integrated acquisition control of multiple sensors and ensures that
data from multiple sensors can be collected synchronously. The geometrical center
of the IMU coincides with the measurement module shell’s center and the pipeline’s
central axis. During the measurement process, the pipeline robot is permitted to move
within the pipeline using a winch or manually drawn rope, and the central axis curves
of the pipeline are measured [25].

Table 4.18 Specifications of the IMU

Specification | Gyroscope | Accelerometer
Range | ± 220°/s | ± 5 g
Bias | 0.01°/h (1σ) | ≤ 15 μg (1σ)
Noise | ≤ 0.002°/√h | 30 μg/√Hz
Scale factor | 30 × 10⁻⁶ | 30 × 10⁻⁶
Sampling rate | 500 Hz | 500 Hz

Table 4.19 Specifications of the rotary encoders

Specification | Rotary encoder
Measurement | Single-turn absolute position
Angular resolution | 17 bits, 131,072
Static error | < 0.025°
Maximum operational speed | 4000 r/min

2) Deformation-monitoring pipeline
The deformation monitoring pipeline must accurately reflect the irregular deforma-
tion occurring within the dam and deform in tandem with the dam. In addition, since
the pipeline is the measuring robot’s track, it must be able to withstand compres-
sion to ensure that the cross-section is round or approximately round. Therefore, the
pipeline for deformation monitoring is axially flexible and radially rigid. Taking into
account these two characteristics, a pressure-resistant flexible polyethylene pipe is
chosen. When the pipeline is buried, the pipeline segments are welded together to
form a qualified and smooth deformation monitoring pipeline. The bending exper-
iment (shown in Fig. 4.77) shows that the pipeline has good flexible deformation
characteristics in the axial direction and retains a good circular shape in the radial
direction.
To transform the pipeline’s 3D curves to a unified geographical coordinate system,
the absolute position of the deformed pipeline must be measured. At both ends of the
pipeline, customized clamps (Fig. 4.72) are designed and installed. They consist of
a precise cylindrical barrel, a rotatable and removable cover, a prism and horizontal
bubble level, a supporting frame, and anchoring accessories. The barrel’s length
corresponds to that of the pipeline robot. The primary functions of the pipeline
clamp are to obtain the absolute control point from the prism and to ensure that the
pipeline robot’s starting and ending points are always in the same position.
2. High-precision measurement of 3D pipeline curves
The high-precision deformation measurement method is founded on the precise
measurement of the pipeline’s curve. Combining data from multiple odometers,
control points, and inertial measurements, a precise method for measuring pipeline
curves is described. Multiple odometers provide precise measurements of body frame
velocity, which can correct the velocity error generated by a pure INS and improve

Fig. 4.72 Photos of the designed model and pipeline clamps

the relative measurement accuracy of the pipeline curve. By adjusting the entire
trajectory through the control points at both ends of the pipeline, the absolute accu-
racy can be improved. Forward Kalman filtering and backward RTS smoothing, a
classic Bayesian optimal fusion framework for multi-source time-series data, are
used to fuse the IMU, odometer, and control point data to obtain the most accurate
3D pipeline curve. To achieve data fusion, the state model, dynamic model, and
measurement model are designed using the Kalman filtering framework.
The state vector is a vector in a Kalman filter that describes the current state of the
system. For multi-source information-aided inertial navigation, it is frequently neces-
sary to estimate the navigational state (including attitude, position, and velocity), the
inertial sensor errors, and the aiding sensor errors, which here are the odometer scale coefficient errors. As shown in the following equation, an 18-dimensional state vector is
adopted to represent the instantaneous state of the pipeline measurement robot.
\mathbf{x} = \left[\boldsymbol{\varphi} \;\; \delta\mathbf{v}^n \;\; \delta\mathbf{r}^n \;\; \mathbf{b}_g \;\; \mathbf{b}_a \;\; \mathbf{k}_d\right]^{\mathrm{T}} \qquad (4.44)

where φ denotes the attitude error vector, δvn represents the velocity error vector,
δr n represents the position error vector, bg represents the residual gyroscope bias,
ba represents the residual accelerometer bias, and kd represents the scale coefficient
error of the three odometers.
The dynamic model is utilized to describe the state change of the system. Models of
the navigation state, biases of inertial sensors, and the time-varying rule for odometer
scale factors are constructed. The classic φ angle error model is used for modeling
navigational errors.
Since the pipeline robot utilizes a high-precision IMU with a sensor nonlinearity
of less than 30 × 10−6 and moves slowly (< 2 m/s) inside the pipeline, only the
bias error is modeled. The dynamics of the sensor bias error are described using a first-order Gauss–Markov model.
Due to the uniform force exerted on the odometer wheels, it can be assumed that
the corresponding scale factor remains unchanged. Therefore, the dynamics of the
odometer scale coefficient error kd are modeled by the random constant value model.
The velocity of inertial navigation is determined by integrating the acceleration.
The relative accuracy of the curve measurement cannot be guaranteed due to the
accelerometer’s bias and random error, which causes the velocity error to increase
with time until it diverges. Some redundant odometers are used to obtain accurate
and reliable velocities to assist the INS, which can effectively improve the relative
accuracy of curve measurements. In addition, as the pipeline robot moves through the
pipeline, the wheel has only forward speed in the vehicle frame, allowing nonholo-
nomic constraints to be applied. When a pipeline seam is encountered, the pipeline
measurement vehicle may shake, resulting in an error perpendicular to the forward
direction.
The integrated inertial and odometer navigation system continues to use the
dead reckoning principle. Despite the high local relative measurement accuracy,
the measurement error still accumulates, particularly for long pipelines, and the
desired accuracy may not be achieved. To increase the accuracy of the curve measure-
ments, control points are placed on the clamping devices at both ends of the pipelines.
When the pipeline robot comes into contact with these devices, the control point can
be transmitted to the pipeline robot system via a fixed mechanical structure, and the
pipeline robot’s absolute position error can then be corrected.
For measurement applications, accuracy is the most crucial factor. The result of
Kalman filtering is the optimal estimation of the system’s current state based on all
measurements collected to date. To estimate the optimal curve with all measurements,
RTS smoothing is applied to the Kalman filtering results, and the optimal post-
processing solution can be obtained with significantly more accuracy than the filtering
solution. At each sampling time, the smooth solution includes the navigation states
(i.e., position, velocity, and attitude) of the pipeline robot. These states comprise the
3D pipeline curve [25].
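To make the fusion pattern concrete, the following minimal Python sketch shows a generic forward Kalman filter followed by Rauch–Tung–Striebel (RTS) smoothing over a sequence of models and measurements. It illustrates only the filtering-plus-smoothing structure; the matrix names (Fs, Qs, Hs, Rs) and the handling of epochs without aiding measurements are assumptions, not the actual 18-dimensional error-state models of the pipeline robot.

```python
import numpy as np

def kalman_rts(x0, P0, Fs, Qs, Hs, Rs, zs):
    """Forward Kalman filter followed by RTS smoothing.

    Fs, Qs: per-epoch state-transition and process-noise matrices.
    Hs, Rs, zs: per-epoch measurement matrix, noise covariance, and
    observation (zs[k] is None when no aiding measurement is available).
    """
    n = len(Fs)
    xf, Pf, xp, Pp = [], [], [], []          # filtered / predicted quantities
    x, P = x0, P0
    for k in range(n):
        # time update (prediction)
        x = Fs[k] @ x
        P = Fs[k] @ P @ Fs[k].T + Qs[k]
        xp.append(x); Pp.append(P)
        # measurement update (e.g., odometer velocity or a control point)
        if zs[k] is not None:
            H, R = Hs[k], Rs[k]
            K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
            x = x + K @ (zs[k] - H @ x)
            P = (np.eye(x.size) - K @ H) @ P
        xf.append(x); Pf.append(P)
    # backward Rauch-Tung-Striebel pass over the stored results
    xs, Ps = [None] * n, [None] * n
    xs[-1], Ps[-1] = xf[-1], Pf[-1]
    for k in range(n - 2, -1, -1):
        C = Pf[k] @ Fs[k + 1].T @ np.linalg.inv(Pp[k + 1])
        xs[k] = xf[k] + C @ (xs[k + 1] - xp[k + 1])
        Ps[k] = Pf[k] + C @ (Ps[k + 1] - Pp[k + 1]) @ C.T
    return xs, Ps
```

In the pipeline application, the odometer velocities, the nonholonomic constraints, and the control points at the two clamps would enter as the observations zs at the corresponding epochs.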
3. Earth-rockfill dam internal deformation estimation
After obtaining the 3D curves of the pipelines at various time intervals, the newly
measured curves are compared to the reference curve to estimate deformation. We
begin by registering curves measured at different times. According to the specifica-
tions of earth-rockfill dam internal deformation, a calculation method for the internal
deformation index based on 3D curves is then developed.
1) Registration of multiple curves
Due to the complexity of 3D registration, we developed an algorithm that indexes
each curve point by its distance from the starting point. This reduces the registration problem for 3D curves to 1D signal registration. Taking into account the fact that
a pipeline can be modeled as a 3D curve, any point on the curve can be located
based on its distance from the curve’s origin, and the point’s position corresponds
to the linear coordinate formed by straightening the curve. The curve can, therefore,
be parameterized within the linear reference frame. The 3D curve must then be
resampled to obtain equidistant measurement points in a linear coordinate frame.
In conclusion, we use the signal of the pipe junction to register curves measured at
different times for subsequent deformation calculations.
As the sampling frequency of the pipeline robot is 500 Hz, and the robot’s speed is
approximately 1 m/s, the curve between adjacent sampling points can be considered
Fig. 4.73 Linear reference curve generated through equidistant sampling according to the raw
trajectory

a segment of a straight line. When the pipeline robot’s movement direction remains
unchanged, the distance along the curve from the current point to the starting point can
be calculated by adding the distances between the adjacent points located before the
current point. However, the pipeline measurement robot may exhibit reciprocating
motion along the pipeline during actual operation due to slopes, operation, and other
factors. This direct distance integral method may be incorrect for data with changing
motion direction during the acquisition process. Therefore, the reciprocating motion
should be eliminated prior to further processing of the data.
Typically, a reciprocating motion lasts no more than a few meters (usually less than
2 m). An algorithm is designed to convert a 3D trajectory into the distance traveled
in the pipeline, thereby removing the effect of reciprocating motion (Fig. 4.73). To
obtain sparse control points P 1 from the raw trajectory points P 0 , we first sample
with a large window so that P 1 are ordered in the moving direction. The new distance
d 1 of P 0 is then computed using interpolation from P 1 . We can detect reciprocating
motion based on the new distance d 1 and finally obtain the ordered trajectory points
P2. The specific procedure (sketched in code after the steps below) is as follows:

(1) According to the initial trajectory point set P 0 , the 3D distances of all adjacent
trajectory points are calculated and integrated to determine the initial distances
d 0 of P 0 . Due to reciprocating motion, a larger sampling distance L (L < 2 m)
is designed, and trajectory points are sampled at each interval of L to produce
a set of sparse sequential trajectory points P1. The initial distances of P1 are used as the control distances.
(2) The two nearest points (P1,a , P1,b ) in the sparse trajectory point set P 1 corre-
sponding to each point in the trajectory point set P 0 are searched, and the line
segment formed by (P1,a , P1,b ) is used for linear interpolation to obtain the
distances d 1 of P 0 .
(3) The distances d 1 of each point are ordered, and the reverse points are eliminated
to obtain an ordered list of trajectory points P 2 . According to the calculation
method described in the first step, the distance d 2 between each point on the
trajectory is then determined.
Fig. 4.74 Registration of curves measured at different times

(4) Based on d2, spline curve fitting is performed for the x, y, and z coordinates to obtain a parametric spline curve against the point distance, and the curve is then resampled at equal intervals between the minimum and maximum distance values.
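As an illustration only, the sketch below follows steps (1)–(4) with hypothetical array names: it integrates chord distances, samples sparse control points with a window L, re-interpolates each raw point's distance, drops reversed points, and resamples the curve at equal intervals. The search for the two nearest sparse points in step (2) is approximated here by a simple index-based interpolation, and SciPy's cubic interpolation stands in for the spline fitting.

```python
import numpy as np
from scipy.interpolate import interp1d

def order_and_resample(P0, L=2.0, step=0.1):
    """Convert a raw 3D trajectory P0 (N, 3) into an ordered, equidistantly
    resampled curve in the linear (along-pipe) reference frame."""
    # step (1): integrate 3D chord distances to obtain the initial distances d0
    d0 = np.concatenate(([0.0], np.cumsum(np.linalg.norm(np.diff(P0, axis=0), axis=1))))
    # sample sparse control points roughly every L metres (ordered in the moving direction)
    idx = [0]
    for i in range(1, len(P0)):
        if d0[i] - d0[idx[-1]] >= L:
            idx.append(i)
    d1_ctrl = d0[idx]
    # step (2): re-interpolate each raw point's distance from the control distances
    # (nearest-segment projection is approximated by index-based interpolation)
    d1 = np.interp(np.arange(len(P0)), idx, d1_ctrl)
    # step (3): eliminate reverse points so the distance increases monotonically
    keep = np.concatenate(([True], np.diff(d1) > 0))
    P2, d2 = P0[keep], d1[keep]
    # step (4): fit x(d), y(d), z(d) and resample at equal intervals
    samples = np.arange(d2.min(), d2.max(), step)
    curve = np.stack([interp1d(d2, P2[:, j], kind='cubic')(samples) for j in range(3)], axis=1)
    return samples, curve
```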
The distances of the same curve point measured at different times are inconsistent
due to odometry errors. Before comparing the multi-period curves, it is necessary to
adjust the distances to enable the unification of the distances at the same point in the
pipeline. Observing the inertial measurement data, it is evident that the pipeline robot
produces noticeable vibrations when passing through the pipeline seam (Fig. 4.74),
which are reflected in the acceleration and angular velocity measurements. The
pipeline seam can be detected and precisely located based on these vibrations, which
can also be used to associate curves measured at different times [25].
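One simple way to locate the seam-induced vibrations is to search for peaks in the high-frequency part of the acceleration magnitude. The sketch below, which applies scipy.signal.find_peaks to hypothetical acceleration and distance arrays, only illustrates this idea; it is not the specific detector used in [25].

```python
import numpy as np
from scipy.signal import find_peaks

def detect_seams(acc, dist, fs=500.0, min_gap_m=1.0, k=4.0):
    """Locate pipe-seam candidates from seam-induced vibrations.

    acc: (N, 3) accelerometer samples; dist: (N,) along-pipe distances;
    fs: sampling rate in Hz. Returns the distances of detected seams."""
    mag = np.linalg.norm(acc, axis=1)
    # remove the slow (gravity/motion) trend with a 1-s moving average
    trend = np.convolve(mag, np.ones(int(fs)) / fs, mode='same')
    hp = mag - trend
    thr = k * np.std(hp)                         # vibration amplitude threshold
    # require a minimum spacing between seams, expressed in samples
    min_samples = max(int(min_gap_m / max(np.median(np.diff(dist)), 1e-6)), 1)
    peaks, _ = find_peaks(np.abs(hp), height=thr, distance=min_samples)
    return dist[peaks]
```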
2) Deformation index calculation
The horizontal deformation, vertical deformation, and face deflection deformation
indicators for dam deformation monitoring are defined based on the deformation measurement principle and the applicable dam monitoring specifications. Their respective definitions are as follows:
(1) The vertical deformation describes the variation in the vertical settlement of the
pipeline, as shown in Fig. 4.75a.
(2) Panel deflection refers to deformation perpendicular to the direction of the panel,
as shown in Fig. 4.75b.
(3) The horizontal displacement is the horizontal deformation perpendicular to the
pipeline, also known as the normal direction of the pipeline. It should be noted
that the horizontal displacement direction differs from the specifications for dam
monitoring. All the positions of deformation indicators take the corresponding
distances from the starting point of the pipeline as a reference.
Fig. 4.75 Vertical deformation and deflection of an earth-rockfill dam

3) Vertical deformation and deflection calculation


In the linear reference system, the elevation at dk in the deformation monitoring
pipeline measured at the ith time relative to the starting point can be expressed as:


h_i(d_k) = \sum_{k=1}^{n} \sin\theta_i(d_k)\,\Delta d \qquad (4.45)

where dk is the position of the kth distance point, θi (dk ) is the pitch angle on the
kth distance point (Fig. 4.75a), and ∆d is the sample distance of curve points. The
vertical pipeline deformation of the jth time relative to the ith time at point k can be
expressed as


\Delta h(d_k) = h_j(d_k) - h_i(d_k) = \sum_{k=1}^{n} \sin\theta_j(d_k)\,\Delta d - \sum_{k=1}^{n} \sin\theta_i(d_k)\,\Delta d \qquad (4.46)
The calculation principle of panel deflection is similar to that for vertical deforma-
tion, with the exception that the designed level angle (Fig. 4.75b) in the concrete face
must be subtracted from the pitch angle. Note that the vertical deformation obtained
here is determined relative to the starting point of the pipeline; if absolute vertical
deformation is needed, the elevation of the prism at the pipeline ending point must
also be considered.
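Assuming the pitch angles of two epochs have already been resampled onto the same distance points (in radians), Eqs. (4.45) and (4.46) translate almost directly into code:

```python
import numpy as np

def relative_elevation(pitch, dd):
    """Eq. (4.45): elevation relative to the pipeline starting point, given the
    pitch angles (rad) at the resampled distance points and the spacing dd (m)."""
    return np.cumsum(np.sin(pitch)) * dd

def vertical_deformation(pitch_i, pitch_j, dd):
    """Eq. (4.46): vertical deformation of epoch j relative to epoch i,
    evaluated at every resampled distance point."""
    return relative_elevation(pitch_j, dd) - relative_elevation(pitch_i, dd)
```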
4) Horizontal deformation calculation
Since the measured curve is approximated by many straight-line segments (Fig. 4.76),
the horizontal deviation of point k + 1 relative to point k in the normal direction
measured at the ith time can be expressed as:

\Delta l_{k+1} = \Delta l_k + \sin\Delta\varphi_{k+1}\,\Delta d \qquad (4.47)

where ∆ϕk+1 is the azimuth change of the same line segment measured at the jth
time relative to the ith time, so the deviation in the normal direction at dk can be
represented as:


\Delta l(d_k) = \sum_{k=m}^{n} \sin\left(\varphi_j(d_k) - \varphi_i(d_k)\right)\Delta d \qquad (4.48)

Fig. 4.76 Calculation of horizontal lateral deformation

Fig. 4.77 Experimental test pipeline


where ϕi (dk ) is the azimuth of the segment measured along the length dk at the ith
time, and dm is the starting point for the horizontal displacement calculation. When
the pipeline azimuth changes slightly, as the change in the pipeline length can be
disregarded, ∆l(dk ) is equivalent to the displacement ∆p(dk ) of each sampling point,
abbreviated as ∆pk (so that ∆l(dk) ≈ |∆p(dk)|). Since the direction of ∆p(dk) is roughly aligned with the normal direction of the pipeline, the horizontal deformation perpendicular to the dam axis can be calculated (as shown in Fig. 4.76). For
a linear monitoring pipeline, the starting point of the pipeline can be chosen as the
calculation starting point. However, the starting point of the arc is typically chosen
as the calculation starting point for a pipeline that is curved or arcuate due to the
large direction decomposition error.

4.6.2 Experiments and Results

1. Field testing

An experiment was conducted to verify the effectiveness and accuracy of the proposed
measurement system. A 140-m-long L-shaped polyethylene test pipeline was installed (as shown in Fig. 4.77). To simulate the actual pipeline system inside
the dam, the test pipeline was elevated using supports, and high-precision pipeline
clamping devices were attached to both ends of the entire pipeline. Therefore,
the pipeline measurement robot’s repeatability, absolute accuracy, and deformation
accuracy could be evaluated.

1) Measurement repeatability

Repeatability and accuracy are essential characteristics of a measurement system. It is possible to estimate the random errors of a system by repeatedly measuring the
same target and analyzing the differences between the test results. To calculate the
repeatability, the same pipeline was measured multiple times within a short period of
time and under stable weather conditions. Multiple curves measured in a single run
were grouped together. Four measurements were taken for each group. A total of three
test groups were conducted, and twelve measurement curves were obtained. Using
the first measured curve as the reference curve, the deformation calculation method
was applied to each pipeline curve to determine its respective deformation. Since
the pipeline does not change rapidly, the pipeline robot’s repeatability and precision
can be estimated by calculating the probability distribution of the deformation. The
statistical deviation of the measured elevation curve from the mean curve was used as
the repeatability when estimating elevation repeatability. Each measured elevation
curve hi differed from the reference elevation curve hr , resulting in an elevation
deviation curve ∆hi,r , which is ∆hi,r = hi − hr . The difference between each
measured elevation curve and the mean elevation curve is shown in Fig. 4.78 and
Fig. 4.79. The repeatability and accuracy of pipeline curve measurements can be
characterized by statistically analyzing all δhi. The identical method can be used to
calculate the accuracy of horizontal measurements [25].
Fig. 4.78 Repeatability of single-run measurement (4 curves were compared)


\delta h_i = \Delta h_{i,r} - \frac{1}{K}\sum_{k=1}^{K}\Delta h_{i,r}(d_k) \qquad (4.49)

The standard deviation of the horizontal and elevation repeatability for a single
measurement was better than 1 mm (138 m long pipeline, Fig. 4.78), while that for
multiple measurements was better than 1.5 mm (138 m long pipeline, Fig. 4.79).
The repeatability and accuracy were approximately 1/100,000 of the length of the pipeline. Since the principle of deflection deformation measurement is similar to that of elevation measurement, its accuracy is comparable to that of the elevation measurement.

2) Measurement accuracy

(1) Validation of elevation accuracy

To test the absolute measurement accuracy of the pipeline robot, measurements of the
same pipeline section obtained with leveling instruments (checkpoints in Fig. 4.80a)
Fig. 4.79 Repeatability of measurements over multiple runs (12 curves were compared)

and the pipeline robot (measured curve in Fig. 4.80a) were compared. Using a single-
station leveling method, 39 pipeline level points with a distribution range of approx-
imately 100 m were measured. The experimental results indicate that the pipeline
robot could achieve an absolute accuracy of 1.7 mm (RMSE).

(2) Validation of deformation accuracy

To evaluate the performance of the deformation measurement system for the pipeline,
two artificial deformation points were created by filling the structure (Fig. 4.80b).
As the reference value, the elevation change between these two deformation points
was measured with a Vernier caliper (Fig. 4.80b). In addition, the pipeline robot
was utilized to measure the 3D curves before and after deformation (Fig. 4.80a,
c). The results of comparing the deformation values measured by the two methods
to validate the pipeline robot’s deformation measurement accuracy are shown in
Table 4.20. Point 1 and Point 2 had deformation measurement errors of − 0.73 mm
and 1.16 mm, respectively.

2. Dam deformation monitoring project

The Jiayan Dam primarily serves urban and rural water supply, irrigation, and power generation (Fig. 4.81). It is situated in the middle reaches of the Liuchong
River in the Chinese province of Guizhou. The capacity of the reservoir is 1.325
× 109 m3 . The dam is a rock-filled concrete face dam. The highest dam height is
Fig. 4.80 Elevation accuracy and vertical deformation accuracy validation

Table 4.20 Comparison of deformation measurements

Test point (m)     Vernier caliper (mm)   Pipeline robot (mm)   Difference (mm)
Point 1 (53.08)    39.76                  39.03                 −0.73
Point 2 (56.09)    39.83                  40.99                 1.16

154 m. The length of the dam crest is 428.93 m. The dam body was completed in September 2019, and impoundment was scheduled to begin in December 2021. During the dam body construction period, in July 2019, the proposed internal deformation monitoring system was installed on the elevation plane at approximately 1285 m, that is, 43 m below the dam crest (Fig. 4.81a). In addition, a conventional monitoring system consisting of
hydraulic overflow settlement gauges and extension line displacement gauges was
implemented. In 2019, internal deformation measurements were performed three
times between September and December. For verification purposes, the measured
results were compared to those of the existing traditional methods of dam monitoring
[25].
Fig. 4.81 Jiayan rockfill dam and pipeline embedding design

1) Deformation-monitoring pipeline deployment


To monitor dam deformation, a new pipeline embedding method is proposed in
combination with the measuring principle of the developed pipeline robot, namely,
the “arc-shaped” pipeline embedding method (Fig. 4.81b), which exposes both ends
of the pipeline and is straightforward to maintain. The endpoint control points can
effectively correct measurement errors. As shown in Fig. 4.81c, two pipelines were
installed on the 1285 m elevation plane of the Jiayan Dam. The length of pipeline G1 is approximately 204 m, while that of pipeline G2 is approximately 327 m. They were buried close to the extension line displace-
ment meter and hydraulic overflow settlement gauge of the dam, making it easy to
compare the measurement results of these two systems.
The pipeline clamp and the observation room were anchored together during the
installation of the pipeline system to ensure that the horizontal bubble on the pipeline
clamp was centered. By carefully observing the prism on the pipeline clamp, the central coordinates of the prism relative to the center of the pipeline opening can be determined.
2) Dam internal deformation measurement
After the pipeline was installed, the 3D coordinates of the control points at its two
ends were measured precisely. At the ends of the pipeline, horizontal coordinates
were determined using a total station with an angular precision of 0.5″ to conduct multi-
station intersection measurements. The height difference between the ends of the
Fig. 4.82 Jiayan Dam monitoring result.


Reprinted from Automation in Construction, 136, Zhipeng Chen, Yu Yin, Jianwei Yu, Xiang Cheng,
Dejin Zhang, Qingquan Li. Internal deformation monitoring for earth-rock dam via high-precision
flexible pipeline measurements, 104,177, Copyright (2022), with permission from Elsevier

pipeline was obtained using a second-class leveling measurement with a closure difference better than 1 mm.
During a single measurement, it was assumed that the pipeline did not change over
the course of a day, and the robot measured the pipeline multiple times to ensure data
integrity. Using multiple-period observations, the measurement curves of the same
monitoring pipeline over distinct time intervals were obtained. After recording the
curve, a comparison was carried out to identify the deformation of the dam.
On September 3, October 18, and December 23, 2019, three-phase observations
were conducted for two pipelines in the Jiayan Dam; the measurement results of the
pipeline curves are shown in Fig. 4.82; vertical deformation and horizontal defor-
mation curves for two periods from September 3 to October 18 and from September
3 to December 23 were calculated, as depicted in Fig. 4.82b and c, respectively.
According to the pipeline G1 settlement deformation curve, the deformation between
September 3 and October 18 was greater than the deformation between October
18 and December 23. This is primarily because the rate of settlement deformation
decreased gradually after the dam’s completion at the end of September 2019. In
addition, the deformation of the pipeline near the central axis of the dam (along
the longitudinal and transverse axes) was significantly greater than that at the dam’s
edge, primarily due to the greater pressure exerted by the dam’s uppermost portion
at the dam’s center. Since the dam was not constructed and impounded from the
Table 4.21 Comparison between the data obtained from the pipeline measurement robot and the hydraulic overflow settlement gauge (with reference data collected in September 2019)

Observation time   Pipeline   Location (m)   Pipeline measurement robot (mm)   Hydraulic overflow settlement gauge (mm)   Difference (mm)
2019–10            G1         34             11.79                             11.628                                     0.16
2019–10            G1         104            17.24                             16.13                                      1.11
2019–10            G1         160            11.97                             12.27                                      −0.30
2019–10            G2         30             5.58                              10.5                                       −4.92
2019–10            G2         70             10.61                             13.1                                       −2.49
2019–10            G2         105            9.95                              13.78                                      −3.83
2019–10            G2         169            6.95                              10.1                                       −3.15
2019–10            G2         209            9.39                              12.5                                       −3.11
2019–10            G2         244            11.96                             14.775                                     −2.82
2019–10            G2         287            7.73                              10.6                                       −2.87
2019–12            G1         30             14.12                             16.148                                     −2.03
2019–12            G1         104            23.3                              21.3                                       2.00
2019–12            G1         160            16.85                             16.32                                      0.53
2019–12            G2         30             10.17                             15                                         −4.83
2019–12            G2         70             17.88                             17                                         0.88
2019–12            G2         105            17.99                             15.94                                      2.05
2019–12            G2         169            15.79                             13.2                                       2.59
2019–12            G2         209            17.46                             18.25                                      −0.79
2019–12            G2         244            19.616                            20.7                                       −1.08
2019–12            G2         287            12.713                            14.4                                       −1.69
Mean value                                                                                                                −1.23
Standard deviation                                                                                                        2.24

end of September to December, the measured horizontal deformation was close to zero. The experimental data demonstrate that the monitoring results of the proposed method are generally consistent with the law of dam internal deformation, proving its efficacy.
3) Validation of the measurement accuracy
To quantitatively verify the efficacy and accuracy of the proposed method, the
internal deformation monitoring results calculated from pipeline measurements
were compared with those measured by hydraulic overflow settlement gauges.
Figure 4.82a depicts the distribution diagram of the conventional vertical deformation
monitoring points. Since these points did not coincide completely with the pipeline,
the intersections of the transverse and longitudinal survey lines with the pipeline
curve were chosen as verification points. Pipeline deformation values were inter-
polated using the linear interpolation method for locations lacking conventional
monitoring points. The specific comparison results are shown in Table 4.21, demon-
strating that the proposed method can effectively monitor the internal settlement of
pipelines. The maximum deviation from the conventional method is 5 mm, and the
standard deviation is 2.2 mm or approximately 1/100,000 of the pipeline detection
length. These results satisfy the measurement specifications for large earth-rockfill
dams [25].

4.7 Summary

Indoor and underground spaces are essential for human activities but are characterized by confined space, complex structures, limited photoelectric signals, and so on. Current research focuses on location services for moving objects, such as people, vehicles, and devices, and on accurately detecting infrastructure in indoor and underground spaces. This chapter introduces various positioning technologies based on sensors with different degrees of precision and realizes the positioning of moving objects in non-GNSS environments. In addition to these positioning technologies, we propose a 3D mapping method
for indoor location services and a precise detection method for indoor and under-
ground spaces. These methods improve indoor and underground mapping capabilities
significantly.
However, due to the complexity of indoor and underground spaces and the diver-
sity of moving objects, the existing methods are insufficient for complex applica-
tions based on spatial locations and maps, such as disaster search and rescue and the
war on terrorism. Indoor and underground space measurement technology based on
optical, electrical, acoustic, magnetic, and other sensing methods will be the research
direction in the future for a broader range of applications.

References

1. Gusenbauer D, Isert C, Krosche J (2010) Self-contained indoor positioning on off-the-shelf mobile devices//Proceedings of International Conference of Indoor Positioning and Indoor Navigation, Zurich.
2. Gu F, Chung M H, Chignell M, et al (2021) A survey on deep learning for human activity
recognition. ACM Comput Surv 54(8):1–34.
3. Dong Z, Liang F, Yang B, et al (2020) Registration of large-scale terrestrial laser scanner point
clouds: A review and benchmark. ISPRS J Photogramm Remote Sens 163:327–342.
4. Zhou B, Li Q, Mao Q, et al (2014) Activity sequence-based indoor pedestrian localization
using smartphones. IEEE T Hum-Mach Syst 45(5):562–574.
5. Zhou B, Ma W, Li Q, et al (2021) Crowdsourcing-based indoor mapping using smartphones:
A survey. ISPRS J Photogramm Remote Sens 177:131–146.
6. Gilliéron P Y, Merminod B (2003) Personal navigation system for indoor applications//
Proceedings of the 11th IAIN World Congress, Berlin.
7. Zhou B, Li Q, Mao Q, et al (2015) ALIMC: activity landmark-based indoor mapping via crowdsourcing. IEEE T Intell Transp 16(5): 2774–2785.
8. Steffey D, Uriz P, Osteraas J (2012) Using ASTM E1155 to determine finished floor quality:
Minimum sampling requirements used to establish compliant floor flatness and levelness//
Proceedings of Forensic Engineering 2012: Gateway to A Safer Tomorrow, San Francisco.
9. Kangas M A (2017) Concrete screeding system with floor quality feedback/control. U.S. Patent 9,835,610.
10. Zhu Z, Brilakis I (2010) Machine vision-based concrete surface quality assessment. J Constr
Eng M 136(2):210–218.
11. Li F, Li H, Kim M K, et al (2021) Laser scanning based surface flatness measurement using
flat mirrors for enhancing scan coverage range. Remote Sens-Basel 13(4):714.
12. Chen Z, Li Q, Xue W, et al (2022) Rapid inspection of large concrete floor flatness using
wheeled robot with aided-INS. Remote Sens-Basel 14(7):1528.
13. Li Q, Lv S, Chen Z, et al (2022) Rapid measurement of flatness of oversized floor of speed
skating oval in winter olympic games. Geomatics and Information Science Wuhan University
47(3):325–333.
14. Li Q, Gu Y, Tu W, et al (2021) Collaborative inspection for the sewer pipe network using pipe
capsules. Geomatics and Information Science Wuhan University 46(8): 1123–1130.
15. Liang A, Li Q, Chen Z, et al (2021) Spherically optimized RANSAC aided by an IMU for
fisheye image matching. Remote Sens-Basel 13(10): 2017.
16. Qin T, Li P, Shen S (2018) VINS-Mono: A robust and versatile monocular visual-inertial state
estimator. IEEE T Robot 34(4): 1004–1020.
17. Li D, Cong A, Guo S (2019) Sewer damage detection from imbalanced CCTV inspection data
using deep convolutional neural networks with hierarchical classification. Automat Constr 101:
199–208.
18. Ma J, Zhao J, Jiang J, et al (2019) Locality preserving matching. Int J Comput Vision 127(5):
512–531.
19. Ma J, Zhao J, Tian J, et al (2014) Robust point matching via vector field consensus. IEEE Trans
Image Process 23(4): 1706–1721.
20. Nister D. (2004) An efficient solution to the five-point relative pose problem. IEEE Trans
Pattern Anal Mach Intell 26(6), 756–770.
21. Bo L, Heng L, et al (2013) A 4-point algorithm for relative pose estimation of a calibrated
camera with a known relative rotation angle//Proceedings of 2013 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS), Tokyo.
22. Liao C, Cai D, Chen H, et al (2019) Development and in situ application of deformation
monitoring system for concrete-face rockfill dam using fiber optic gyroscope. Sensors 20(1):
108.
23. He B, Sun R, He N, et al (2015) The measuring method of the inner deformation for high
concrete faced rockfill dams with pipe-robot. Journal of Water Resources and Architectural
Engineering (5): 78-82.
24. Li Q, Chen Z, Yin Y, et al (2020) Pipeline three-dimensional curve measuring robot and
implementation method therefor. US Patent 16/969,954, Dec 2020.
25. Chen Z, Yin Y, Yu J, et al (2022) Internal deformation monitoring for earth-rockfill dam via
high-precision flexible pipeline measurements. Automat Constr, 136: 104177.
Chapter 5
UAV 3D Measurement

5.1 Overview

The rapid development of unmanned aerial vehicles (UAVs) has brought spatial sensing to new heights. A UAV is a highly automated and versatile carrying platform that can be equipped with various sensor payloads, such as visual cameras, LiDAR, spectral cameras, and synthetic aperture radar. Spatial sensing with UAVs can meet a wide range of application requirements under diverse environmental conditions. Because the data are collected at close range and in a dynamic way, data acquired by UAVs have high spatial and temporal resolution.
UAV 3D spatial sensing is a particularly important technical application of
dynamic and precise engineering surveying. Stereo observation and time-of-flight
(ToF) are two important approaches to obtaining depth information apart from planar
coordinates. With these two approaches, 3D measurements can be realized by using
cameras and LiDAR, respectively. UAVs, when used as the carrier, can move freely and observe from more perspectives, which is a major advantage over carriers running on the ground. In addition, automation and intelligence are two main characteristics of the UAV 3D measurement system and are the key to its further development. Traditional data acquisition and processing in UAV systems involve little automation and place high demands on professional operation, which limits their application and promotion. In recent years, intelligent control and data processing technologies have improved greatly, laying a solid technical foundation for the wide application of UAVs in measurement-related fields.
This chapter will introduce the techniques and practical application of UAV
systems in the field of dynamic and precise engineering surveying, including UAV
3D LiDAR measurement and optimized views photogrammetry.


5.2 LiDAR 3D Measurement

5.2.1 LiDAR 3D Measurement System

The UAV-borne LiDAR system can directly acquire the 3D point cloud of the terrain
and objects. Afterward, the point cloud is further processed through filtering, clus-
tering, classification, gridding, and parameterization. The processed point cloud can
be used in terrain mapping, electric powerline inspection, urban change monitoring,
etc. [1].
The system is based on a low-altitude multi-rotor UAV platform. It acquires high-
resolution images and 3D LiDAR point clouds simultaneously, as shown in Fig. 5.1.
Integrated fusion processing and 3D visualization are achieved on the basis of high-precision GPS/IMU positioning by integrating positioning and attitude determination, imaging, and 3D laser scanning sensors. The schematic diagram of the system is shown in Fig. 5.2.
The principle of the UAV-borne LiDAR system is shown in Fig. 5.3. The distance
from an object to the laser scanning center onboard the UAV can be calculated
by using the time interval between transmitted and returned laser pulse, which is
recorded through a time interval device. With the orientation and the spatial position
of the laser scanning center, 3D coordinates of the object can be obtained. The
spatial position of the laser scanning center at a certain moment can be calculated

Fig. 5.1 Schematic diagram of the UAV-borne LiDAR system



Fig. 5.2 Load composition of UAV-borne LiDAR system



Fig. 5.3 The principle of the UAV-borne LiDAR measuring system

by using real-time kinematic (RTK) technology, as the UAV receives GPS signals
synchronously with the ground GPS receiver. The IMU measures the 3D attitude
angle, speed, and acceleration of the UAV in real-time. In addition, the instrument
boresight parameters can be calibrated, such as the deviation of the laser ranging
optical reference center relative to the GPS antenna phase center, the three mounting
angles of the laser scanner pod, and the non-parallelism between the IMU body axes and the carrier coordinate axes.
As mentioned above, the UAV LiDAR system requires the integration of various
types of sensors, including laser scanners, inertial measurement devices, and GNSS
positioning systems. The key is to coordinate and control the sensors with different
resolutions to work under a unified spatial reference frame and time reference frame
[2].
The time reference frame involves the synchronous control of multiple sensors.
The high frequency and large amount of data obtained by laser scanning put forward
higher requirements for synchronous control, which is directly related to whether
laser scanning can make accurate measurements during the high dynamics of UAVs.
Considering that each sensor of the system collects data according to its own sampling
interval, the input frequency of data is different, and the time precision is also
different. To realize the fusion and integrated processing of multi-source data, the data
collected by the sensors of the whole system must be established on the same time
axis. For example, when using high-definition camera images for stereo mapping,
it is necessary to know the position and attitude of the camera at the time of image
exposure and acquisition, both of which need to be provided by GNSS and IMU.
Therefore, it is very important for the whole system to realize the data synchronization
recording of each sensor. Additionally, it is necessary to use high-precision control
fields to accurately calibrate laser scanning sensors, IMU, GNSS, and cameras.
The overall process of the UAV-borne LiDAR system is shown in Fig. 5.4. Through
post-production, processing, and data utilization, data products such as the digital
elevation model (DEM), digital orthophoto map (DOM), digital line graph (DLG),
and 3D visualization of the geographic environment can be generated, as shown in
Figs. 5.5, 5.6, 5.7 and 5.8.

Fig. 5.4 The overall process of the UAV-borne laser system

Fig. 5.5 The automatic generation of the DEM
Fig. 5.6 The generation of the DOM

Fig. 5.7 DLG compilation based on DOM

5.2.2 Processing Method of LiDAR Point Cloud

The raw data of the airborne LiDAR scanning system include the laser scanning
angle, ranging value, echo intensity and number of returns, and the absolute position and heading angle information output by the position and orientation system. These data have considerable
redundancy and noise, and it is necessary to perform a series of processing steps on
the point cloud to meet the needs of practical applications.
1. Point cloud filtering
The influence of environmental factors is inevitable during the acquisition of point
cloud data. The quality of point cloud data is affected by factors, such as equipment
accuracy, operator experience, and electromagnetic wave diffraction. In addition, due
to the influence of external factors, such as the occlusion of obstacles, there are often
Fig. 5.8 The rapid 3D visualization based on DEM and DOM

some outliers from the target point cloud. Noise and outliers will seriously affect the
calculation accuracy of local point cloud features (surface normal, curvature) and thus
affect the results of point cloud processing models such as point cloud registration,
object extraction, and model reconstruction. Point cloud filtering is used to extract
the low-frequency characteristics of the data and to eliminate outliers.
Currently, the most commonly used filters include pass-through, conditional
removal, Gaussian, voxel grid, statistical outlier removal, radius outlier filters, etc.
[3]. Pass-through and conditional removal filtering are used to extract the region
of interest at the pre-processing stage. Voxel grid filtering is used to downsample dense point clouds and reduce the amount of data. The other filters are used to smooth the point cloud and remove discrete outlier points.
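As a concrete example of the two most common operations, the following sketch implements voxel-grid downsampling and a simple statistical outlier removal in plain NumPy/SciPy; production pipelines typically rely on dedicated libraries such as PCL or Open3D, and the voxel size, neighbor count, and threshold below are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def voxel_grid_filter(points, voxel=0.1):
    """Downsample a point cloud by replacing the points in each voxel with their centroid."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    counts = np.bincount(inverse).astype(float)
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]

def statistical_outlier_removal(points, k=10, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours is more
    than std_ratio standard deviations above the global mean distance."""
    tree = cKDTree(points)
    d, _ = tree.query(points, k=k + 1)     # the first neighbour is the point itself
    mean_d = d[:, 1:].mean(axis=1)
    threshold = mean_d.mean() + std_ratio * mean_d.std()
    return points[mean_d < threshold]
```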
2. Point cloud registration
It can be seen from the principle of 3D laser scanning that the inevitable occlusion
in actual measurement can lead to the incomplete acquisition of target informa-
tion. During the scanning process, the scanned data of each station are in different temporary coordinate systems, and the data of each station must
be converted into a unified coordinate system before subsequent point cloud data
processing. Therefore, point cloud registration is a key component of point cloud
processing.
Point cloud registration is divided into two stages: coarse registration and fine
registration.
Coarse registration refers to the registration of the point cloud when the relative
pose of the point cloud is completely unknown, which can provide a good initial value
for fine registration. At present, automatic coarse registration algorithms, which are
more commonly used, are those based on exhaustive search and feature matching.
The purpose of fine registration is to minimize the spatial position difference between point clouds based on coarse registration. The most widely used fine registration algorithms are the iterative closest point (ICP) and various variants of ICP (robust ICP, point-to-plane ICP, point-to-line ICP, MBICP, GICP, NICP) [4].
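For reference, a minimal point-to-point ICP iteration (nearest-neighbour association followed by the closed-form SVD solution for the rigid transform) might look as follows; practical implementations add outlier rejection, convergence tests, and the point-to-plane or generalized variants listed above.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_point_to_point(src, dst, iters=30):
    """Estimate the rigid transform (R, t) that aligns src (N, 3) to dst (M, 3)."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(dst)
    cur = src.copy()
    for _ in range(iters):
        # correspondence step: nearest neighbour in the target cloud
        _, idx = tree.query(cur)
        matched = dst[idx]
        # alignment step: closed-form rigid transform via SVD
        mu_s, mu_d = cur.mean(axis=0), matched.mean(axis=0)
        H = (cur - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:      # avoid an improper (reflected) rotation
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_d - R_step @ mu_s
        cur = cur @ R_step.T + t_step
        R, t = R_step @ R, R_step @ t + t_step
    return R, t
```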
3. Object extraction from the point cloud
A point cloud is a digital 3D description of a scene. 3D object extraction from point
clouds is the basis for point cloud scene understanding and various applications, such
as intelligent driving and smart cities. 3D object extraction from point clouds mainly
focuses on two aspects: feature-based (template) and deep learning-based 3D extrac-
tion. The feature-based (template) 3D object extraction method of point clouds relies
on the feature description ability of the feature descriptor. However, due to the factual
situation that there are different degrees of occlusion between adjacent objects in the
scene, shape, and status differences between objects (size, textures, spatial topology,
etc.), and data incompleteness, the feature-based (template) 3D object extraction
method cannot achieve satisfactory results in terms of the type, completeness, and
correctness of ground object extraction [5–7]. The method based on deep learning
neural networks relies on the selection of training samples and the generalization
ability of the neural network, but there are shortcomings in network architecture
design, training samples, and processing large-scale point cloud scenes. Drawing on
the successful experience of deep learning in the field of image recognition, scholars
have proposed a variety of deep learning network structures for the recognition and
detection of 3D objects based on the characteristics of point clouds. According to the
form of the input data, these methods are classified into three categories: multi-view-
based methods, voxel-based methods, and irregular point cloud-based methods. The
multi-view-based methods first render the 3D data into many images from different
perspectives in a certain way and then directly use the mature image convolutional
neural network structure on these images to perform feature learning and recognition
of 3D objects. MVCNN uses a multi-view deep model and achieves an accuracy of
90.1% on the ModelNet40 classification [8]. The advantages of this type of method
are that it can directly use the existing achievements of deep learning and can obtain
better results. Its disadvantage is that the results of data classification depend on the
method and perspective of rendering.
Voxel-based methods divide the space into regular voxels and then generalize the
2D convolutional neural network to the 3D convolutional neural network, which also
has a convolution layer and a pooling layer. PointNet analyzed and compared the
multi-view 3D neural network and the voxel-based 3D neural network and found that
the 3D expression ability still has much room for improvement [9]. Maturana and
Scherer [10] constructed voxels with filling probability for real-time object recogni-
tion. Wang et al. [11] built voxels at the octree level, which speeds up the compu-
tation. Qi et al. [12] analyzed and compared the multi-view 3D neural network and
the voxel-based 3D neural network and found that the 3D expression ability still has
much room for improvement.
Methods based on irregular point clouds build convolution computations on irregular raw point clouds. Qi et al. [13] proposed the PointNet neural network, which adopts point-by-point multilayer perceptrons (MLPs) and global feature pooling to obtain invariance to the ordering of the point set, and proved that the PointNet structure can fit functions of arbitrary form. Qi et al. [13] also proposed
the PointNet++ deep learning network, which enables the neural network to learn
multilevel features. Ravanbakhsh et al. [14] explored the functional form of deep
learning models for collection objects and generalized the function with permutation
invariance. Klokov and Lempitsky [15] designed a convolutional neural network
built on a KD tree. The hierarchical structure of the network is realized through
the KD tree, and each child node will fully inherit the characteristics of the parent
node. Nadgowda et al. [16] constructed the point set into a graph and generalized the
ordinary convolution on the graph to obtain a convolution form that is suitable for
irregular graphs. Among the three schemes, the method of learning based on irregular
point clouds is the most direct and widely applicable. The other two methods require
rendering or gridding of data, which will bring a certain loss of information.
Instance segmentation and semantic segmentation are two independent branches.
The semantic segmentation network assigns semantic labels point by point, and the
instance segmentation network assigns each target instance a label. There are differ-
ences in the prediction of the target, such as segmentation form and segmentation
boundary. To solve the problem of inconsistency between the two results, we intend
to take advantage of the heuristic fusion network proposed by JSIS-Net to jointly
output the two branches and to realize the structured extraction of target elements
that takes into account the accurate geometric boundaries and the correct semantic
information [17].
4. Point cloud classifications
Currently, LiDAR data have been used in several studies for land-cover classification.
Three main differences among those methods are the classifier, data format, and
selection of features. One crucial factor that influences LiDAR classification accuracy
is feature selection. Intensity features, height features, waveform features, multiple
return features, and texture features are commonly used features. The precise and
dense height information of LiDAR data can be used to identify land-cover classes
(e.g., trees and shrubs in urban areas) that are not easy to identify in 2D images.
Classifier selection and data format are also important in LiDAR classification. Raw
LiDAR data are 3D point clouds where points are irregularly distributed. Both raw
data and the gridded image data, which are obtained from the raw data through
interpolation, can be used for classification.
Although many algorithms have been developed for LiDAR data classification,
their performance still needs to be improved. The first issue that should be addressed
for most previous work is that the features used are artificially selected. The selected
features may not be robust and discriminative enough for classification, as the infor-
mation passed by feature descriptors may be decreased. Another issue that should
be considered is that contextual information should be taken into account, as most
of them utilize pixel- or point-based features. Taking contextual information into
account should be able to increase the accuracy of classification, especially in the case of complex urban areas. In addition, due to the large size and complexity of
LiDAR data, the computation is intensive for urban classification. It is, therefore,
necessary to develop a high-performance computing algorithm to process LiDAR
data.
Pixel comparison features were used in a random forest classifier to classify
urban areas [18]. These features contain the information of the original data and are
obtained from the height map with a linear operation. As randomized trees are fast
and robust in processing thousands of features, classification rules can be extracted
automatically using these features. These features encode contextual information by
using heights from neighboring pixels. As feature descriptors are not required to be
computed in the comparison, the comparison is fast.
The framework of the proposed algorithm is displayed in Fig. 5.9. The aim of
this work is to show the efficiency of pixel comparison features with a random forest
classifier, and the data format of LiDAR data is not the main concern. We test our
algorithm on gridded LiDAR data in this work, but the proposed method can be
extended to point cloud data with further efforts. The gridded image is obtained by
interpolating the 3D point cloud, and the pixel comparison features are then extracted
from nonground objects. A rough classification map is then generated by feeding
these features into a random forest classifier. The final classification results are then
obtained with a postprocessing step using majority analysis, by which classification
errors can be mitigated. We detail the main steps in the following.
1) LiDAR point preprocessing and gridding
Outliers caused by birds, multi-path errors, particles in the air, low-flying aircraft,
and system errors are first removed during the preprocessing of the original data.
The anomalous points, which are distinctly over or under the surface level, are then
detected by the local outlier factor (LOF) based on the concept of local intensity
that can be estimated using the reachability distance from an object to its k-nearest

Fig. 5.9 Framework of the proposed classification algorithm using LiDAR data
neighbors (KNN). The LOF value of an object is obtained through the comparison
between its local reachability density and that of its neighbors. A sparse region is
obtained if the LOF value is larger than one. If the density of a point is substantially
lower than its neighbor, then it is considered an outlier. The point cloud is divided
into 10 m × 10 m patches to reduce the computational burden, and the LOF value of
each point in a patch is calculated with k = 10. We remove the points if their LOF
values are larger than 20 (the threshold is chosen by experience).
The unstructured point data are converted to gridded data after we rasterize them,
during which a triangulation-based linear interpolation method is used. The neigh-
boring laser points are first connected in 2D using Delaunay triangulation. The loca-
tions of the gridded points are then found by searching the triangles. Finally, within
each triangle, we apply a linear interpolation among all gridded points.
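This gridding step can be reproduced with SciPy's Delaunay-based linear interpolation; the grid cell size below is an assumed value.

```python
import numpy as np
from scipy.interpolate import griddata

def rasterize_points(points, cell=0.5):
    """Interpolate irregular LiDAR points (N, 3) onto a regular height grid using
    Delaunay-triangulation-based linear interpolation (NaN outside the convex hull)."""
    xy, z = points[:, :2], points[:, 2]
    xs = np.arange(xy[:, 0].min(), xy[:, 0].max(), cell)
    ys = np.arange(xy[:, 1].min(), xy[:, 1].max(), cell)
    gx, gy = np.meshgrid(xs, ys)
    gz = griddata(xy, z, (gx, gy), method='linear')
    return gx, gy, gz
```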
2) Feature selection
The pixel comparison features are generated from the gridded LiDAR data [19, 20].
Two pixels are randomly selected from the neighborhood of each pixel, and the
difference between them is computed. We compute the feature response as:

f_i(p) = H(p + \Delta p_1) - H(p + \Delta p_2) \qquad (5.1)

where f_i(p) represents the value of the ith pixel comparison feature of pixel p, and H(·) represents the height of the gridded data. Δp1 and Δp2 denote 2D offset vectors that are randomly selected within an m × m neighboring box. The process used to generate the features is displayed in Fig. 5.10. We represent the current pixel (p) with a black square and the neighboring pixels (p + Δp1 and p + Δp2), which are randomly selected within a predefined window (7 × 7 in this case), with red squares. The number of pixel pairs that are compared determines the number of features available for classification.
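Because Eq. (5.1) only subtracts two randomly shifted copies of the height grid, the whole feature stack can be generated with a few array operations. The sketch below assumes a gridded DSM array H and uses illustrative values for the number of features and the window size.

```python
import numpy as np

def pixel_comparison_features(H, n_features=50, window=7, seed=0):
    """Compute Eq. (5.1) features for every pixel of the gridded height map H.

    For feature i, a pair of offsets is drawn once inside a window x window
    neighbourhood and applied to all pixels; borders are edge-padded."""
    rng = np.random.default_rng(seed)
    r = window // 2
    padded = np.pad(H, r, mode='edge')
    rows, cols = H.shape
    feats = np.empty((rows, cols, n_features), dtype=np.float32)
    for i in range(n_features):
        dy1, dx1, dy2, dx2 = rng.integers(-r, r + 1, size=4)
        a = padded[r + dy1: r + dy1 + rows, r + dx1: r + dx1 + cols]
        b = padded[r + dy2: r + dy2 + rows, r + dx2: r + dx2 + cols]
        feats[:, :, i] = a - b             # f_i(p) = H(p + Δp1) − H(p + Δp2)
    return feats
```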
These features are fast to compute because they are obtained through simple pixel
comparison. In addition, these features are able to encode contextual information and

Fig. 5.10 Schematic diagram of pixel comparison feature generation


Fig. 5.11 Schematic diagram of the random forest classifier

can be used for different scenes, as we can choose pixel pair numbers and window
sizes flexibly. These features are invariant to height changes in both the monotonous
background and the digital elevation model (DEM), as they are dependent on the
order of heights of the digital surface model (DSM) between neighbors. The details
of objects may be captured for complex scenes without being affected by background
signals. Unlike many other features that are based on height, which need to compute
a local plane [21], these features avoid errors from local plane fitting (Fig. 5.11).
3) Classification by random forest classifier
Classification results are generated by feeding the original height feature and the
features selected in the previous step into the classifier. Many classification algo-
rithms can be used at this point, for instance, neural network classifiers, support
vector machine classifiers, and KNN classifiers. The random forest classifier is fast,
robust, and easy to train. It appears to be eminently suitable among these classi-
fiers because multiple features and multiclass problems are naturally handled. We
will briefly introduce the random forest model in the following. The random forest
model, which combines several tree predictors, is an ensemble learning method (Fig. 5.11). Trees contain split nodes and leaf nodes. A split node contains a scalar threshold (τ) and a “weak learner”, which is represented by feature parameters (φ).
The split for an unclassified pixel p repeatedly evaluates the weak learner, starting
from the root node to the leaf nodes:

h(p;\, \varphi, \tau) = \left[\varphi_n(p) \ge \tau_n\right] \qquad (5.2)

where n represents the number of nodes in the tree. [·] is a logical operator. The
pixel branches to the left if h( p; φ, τ ) returns true, and right otherwise. A bootstrap
sample with replacement from the original training data forms the training set of
a particular tree. We call the data that are left out of the sample (approximately
one-third of the data) out-of-bag (OOB) data. The splitting criterion φn(p) ≥ τn is determined by using the Gini impurity. The values of each candidate that represent
their homogeneity are computed by applying the metric to them. The split quality is
better when the Gini impurity is smaller. Denoting the number of classes for a set of
pixels as J and representing the fraction of pixels that are labeled as class i in the set
using ci , the Gini impurity is computed as:


I_G(c) = 1 - \sum_{i=1}^{J} c_i^2 \qquad (5.3)

Combining all the trees (weak learners) forms a forest (strong learners). Two
parameters are crucial for the random forest classifier: the number of randomly
chosen features and the number of trees. The strength of each individual tree and
the correlation between any two trees determine the error rate of the forest. If the
strength of the individual trees increases or the correlation decreases, the error rate
of the forest decreases. However, both strength and the correlation are reduced when
we reduce the number of features at each split. The influence on the classification
accuracy is complex, as discussed below.
Overall, the construction of the random forest contains the following steps:
(1) Create bootstrapped samples, which can be used as the training set to grow the
tree.
(2) For each node, select some features randomly, and split the node with a threshold,
which minimizes the Gini impurity and the feature.
Unless only one class is contained by the terminal nodes or the minimum size is
reached, the tree should be as large as possible.
(3) A certain number of trees are generated by repeating steps 1–2.
The most popular class can be computed after generating a large number of trees.
Feature importance, which can also be measured by the random forest model,
makes it possible to evaluate the significance of the proposed features. Measuring
the importance of parameters can be done in two ways. The first way is to compute
parameter importance from permuting OOB data. The error rate for classification is
computed for each tree, and the same is done after a predictor variable is permuted.
The differences between them can be obtained by computing the average over all
the trees and normalized using the standard deviation. The second way to compute
parameter importance is the total decrease in node impurities from splitting the node,
as done by the Gini index (Eq. 5.3). The first way to measure the feature importance
is used in this work.
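With features of this kind, training and importance evaluation can be sketched with scikit-learn's RandomForestClassifier; the tree count and the label convention below are assumptions, and note that scikit-learn's built-in feature_importances_ are impurity (Gini) based, whereas the OOB-permutation importance described above would be computed separately.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_random_forest(features, labels, n_trees=100):
    """features: (rows, cols, F) pixel comparison features; labels: (rows, cols)
    integer class map where 0 marks unlabelled pixels."""
    mask = labels > 0
    X, y = features[mask], labels[mask]
    clf = RandomForestClassifier(n_estimators=n_trees, oob_score=True, n_jobs=-1)
    clf.fit(X, y)
    print("OOB accuracy:", clf.oob_score_)
    # feature_importances_ is the impurity (Gini) based measure
    print("Top impurity-based feature indices:", np.argsort(clf.feature_importances_)[::-1][:5])
    return clf

def classify_image(clf, features):
    rows, cols, F = features.shape
    return clf.predict(features.reshape(-1, F)).reshape(rows, cols)
```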
4) Post-processing
A smooth classification map with clear boundaries between objects would be obtained if the forest were effective enough. In practice, however, the classification maps obtained by the forest have to be refined using majority analysis (MA) due to the imperfection of the forest.
Fig. 5.12 Schematic diagram of the majority analysis, where a noisy classification map is smoothed
to a new map after running the majority analysis with a 3 × 3 window.

The center value of a given kernel is replaced by the majority class value of the pixels in the kernel (Fig. 5.12). The larger the kernel size is, the smoother the classification image is. The weight of the center pixel determines how many times its class should be
counted while determining the majority. This process contains the following steps.
First, the size of the kernel and the weight of the center pixel are defined. Second,
the number of pixels for each class is counted with a moving window (we count
the center pixel in a different way depending on its weight). Third, we assign the
center pixel to the majority class. This procedure aims to reduce the influence of
“salt-and-pepper” noise.
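The majority analysis itself is a moving-window mode filter; one possible implementation (without the center-pixel weighting) is sketched below using scipy.ndimage.generic_filter.

```python
import numpy as np
from scipy.ndimage import generic_filter

def majority_filter(class_map, kernel=3):
    """Replace each pixel by the most frequent class inside a kernel x kernel window
    (classes are assumed to be non-negative integers)."""
    def window_mode(values):
        return np.bincount(values.astype(np.int64)).argmax()
    return generic_filter(class_map, window_mode, size=kernel, mode='nearest')
```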
Reprinted from ISPRS Journal of Photogrammetry and Remote Sensing,
148(FEB.), Wang C, Shu Q, Wang X, et al., A random forest classifier based on
pixel comparison features for urban LiDAR data, 75–86, Copyright (2018), with
permission from Elsevier.

5.2.3 LiDAR 3D Measurement Applications

1. Topographic mapping

UAV-borne LiDAR, a 3D scanning technology, directly acquires the 3D point coordinates of the target object and provides reliable high-density terrain data; the data processing is direct and objective, and the computation model is very mature. It has become one of the main methods for obtaining topographic information and creating terrain surface models.
Point cloud filtering is a key step in producing high-precision DEMs from airborne
LiDAR point clouds. Due to the complexity of the terrain environment, the quality of
the filtering results derived by using LiDAR data processing software (LiDAR-DP
of China ARSC Ltd., TerraSolid) sometimes cannot meet the requirements in some
special areas. Filtering for these areas is currently a difficult problem. To improve
the efficiency of filtering operations, the workload of manual interaction and editing
should be minimized as much as possible (Fig. 5.13). Open-source software such

as LAStools can provide some help in the process of point cloud filtering in DEM
production.
The quality of airborne point cloud data has an important impact on subsequent DEM production. However, owing to factors such as the surveying environment, time, cost, and weather, the ground-point density of point clouds acquired under actual operating conditions is often insufficient and local data holes may appear, so the data can hardly meet the requirements of high-precision DEM production. It is therefore necessary to fill the holes and improve the data quality [18]. Figure 5.14 shows the technical solution for point cloud hole filling and quality improvement.
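As a hedged illustration of the hole-filling idea, the sketch below fills NoData cells of a gridded DEM by interpolating from the surrounding valid cells with SciPy; the NaN convention, grid, and interpolation settings are assumptions for illustration and not the specific workflow of Fig. 5.14.

# Illustrative sketch of filling local holes in a gridded DEM by interpolating
# from surrounding valid cells; the NaN "no data" convention is an assumption.
import numpy as np
from scipy.interpolate import griddata

def fill_dem_holes(dem):
    """Return a copy of the DEM with NaN cells filled by linear interpolation,
    falling back to nearest-neighbor values near the grid edges."""
    rows, cols = np.indices(dem.shape)
    valid = ~np.isnan(dem)
    points = np.column_stack([rows[valid], cols[valid]])
    values = dem[valid]
    holes = np.column_stack([rows[~valid], cols[~valid]])
    filled = dem.copy()
    interp = griddata(points, values, holes, method="linear")
    nearest = griddata(points, values, holes, method="nearest")
    interp = np.where(np.isnan(interp), nearest, interp)
    filled[~valid] = interp
    return filled

dem = np.random.rand(100, 100)
dem[40:45, 60:70] = np.nan        # simulated data hole
dem_filled = fill_dem_holes(dem)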

2. Power line inspection

Traditional methods for power line inspection usually involve visual checking or instrumental measurement (by total station), carried out by skilled workers from a safe distance. However, many locations where the safety clearance of the power line is insufficient are themselves difficult to access. Because of occlusion by trees and buildings and the deviation of the visual perspective, these measurement methods have difficulty making accurate and effective judgments on suspected over-limit points and cannot meet the needs of the development and safe operation of modern power grids. Ultra-high-voltage power grids urgently need efficient, advanced, and scientific power line safety detection methods.
Manned helicopters and large unmanned helicopters carrying laser scanning systems have gradually been applied to national power grid inspection. Such a system can acquire the 3D data of a whole power line in one flight and afterward deliver the results of safety distance analysis and diagnosis. However, its inspection cost is relatively high, the airspace authorization application period is long, and the data processing volume is large, so it is not suitable for special inspections of regional safety distance black spots. Its operation is difficult, the requirements for obtaining qualifications are high, and the high value of the equipment puts psychological and technical pressure on the operators. As a result, it is difficult to use widely among line patrol personnel and cannot fully meet the requirements of routine inspection of power grid transmission lines.
The development of micro air vehicle (MAV) technology provides a new means
for power grid inspection techniques. The equipment is inexpensive and easy to
control, and it has the advantages of easy acquisition of qualifications, low flight
altitude, low legal risk, and high acceptance of front-line personnel.
For the automatic detection of power line safety distances with small-size LiDAR, a solution for the automatic extraction of power lines and their surrounding objects from airborne LiDAR scanning data is proposed. The distance between the transmission line and its surrounding objects can then be determined, and the elevation of the transmission line and its surrounding environment can be diagnosed according to this distance. The diagnostic mechanism can determine whether objects near the line, such as buildings or vegetation, have reached a dangerous height or position so that the power supply bureau can trim or remove them in time to ensure the safe operation of the line. With the development of deep

Fig. 5.13 Overall technical route of LiDAR point cloud acquisition and DEM update

Fig. 5.14 The technical solution of quality improvement for point cloud holes and sparse LiDAR ground points

learning, good results have been achieved in natural language processing, text processing, image recognition, and other fields [22]. Deep learning methods are used to classify and process the LiDAR point cloud data, overcoming the low efficiency and poor accuracy of existing point cloud classification methods.
The solution for airborne LiDAR power line inspection mainly consists of three parts: power line fitting, transmission tower extraction, and obstacle safety diagnosis.
The power line fitting algorithm is divided into four stages, namely, rough extraction of the power line point cloud data, automatic fitting of the projected power line based on its projection, reverse projection of the projected power line with fine extraction of the power line points, and power line vector generation based on region segmentation of the accurate point cloud data. The flow chart of power line fitting is shown in Fig. 5.15.
Coarse extraction of the power line point cloud data: since the elevation of the power line differs markedly from that of other objects in the same range, elevation threshold segmentation is selected to preliminarily separate the power line points from the background data. There are many types of elevation-based segmentation methods, and this scheme uses iterative threshold segmentation (a minimal sketch is given below). Figure 5.16 shows a set of experimental results of the coarse segmentation.
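A minimal sketch of iterative elevation thresholding is given below; it follows a generic Ridler–Calvard-style iteration and is not necessarily the exact variant used in the described scheme. The point cloud and tolerance are placeholders.

# Illustrative sketch of iterative elevation thresholding for coarse power line
# extraction; a generic iteration, not necessarily the exact variant used here.
import numpy as np

def iterative_elevation_threshold(z, tol=0.01, max_iter=100):
    """Iteratively split elevations into low/high groups until the threshold,
    taken as the mean of the two group means, stabilizes."""
    t = z.mean()
    for _ in range(max_iter):
        low, high = z[z <= t], z[z > t]
        if len(low) == 0 or len(high) == 0:
            break
        t_new = 0.5 * (low.mean() + high.mean())
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t

points = np.random.rand(10000, 3) * [100, 100, 40]   # hypothetical x, y, z
t = iterative_elevation_threshold(points[:, 2])
powerline_candidates = points[points[:, 2] > t]       # coarse power line points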
The automatic fitting of the projected power line is based on the discrete points
of the power line point clouds, and the fitted power line is output in continuous
points. This part mainly adopts the relevant theories of resampling, edge detection,
line detection, and line fitting in digital image processing as support.

Fig. 5.15 The process of power line fitting

Fig. 5.16 The coarse extraction result of the power line point clouds

For further processing, the point cloud data can be gridded to generate an elevation image so that the point cloud can be processed indirectly by means of digital image processing. In the generated elevation image, the power lines appear as edges owing to their prominent elevation. The Canny operator, which has good edge extraction performance, can be used to extract the power line points. Alternatively, template operators with slightly smaller computation, such as the Laplacian operator, can be used to reduce the computational cost. A minimal sketch of the gridding and edge extraction is given below.
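The sketch below illustrates this idea: the points are rasterized into a maximum-elevation image and edges are extracted with the Canny operator via OpenCV. The grid resolution and Canny thresholds are placeholder values.

# Illustrative sketch of gridding a point cloud into an elevation image and
# extracting edges with the Canny operator; resolution and thresholds are
# placeholder values, and OpenCV (cv2) is assumed to be available.
import numpy as np
import cv2

def elevation_image(points, resolution=0.5):
    """Rasterize (x, y, z) points to a grid storing the maximum elevation per cell."""
    xy_min = points[:, :2].min(axis=0)
    idx = ((points[:, :2] - xy_min) / resolution).astype(int)
    shape = idx.max(axis=0) + 1
    img = np.zeros(shape, dtype=np.float32)
    np.maximum.at(img, (idx[:, 0], idx[:, 1]), points[:, 2])
    return img

points = np.random.rand(50000, 3) * [200, 200, 30]      # hypothetical point cloud
elev = elevation_image(points, resolution=0.5)

# Scale elevations to 8-bit gray levels before applying Canny edge detection.
gray = cv2.normalize(elev, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
edges = cv2.Canny(gray, threshold1=50, threshold2=150)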
When multiple power lines appear in the same image, the power line points must be clustered to determine which points participate in the fitting of each line. The clustering of the extracted lines can be completed by the Hough transform or the maximum likelihood method. For the points belonging to one line cluster, least squares can be used for line fitting. Considering the extension and curvature of the power line between the towers, segmented least-squares fitting of straight-line pieces can be used to approximate the real curved shape of the power line (a minimal fitting sketch is given below). Figure 5.17 shows the binarization result of the projected line extraction using Hough transform straight-line detection and least squares line fitting.
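The following sketch illustrates segmented least-squares fitting of the projected points of one power line cluster; the number of segments is a placeholder, and the simulated sagged shape is only for demonstration.

# Illustrative sketch of segmented least-squares fitting of projected power line
# points: the points of one line cluster are split into segments along x, and a
# straight line is fitted to each segment; the segment count is a placeholder.
import numpy as np

def segmented_line_fit(xy, n_segments=5):
    """Return a list of (slope, intercept) pairs, one per segment along x."""
    order = np.argsort(xy[:, 0])
    xy = xy[order]
    fits = []
    for seg in np.array_split(xy, n_segments):
        slope, intercept = np.polyfit(seg[:, 0], seg[:, 1], deg=1)
        fits.append((slope, intercept))
    return fits

# Hypothetical projected points of one power line cluster (x along the span).
x = np.linspace(0, 120, 400)
y = 0.002 * (x - 60) ** 2 + np.random.normal(0, 0.05, x.size)  # sagged shape + noise
fits = segmented_line_fit(np.column_stack([x, y]), n_segments=6)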
The reverse projection of the powerline projection line and the precision extraction
of power line point clouds mainly complete the reverse projection of 2D straight-line
segments (power line vector coarse extraction) and the fine extraction of the power
line point clouds. This part is mainly supported by the relevant theories of projection
and spatial clustering. The power line vector described in 3D can be obtained by
reverse projecting the extracted 2D projected straight line segment to 3D space. To
further improve the extraction accuracy, the extraction result is used as the initial
clustering kernel to perform distance clustering on the spatial point cloud to obtain
accurate power line point cloud data for generating power line vectors.

Fig. 5.17 The least squares fitting results of the projection lines

Fig. 5.18 The experimental results of the automatic extraction of the power line

Power line vector generation is mainly based on accurate region segmentation. This part is primarily supported by the relevant theories of region segmentation in linear programming and of the centroid and center of mass of objects. The accurately extracted power line points are divided into regions, and each region centroid is used as a vector node to output the polylines of the power line. Figure 5.18 shows the experimental results of a set of accurate power line extractions.
The specific process of electric tower extraction is shown in Fig. 5.19. The line segment corresponding to the power line in the elevation image is generally only a few pixels wide, while the electric tower and potentially dangerous objects appear as clustered point groups. The erosion operation in mathematical morphology eliminates thin object boundaries; therefore, the power lines can first be removed by erosion, and the connected areas can then be recovered by dilation. The point group of each connected area can be obtained by x-means clustering, and finally, the electric towers can be distinguished from the potentially dangerous objects by the size of the area (a minimal morphological sketch follows).
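A minimal morphological sketch of this step is given below: thin power line pixels are removed by erosion, compact objects are recovered by dilation, connected areas are labeled, and towers are separated from other objects by area. The binary mask and area threshold are placeholders, and simple connected-component labeling is used here in place of x-means clustering.

# Illustrative sketch: remove thin power line pixels by erosion, recover compact
# objects by dilation, label connected areas, and separate towers from other
# objects by area; the binary mask and area threshold are placeholders.
import numpy as np
from scipy import ndimage

mask = np.zeros((200, 200), dtype=bool)
mask[100, :] = True            # a thin "power line" one pixel wide
mask[40:70, 40:70] = True      # a compact "tower" blob

structure = np.ones((3, 3), dtype=bool)
eroded = ndimage.binary_erosion(mask, structure=structure)    # thin lines vanish
blobs = ndimage.binary_dilation(eroded, structure=structure)  # restore blob extent

labels, n = ndimage.label(blobs)
areas = ndimage.sum(blobs, labels, index=range(1, n + 1))
tower_labels = [i + 1 for i, a in enumerate(areas) if a > 500]  # large areas = towers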
The main principle of obstacle safety diagnosis is as follows. For safety distance analysis, the design combines the automatically extracted power line and DSM data, calculates the distance between the line and the DSM, compares it with the standard safety distance, and issues a hazard warning for areas where the distance is less than the safety distance. For the analyses of line curvature, line sag, and the stress on the line towers, the design uses the automatically extracted power line vector data to calculate the curvature and sag of the corresponding line. According to the

Fig. 5.19 The extraction process of the electric tower

corresponding mechanical principles, the stress on the power line tower is derived inversely to provide early warning of serious safety hazards, such as tower collapse. Each
fault detection module provides information such as the hazard level of the entire
line and the coordinates of the hazardous line segment in the form of a report to assist
decision-making. The data processing flow is shown in Figs. 5.20 and 5.21.
3. Urban change monitoring
A city is an area with a high concentration of human activities, and it is also an area
with a high concentration of information and materials such as population, housing,
and events. The refined governance of the city requires comprehensive monitoring
and management of various elements in the city. The application of urban manage-
ment concepts and technologies such as smart cities, geographic information systems,
and grid management has satisfied the needs of urban refined governance to a certain
extent. However, with the rapid development of cities, the previous platform-based and local urban dynamic monitoring methods can no longer meet the increasingly sophisticated needs of fine urban monitoring. At the same time, with the development of computer, optical, and geographic information technologies, the innovative approach of digital city management based on real scenes such as panoramic images, oblique 3D models, and LiDAR point clouds has gradually come to the fore, becoming a necessary part of smart city construction and urban dynamic monitoring and management. Many cities in China have taken these real scene data as one of the
basic data of smart cities. With the establishment of continuous live images of the
city based on the 3D model of the city’s real scene, panoramic images, and LiDAR

Fig. 5.20 The diagnosis process of the obstacle safety distance

Fig. 5.21 The diagnosis process of the line sag and stress analysis

point clouds, traditional plane simulated-style city management has been upgraded
to the 3D management of real scene visualization, and the level of urban refined
governance has been effectively improved.
The UAV panoramic monitoring method compares the panoramic images regu-
larly captured by the UAV with the panoramic data generated in the past to identify the
urban changing areas in time and to provide detailed panoramic image information
of time series and change for urban dynamic monitoring. Based on the panoramic
cloud platform for management, it can query and manage information such as time,
location, and map polygon changes and form a unified spatial–temporal reference
for rapid monitoring of urban changes. This approach is fast, has a smaller workload,
and is less time-consuming, but it cannot obtain the results through real-time analysis
and cannot perform measurements to extract more information about changed areas.
Traditional illegal building monitoring methods based on the spectral features or
texture features of remote sensing images ignore the 3D features of the changing
area. As a result, the current automatic monitoring methods for changing areas based
on UAV aerial images cannot achieve ideal results. The accuracy of the existing
object-based 3D change detection technology is 50–60%, and the computing speed
is approximately 2 h/km2 , which cannot achieve high-precision and rapid 3D change
detection of urban buildings. The use of a UAV airborne LiDAR scanning system
for urban change monitoring is based on high-precision 3D laser scanning and real-
time high-precision mapping and generates real-time high-precision point cloud data
within the surveying area. Combined with the previous high-precision 3D data of
cities, it is connected to the intelligent system to extract and automatically identify
urban changing areas in real-time, as well as real-time accurate measurement, and
achieves automatic, high-precision, real-time dynamic monitoring of urban changes.
Taking a building information census and 3D intelligent inspection project as an example, LiDAR point cloud technology and oblique photography technology were introduced to realize real-time, high-precision automatic dynamic monitoring of illegal urban buildings across the entire jurisdiction. This solves the problems of inefficiency in traditional manual inspection, such as difficulty in achieving full coverage, working blind spots, and difficulty in discovery, investigation, and evidence collection, and it has been highly recognized by customers. The flow chart of the LiDAR technology is shown in Fig. 5.22.
1) Rapid 3D spatial information acquisition based on LiDAR and oblique photog-
raphy integrated system
Airborne LiDAR actively transmits a high-frequency laser toward the target through the laser scanner, receives the reflected laser, and records the timing. The distance between the laser scanner and the target is calculated from the time elapsed between the emission and the reception of the laser pulse. Combined with the platform position and attitude data obtained by the POS system, the 3D coordinates of the target (the 3D LiDAR point cloud) can be calculated (a minimal sketch is given below). The data acquisition speed is fast and the range is large, but there is no texture information, and the visualization effect is poor.
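The sketch below illustrates the underlying computation: the two-way time of flight gives the range, and a simplified direct georeferencing step combines the range, the beam direction, and the POS position and attitude. Boresight and lever-arm corrections are omitted, and the rotation construction is a simplifying assumption.

# Illustrative sketch of range-from-time-of-flight and simplified direct
# georeferencing; boresight and lever-arm corrections are omitted, and the
# yaw-pitch-roll rotation construction is a simplifying assumption.
import numpy as np

C = 299_792_458.0                      # speed of light, m/s

def tof_range(delta_t):
    """Two-way time of flight: range = c * delta_t / 2."""
    return C * delta_t / 2.0

def rotation_zyx(yaw, pitch, roll):
    """Rotation from sensor frame to mapping frame from POS attitude angles (rad)."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

rng = tof_range(2.0e-6)                           # about 300 m slant range
direction = np.array([0.0, 0.0, -1.0])            # nadir-pointing beam in sensor frame
R = rotation_zyx(np.radians(5), np.radians(2), np.radians(1))
platform_pos = np.array([500_000.0, 2_500_000.0, 800.0])    # POS position (map frame)
target_xyz = platform_pos + R @ (rng * direction)            # georeferenced point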
Oblique photography technology is composed of multiple cameras with different
directions and angles and the corresponding synchronous exposure structure. The

Fig. 5.22 LiDAR real-time change monitoring process

multiple images in the vertical direction and oblique direction can be obtained at
the same time by one exposure of the camera, and the 3D real scene model can be
created through the post-process of aerial triangulation. The data acquisition speed
is fast, the acquisition range is large, the sampling rate is high, and the occlusion by
ground objects is reduced.
In terms of hardware integration, the system integrates LiDAR and oblique
photography to achieve data synchronization and rapid acquisition according to the
principle of synchronous control. Through one flight data acquisition, 3D LiDAR
point cloud data and multi-view oblique image data can be acquired simultaneously.
2) Fast 3D reconstruction integrating LiDAR point clouds and oblique images
By fusing multi-source airborne LiDAR point cloud data and oblique photography data [23], multi-source data products such as 3D models, DEMs, orthophotos, and other digital city 3D geographic information products can be produced automatically and efficiently. In this way, LiDAR data and image data complement each other and are processed in a coordinated manner.
The software system can obtain the space vector data, integrate the oblique image,
automatically correct the model vector, and generate a high-precision building 3D
model by analyzing the structural composition and topological relationship of the
building model through the airborne LiDAR point cloud. Relying on powerful
multi-source data analysis and processing capabilities, the oblique photographic
texture is automatically and accurately textured through texture mapping, which
can reconstruct the real scene 3D of architectural models accurately and quickly.
3) Real-time change monitoring based on LiDAR and oblique photography multi-
source data
Real-time change monitoring technology is based on multi-source data from LiDAR
and oblique photography [24, 25], which relies on a UAV 3D laser scanning system to
collect urban 3D point cloud data and uses a wireless network to transmit the data to

the backend platform of the control center. Based on the existing 3D model database
of real urban scenes, the laser scanning data sent back by the UAV are processed
to generate a 3D surface model in real-time under the same geographic framework.
By analyzing the features of the model and comparing them with the base map
in real time, the building changes in the scanning area are automatically analyzed,
and the urban changing area is displayed in real-time by means of highlighting.
In the meantime, real-time video collected by the UAV assists the study and analysis of the changes; the location, attributes, and other information of the changed spots are displayed, so rapid detection of urban changes is achieved.
The key technology is to integrate 3D laser scanners, GPS, INS, synchronous control
equipment, digital radios, and other modules. Through the unified operation of built-
in software and hardware, the data are computed and transmitted according to the
set logical sequence and finally output in the relevant format, which provides an
active, convenient, fast, and accurate all-day data collection tool for the UAV airborne
LiDAR system. The system platform is highly integrated with compatible data output
display and data analysis. Applications such as dynamic monitoring, querying, and
statistical analysis are completed. The geographic location of urban changing areas
and related attribute information are quickly provided in real-time. Data support is
provided for the applications of relevant departments such as urban management and
housing construction, and the productivity of the functional government agencies can
be improved.
LiDAR technology can achieve real-time 3D scanning and change analysis of
cities, and dozens of square kilometers can be covered at a time. Then, through
the rapid processing of point cloud data, LiDAR can automatically and intelligently
identify urban change information. At the same time, the artificial intelligence engine
can be introduced so that the accuracy of urban change detection can be improved.
An example of urban building height change analysis is shown in Fig. 5.23.

Fig. 5.23 Schematic diagram of real-time change monitoring and LiDAR analysis

Fig. 5.24 Verification area and data type of terrain feature classification method

The area of a boardroom at the construction site near the Xili campus of Shenzhen University was selected for the ground object classification experiment. Although optical images and other data are available for this area, those additional features were not taken into account, as we aimed to test the efficiency of the pixel comparison features extracted from LiDAR data; only the LiDAR data were selected as input. The original point cloud was interpolated to a gridded image with a resolution of 20 cm, which preserved enough detail (the point density of the original point cloud was about 10.8 points/m²), as shown in Figs. 5.24, 5.25 and 5.26.
Details of the processing procedure are displayed in Fig. 5.27. Pixel comparison
features were computed using a window of size 20 × 20. We set the number of features
to 100. We adopted Breiman and Cutler’s random forest classifier for classification,
which was implemented in MATLAB (https://github.com/jrderuiter/randomforest-
MATLAB/tree/master/RF_Reg_C). Three hundred trees were built in our random
forest, and 10 variables were randomly sampled as candidates at each split. We
sampled the cases with replacement. We set the minimum size of the terminal nodes
to 1 for an unlimited maximum tree depth. The random forest was trained using
2000 samples per class that were randomly distributed. In the post-processing of
classification, the size of the kernel in the majority analysis was set to 8. The weight
of the center pixel was the same as that of the other pixels in the kernel window.
The confusion matrix was computed using ground truth provided by the ISPRS Test
Project to validate the classification results. The bottom row and the far-right column
of the confusion matrix give the recall and precision for each class. The classification
map and the confusion matrix computed by our method are shown in Fig. 5.27.
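For reference, the sketch below shows how a confusion matrix and the per-class recall and precision can be derived from a predicted label map and a ground-truth map; the simulated data and class count are placeholders, not the ISPRS test data.

# Illustrative sketch of deriving the confusion matrix and per-class recall and
# precision from a predicted label map and a ground-truth map; the class count
# and simulated labels are placeholders.
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true.ravel(), y_pred.ravel()):
        cm[t, p] += 1
    return cm

truth = np.random.randint(0, 4, (200, 200))          # hypothetical ground truth
pred = truth.copy()
noise = np.random.rand(200, 200) < 0.1               # simulate 10% errors
pred[noise] = np.random.randint(0, 4, noise.sum())

cm = confusion_matrix(truth, pred, n_classes=4)
recall = np.diag(cm) / cm.sum(axis=1)                # per class (rows = ground truth)
precision = np.diag(cm) / cm.sum(axis=0)             # per class (columns = prediction)
overall_accuracy = np.trace(cm) / cm.sum()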
It took 0.45 s to extract features, 36.98 s for training, and 3.71 s for classification. The code was executed in MATLAB without parallel computing on a Windows 10 computer with an Intel i7-6700 CPU (3.4 GHz, quad-core) and 16 GB of memory. Our method computes the feature descriptors and performs classification quickly; the main computational burden lies in the training process. The overall accuracy achieved by our method is 87.2%. This performance is satisfactory for two reasons: first, only height-based information is used; second, area 1 is the most challenging of the test areas. The proposed method classifies the ground and tree classes well, which are the two dominant classes. The recalls

Fig. 5.25 Classification map obtained using the proposed method

for these two classes are 89.2% and 96.1%, respectively. The precisions for them are
96.2% and 96.0%, respectively.
However, the method is not effective in discriminating cars from the ground and trees. Many pixels are misclassified as cars, as shown by the low precision score (49.8%). Several factors may lead to this effect. First, due to the discontinuity of the ground pixels between cars, these pixels tend to be assigned the same class as the surrounding pixels during majority analysis. Second, because only height information is used and cars and the ground have only minor height differences, it is difficult to distinguish between them. Third, the small size of cars and the absence of some cars in the gridded image may make training difficult. The same issue has been reported in other studies [26]; many details are also missed by fully convolutional networks (FCNs) [27], and cars were even treated as low vegetation in the category definition of [28].
Except for the inefficiency in the classification of cars, discrimination between
trees and ground may also pose a challenge. Many pixels are misclassified as trees
despite the completeness of the tree class, while most of them belong to the ground
class. This happens frequently when many trees surround an isolated area of ground.
The situation is similar to the difficulty in discriminating ground pixels between cars.

The difficulty in distinguishing these pixels from their surroundings may be caused
by the filtering process or insufficient point density. The transition from one object
to another poses another challenge. Due to the low resolution of LiDAR data, the
boundaries between objects in the DSM, which are derived from LiDAR data, are not
as clear as those in optical images. Figure 5.26 shows detailed classification errors
compared with ground truth.
In the method, a postprocessing step named majority analysis (MA) is applied to smooth isolated pixels; by doing this, the overall accuracy (OA) is increased from 91.8 to 93.1%. The small improvement contributed by the postprocessing (1.3%) implies that our method already preserves the contextual information well, although this step can still be used to refine the classification results. Isolated pixels inside objects are smoothed after the majority analysis, and the borders between objects become more natural and closer to the actual ones after filtering, as shown in Fig. 5.27. However, isolated ground pixels may be incorrectly smoothed away by the majority analysis.

Fig. 5.26 Classification errors-loss of boundary details of objects

Fig. 5.27 Classification accuracy increase after postprocessing



5.3 Optimized Views Photogrammetry

Optimized views photogrammetry is a photogrammetric technology based on the UAV platform. Its main technical applications include real-scene 3D modeling and close-range photogrammetry. 3D reconstruction of space scenes and high-precision 3D measurement of marks can be realized via optimized UAV aerial image data processing.
Different from conventional technical modes such as traditional aerial photogram-
metry and oblique photogrammetry, optimized views photogrammetry ensures high-
quality 3D measurement applications from the data acquisition end with a closer
viewing distance, richer observation angles, and more automated control processing.
The technical principle of optimized views photogrammetry [29] can be defined as
follows. The 3D rough model generated in a variety of ways is obtained as the basis for
planning. Then, the aerial view can be generated and optimized by stereo observation
sampling and observability constraint analysis, and finally, a UAV photogrammetry
planning scheme is made in the form of an aerial flight path. The advantages of
optimized views photogrammetry exist in two aspects: one is the refined analysis
and planning before implementation of the measurement; the other is the automatic
processing and operation ability.
Following the development process of UAV photogrammetry, optimized view
photogrammetry is consistent with traditional aerial photogrammetry and oblique
photogrammetry, but it also has obvious differences. The route planning of traditional
aerial photogrammetry is mainly to produce 2D geographic information data, and its
technical approach is not sufficient to obtain a comprehensive and complete spatial
3D measurement. Additionally, oblique photogrammetry has inherited this approach
to a large extent.
In the practice of oblique photogrammetry, the typical configuration of the sensor
system is a five-lens camera, as shown in Fig. 5.28. The left is a Rainpoo DG3
PSDK five-lens oblique photography camera, and the right shows this sensor system
supporting the imaging scheme of the vertical angle and the four oblique views of
front, rear, left, and right.
Although only a few oblique perspectives have been added, the purpose is to
acquire 3D geospatial information data, which has brought qualitative changes to
the “old-fashioned” technical method of aerial photogrammetry.
Figure 5.29 compares the difference between vertical photogrammetry and
oblique photogrammetry. The left side of Fig. 5.29 shows the stereoscopic view
of vertical photogrammetry. For point S on the side elevation of the building, there
is only an image point S2 in the right view, so in the vertical photogrammetry mode,
points such as point S cannot be measured. The right panel of Fig. 5.29 illustrates
oblique photogrammetry. Point S on the side elevation of the building can be projected
to image points S2 and Sb in the oblique view. Then, the vertical view C2 and the
oblique view Cb constitute a stereoscopic observation. Therefore, point S can be
measured in oblique photogrammetry mode.

Fig. 5.28 The camera with five-lens and imaging scheme

Fig. 5.29 Comparison between vertical photogrammetry and oblique photogrammetry

Taking the real scene 3D model as the main output result, the emergence of
oblique photogrammetry has made aerial photogrammetry a crucial step toward the
acquisition of true 3D data [30]. However, its inheritance of the traditional aerial
survey acquisition mode limits the maximum level of technological upgrades.
The left side of Fig. 5.30 is a real scene 3D model of an ordinary high-rise building.
The upper right and lower right are two details on the model, where there is obvious
distortion in the part of the balcony protruding from the floor in the upper right picture.
The reason is that downward imaging causes the occluded portion to become a dead
angle for taking images. The oblique angle of view in oblique photogrammetry does
not improve this situation. The lower right picture shows the part of the model near the
ground. This part is not only distorted, but also the image texture is blurred, indicating
that the corresponding area has a small number of images and low resolution for
modeling. This is more common for objects or scenes with large height differences.

Fig. 5.30 Real scene 3D model of oblique photogrammetry and its common issues

The aerial acquisition mode used in oblique photogrammetry, and even in traditional aerial photogrammetry, is constrained by fixed-wing manned or unmanned flight platforms. Compared with fixed-wing manned or unmanned aerial vehicles, multi-rotor UAVs have lower flight efficiency, but they offer a higher degree of automatic control, better safety and reliability, and especially greater maneuverability. At the same time, thanks to the light weight and miniaturization of multi-rotor UAVs and the high-precision positioning capability of airborne RTK, the available airspace can be used to the greatest possible extent. In addition, another difference from fixed-wing UAV aerial photography systems is that a multi-rotor UAV mounts the camera on a multi-axis stabilized gimbal, so the camera’s imaging attitude angle can be adjusted by controlling the gimbal.
The left picture of Fig. 5.31 shows the DJI M300RTK quadcopter. One of its main
technical features is the RTK antenna mounted on the rear arms, so this kind of UAV can achieve a positioning accuracy better than 2 cm. The picture on the right shows
the DJI Zenmuse-P1 aerial camera. The imaging system is equipped with a gimbal,
which can adjust the camera angle in three axes.
The actual capabilities of UAV platforms and imaging systems provide the premise
for the innovation of aerial photogrammetry. Based on a multi-rotor UAV with high
positioning accuracy and an aerial camera adjusted by a multi-axis gimbal, this
combination can implement aerial photography at any position and from any view,
except for the space occupied by the entity. It breaks through the constraints of the
inherent aerial photography mode, making it possible to fully cover the reality scene.

Fig. 5.31 Multirotor drone and aerial camera with gimbal

5.3.1 View Optimization and Route Generation Method Based on the Rough Model

The prior information used in traditional aerial photogrammetry and oblique photogrammetry is mainly a 2D orthophoto. Aerial photography planning is carried
out based on the orthophoto image of the surveying area, and all technical parameters
are “normalized” according to the overall situation of the surveying area. The actual
aerial photography method is more like a “brute-force” blind capture, which makes
it difficult to control the quality of each image. It needs to be checked and evaluated
in the data post-processing stage and creates considerable data redundancy.
With the introduction of MAVs, it has been realized that 2D prior information is insufficient for the emerging aerial photogrammetry methods, and attention has turned to aerial photography planning based on 3D information. A highly automated UAV aerial
photography data collection technology roadmap is gradually derived. According to
the specific objects of the scene, the gradually refined strategy is adopted to generate
rough three-dimensional information of the scene or objects and further compute and
plan the 3D flight path corresponding to the scene objects [12, 22, 29].
Similar to the above technical methods, optimized view photogrammetry is also
based on 3D prior information. Subsequent visibility and measurability analysis, as
well as optimal selection of view and flight line generation, all need to be carried out
around this prior information [30–33]. In optimized view photogrammetry, the 3D
prior information is called a coarse model.
1. Acquisition and generation of the rough model
The rough model is the basic geometric representation of the space scene or object,
and it is also the foundation for the implementation of the subsequent technology
of optimized views photogrammetry. For the purpose of aerial photography plan-
ning, the methods of obtaining or generating a rough model can be divided into the
following types:

Fig. 5.32 Rough model of urban scene

1) Pre-flight reconstruction generation


A small amount of image data that is sufficient to build a rough model can be collected
by the pre-flight of the UAV, and then the image data can be used to generate a rough
model through the 3D reconstruction approach [29].
2) Basic geographic information data conversion and generation
The original high-resolution digital surface model can be directly converted into a
rough model, or a 2.5D rough model can be integrated through the bottom contour
of the ground object and the height of the ground object estimated by the shadow
based on the satellite remote sensing orthophoto [29].
3) Design model conversion and import
A single building or a large-scale engineering industrial structure usually has a corre-
sponding design model, such as its BIM. The geometry of the model corresponds
exactly to the entity. What needs to be done is to convert the design model into the
scale and orientation of the actual space and import it into processing.
In practice, the rough model for the application requirements of 3D reconstruction
of spatial scenes is usually obtained by the first method of pre-flight acquisition and
reconstruction. The generation time of the rough model can be compressed as much
as possible by experience and skills during the operation.
Based on the image data collected in the pre-flight, two kinds of rough models
can be output through 3D reconstruction, which are called the objects model and the
layering 2.5D model. Figure 5.32 shows two rough models of urban local scenes. The
left picture is the object model, and the right picture is the 2.5D model. The object
model is directly reconstructed and generated from the image data collected from the
pre-flight. Due to the use of less data, many of the observation angles are missing,
and the deformation and loss on the model are also more obvious. The 2.5D model is
generated using the dense point cloud output from aerial triangulation and is layered
according to a specific layer height. The local convex hull (profile) is determined
from the point cloud within each layer, the 2.5D model of the layer is formed by
combining the layer height, and the rough model is obtained after superimposition
layer by layer.
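The sketch below illustrates the layering idea in a simplified form: the dense point cloud is sliced by a fixed layer height and a 2D convex hull is computed per slice; the actual method uses local convex hulls (profiles), so the full-hull version here is only an approximation, and the layer height is a placeholder.

# Illustrative sketch of building a layered 2.5D rough model: the dense point
# cloud is sliced by a fixed layer height and a 2D convex hull is computed for
# each slice; the layer height is a placeholder parameter.
import numpy as np
from scipy.spatial import ConvexHull

def layered_25d_model(points, layer_height=5.0):
    """Return a list of (z_bottom, z_top, footprint_xy) per occupied layer."""
    z_min = points[:, 2].min()
    layer_idx = ((points[:, 2] - z_min) // layer_height).astype(int)
    layers = []
    for k in np.unique(layer_idx):
        slab = points[layer_idx == k]
        if len(slab) < 3:
            continue                      # not enough points for a 2D hull
        hull = ConvexHull(slab[:, :2])
        footprint = slab[hull.vertices, :2]
        layers.append((z_min + k * layer_height,
                       z_min + (k + 1) * layer_height,
                       footprint))
    return layers

cloud = np.random.rand(20000, 3) * [50, 50, 60]   # hypothetical dense point cloud
model = layered_25d_model(cloud, layer_height=5.0)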
The two rough models are compared according to the generation results of the
flight route plan and the final scene reconstruction results. The actual application

effect shows that although the structure of the object model is not complete and full
compared to the 2.5D model, it can better reflect the details of the ground objects,
especially the special-shaped buildings, thereby improving the targeted planning in
the local scene. In view of the generation method of the 2.5D model, the hollow or
skeleton structure of the building in the real space is difficult to express. Therefore,
the object model is often used as a rough model for planning reference in technical
practices.
Design models are often more elaborate than the rough models described above
and appear as complex monolithic structures. Figure 5.33 shows an example of a
design model of a radio telescope antenna. The main problem with taking the design
model as the basis for aerial photography planning is that the design model does
not have absolute geospatial coordinates, and its scale in the physical space also
needs to be accurately calibrated. The method to solve the problem is to establish the
coordinate transformation relationship through the control points corresponding to
the structure entity on the model to realize the positioning, orientation, and scaling
of the design model.
2. Preliminary selection of views based on model observation sampling and
visibility analysis
UAVs are used to surround the physical space for aerial photography, and the objects
and buildings in the scene may become obstacles on the flight route, thus posing a
threat to flight safety. In addition, due to the random influence of the environment
or other factors, the control of the UAV’s position and attitude cannot always be
kept accurate and stable, so there is a risk of collision if it is too close to the object.
In view of this, according to the set safety distance, a no-fly zone is established by
dilating the rough model in the horizontal and vertical directions to ensure flight
safety. Figure 5.34 shows the rough model of the urban scene (left) and the no-fly
zone generated by it (right). Compared with the rough model, the no-fly zone has
obvious outward expansion, and through this setting, more regular flying airspace is
divided.
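A hedged sketch of the no-fly zone generation is given below: a voxelized rough model is dilated by the safety distance, and candidate camera stations falling inside the dilated volume are rejected. The voxel size, safety distance, and occupancy grid are placeholders.

# Illustrative sketch of generating a no-fly zone by dilating a voxelized rough
# model by a safety distance; voxel size and safety distance are placeholders.
import numpy as np
from scipy import ndimage

voxel_size = 2.0          # m per voxel
safety_distance = 5.0     # m

occupancy = np.zeros((60, 60, 40), dtype=bool)
occupancy[20:40, 20:40, 0:25] = True          # hypothetical voxelized building

radius = int(np.ceil(safety_distance / voxel_size))
structure = np.ones((2 * radius + 1,) * 3, dtype=bool)   # cube of the safety radius
no_fly_zone = ndimage.binary_dilation(occupancy, structure=structure)

# A candidate camera station is rejected if its voxel lies inside the no-fly zone.
station_voxel = (30, 50, 15)
is_safe = not no_fly_zone[station_voxel]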

Fig. 5.33 The design model of an industrial structure

Fig. 5.34 Rough model of urban scene and generation of no-fly zones

The rough model generated by the methods is a mesh model representing the
scene space or entity structure. Even if the rough model can only roughly represent
the scene or entity, it is also the closest surface model to the real scene compared
to a 2D plane or a conventional DSM. As a result, the flight route design based
on ground-related configurations in aerial photogrammetry, including flight height,
ground resolution, and overlap ratio, will be turned to plan the best UAV aerial
photography views according to this surface.
The strategy of breaking the whole into pieces is adopted: the mesh model is divided into small local units, and a specific configuration is made for each unit to generate the corresponding aerial views. The division operation adopts the constrained Poisson-disk sampling method [34]. As a kind of importance sampling, its result better reflects the salient geometric features of the model surface, and the sampling density can be controlled by setting the disk radius D_disk. Using this sampling method, the mesh surface is transformed into a set S = {(s_1, n_1), (s_2, n_2), ..., (s_m, n_m)} consisting of the sampling points s_i and the surface normal vectors n_i at the corresponding positions.
Based on these sampling points, the UAV aerial photography views are preliminarily determined. The local part of the model represented by a sampling point is captured via normal (surface-perpendicular) photography. The advantage of this processing is that each sampling point s_i can be used as the nadir point of the view, with -n_i taken as the main optical axis direction. Figure 5.35 shows the basic principle of determining the stereoscopic view based on the sampling point and its normal vector using normal photography. In the figure, s_1 and s_2 are the sampling points, and n_1 and n_2 are the corresponding normal vectors. The opposite direction of the normal is defined as the direction of the main optical axis of photography op_i, that is, op_i = -n_i, and c_1 and c_2 are the locations of the camera, i.e., the exposure stations. The object distance between the camera station c_i and the nadir point s_i on the model is defined as d_ob.
If the sampling point s_i and normal vector n_i on the model surface are known, then c_i = s_i + n_i · d_ob, and the view after the camera is positioned and oriented is defined as v_i = (c_i, op_i). Corresponding to the sampling set, there is a set of views V = [v_1 v_2 ... v_n]. According to the stereoscopic observation relationship established by normal photography, the field of view of the camera is known; it is expressed as the angle φ_FOV determined by the focal length and the size

Fig. 5.35 Analysis of stereoscopic observation based on sampling points

of the sensor. Once the field of view is determined, the object distance determines the ground resolution of the image. The spacing between adjacent sampling points that form a stereoscopic observation is both the baseline width B and the sampling disk radius D_disk. The baseline width is related to the image overlap ratio O_r, which is defined through the projection width W_prj as O_r = (W_prj - B)/W_prj. Therefore, if the overlap ratio is given, the radius of the Poisson sampling disk can be determined as:

D_{disk} = W_{prj} (1 - O_r) \qquad (5.4)

Therefore, the sampling density can be controlled according to the preset overlap
ratio. Figure 5.36 shows the results of sampling observations on a rough model of an
urban scene. The left and right figures show the sampling results from the side and
top views. The black points on the model are the sampling points, and their distribution follows the geometry of the model. A minimal sketch of the view placement computed from a sampling point is given below.
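The sketch below shows how a single view can be placed from a sampling point and its normal: the camera station is c_i = s_i + n_i · d_ob, the optical axis is op_i = -n_i, and the Poisson disk radius follows Eq. (5.4). The focal length, sensor width, object distance, and overlap ratio are placeholder values.

# Illustrative sketch of placing a view from a sampling point: camera station
# c_i = s_i + n_i * d_ob, optical axis op_i = -n_i, and the Poisson disk radius
# from Eq. (5.4); focal length, sensor width, and overlap ratio are placeholders.
import numpy as np

focal_length = 0.035      # m
sensor_width = 0.0358     # m (placeholder sensor width)
d_ob = 60.0               # object distance, m
overlap_ratio = 0.8       # O_r

s_i = np.array([10.0, 25.0, 30.0])                             # sampling point on the mesh
n_i = np.array([0.6, 0.0, 0.8]); n_i /= np.linalg.norm(n_i)    # unit surface normal

c_i = s_i + d_ob * n_i                                 # exposure station position
op_i = -n_i                                            # main optical axis direction

# Projection width on the object at distance d_ob, then the disk radius (Eq. 5.4).
w_prj = d_ob * sensor_width / focal_length
d_disk = w_prj * (1.0 - overlap_ratio)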
Following the settings of the overlap ratio and ground resolution, the views deter-
mined by the sampling point according to normal photography may be blocked or
located in the ground object or the no-fly zone. Therefore, it is necessary to analyze
the visibility of the view and make appropriate adjustments.
Figure 5.37 shows the visibility analysis of the observation views. If the view
cannot be placed or is blocked, rotate the view in a specific direction until the view
leaves the occupied space area or is no longer blocked while keeping the sampling

Fig. 5.36 Urban scene rough model observation sampling

point and the object distance unchanged. If the required rotation angle becomes too large, so that even when the direction is nearly perpendicular to the original normal direction it is still impossible to avoid collision or obtain effective visibility, then the view is deleted.
Figure 5.38 shows the initial selection of the views generated based on the
sampling results in Fig. 5.37. The blue dots in the figure are the camera positions,
and the green lines with arrows indicate the perspective direction. For the region
where the sampling points are distributed, a total of 5709 views are generated. Note

Fig. 5.37 Visibility analysis of observation views



Fig. 5.38 The generation results of the primary selection of the views of the urban scene

that some of the view vectors toward the facades of the buildings in the model are
deflected upward, indicating that they are adjusted according to the visibility analysis.
3. Optimal selection of views based on measurability analysis
The primary selection of the views generated through the above process forms a
dense set of views, and the larger the overlap ratio setting is, the more views are
generated.
The views optimization of the optimized views photogrammetry is equivalent
to pre-positioning some key techniques in the post-processing of 3D reconstruction,
combining the prior information of the rough model and the data collection capability
of the UAV, to provide support for the final output results in the data acquisition stage.
In the processing of real scene 3D-related technologies, image data are mainly
used for 3D reconstruction of the scene. Therefore, the quality of the perspective is
usually judged by the quality of the reconstruction or the value of the image that
contributes to a good reconstruction. This defines the reconstructability [12, 22]
criterion.
Figure 5.39 shows the relationship of parameters related to reconstructability
in binocular stereoscopic observation. The meanings of the sampling point s and
its corresponding normal vector n are the same as those described above, and a
stereoscopic observation is formed between the views v1 and v2 represented by the
camera stations c_1 and c_2 and the sampling point s, where α is the intersection angle between the rays c_1s and c_2s, γ_1 and γ_2 are the angles between c_1s, c_2s and n, respectively, γ_max = max(γ_1, γ_2), and d_max = max(‖c_1 - s‖, ‖c_2 - s‖). Then, the reconstructability of this stereo observation is expressed as:

q(s, v_1, v_2) = w_1(\alpha) \, w_2(d_{\max}) \, w_3(\alpha) \cos \gamma_{\max} \qquad (5.5)



Fig. 5.39 Parameter relationship of stereoscopic reconstructability

where the w_*(·) are weight functions, defined as:

w_1(\alpha) = \{ 1 + \exp[-k_1 (\alpha - \alpha_1)] \}^{-1} \qquad (5.6)

w_2(d_{\max}) = 1 - \min\left( \frac{d_{\max}}{d_{ep}}, 1 \right) \qquad (5.7)

w_3(\alpha) = 1 - \{ 1 + \exp[-k_3 (\alpha - \alpha_3)] \}^{-1} \qquad (5.8)

where α_1, α_3 and k_1, k_3 are empirical parameters. w_1(α) describes the relationship between the intersection angle α and the triangulation error: according to the correlation analysis of photogrammetry [29], the larger the intersection angle is, the longer the baseline between the camera stations and the smaller the measurement error in the depth direction. w_3(α) is also related to the intersection angle α, but it has the opposite effect to w_1(α): the smaller the intersection angle is, the greater the overlap ratio and the more valid the matches between images, which is more conducive to accurately completing the relative orientation between views. w_2(d_max) is related to the distance from the views to the sampling point, in which d_ep is a preset parameter that can be set to several times d_ob. The farther the views are from the sampling point, the lower the ground resolution and the worse the measurement accuracy. cos γ_max reflects the deflection of the viewing direction relative to the normal vector: the larger the deflection angle is, the worse the measurement accuracy.
If the q(s, v_j, v_k) of any two views v_j and v_k is further combined with a judgment of the visibility of the sampling point in those views, the observability is defined as

m(s, U) = \sum_{j=1}^{n} \sum_{k=1}^{n} \delta(s, v_j) \, \delta(s, v_k) \, q(s, v_j, v_k) \qquad (5.9)

 
Here, δ(s, v_j) is a binary discriminant function that identifies whether s is visible in v_j: it takes the value 1 if visible and 0 otherwise. Since occlusion has already been handled when the initially selected views were generated, the visibility referred to in Eq. (5.9) is mainly reflected in γ_j; when γ_j > γ_threshold, the visibility discriminant function is 0. m(s, U) represents the observability of the view set U with respect to s. The cameras form a multi-baseline photogrammetric forward intersection [35]: the more effective views there are, the more redundant the observations and the higher the measurement accuracy, and m(s, U) fully expresses this meaning.
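The following sketch evaluates the reconstructability of Eq. (5.5) and the observability of Eq. (5.9) for one sampling point and a small set of camera stations. The empirical parameters α_1, α_3, k_1, k_3, d_ep, and the visibility threshold are placeholder values, and pairs of a view with itself are skipped as degenerate.

# Illustrative sketch of evaluating reconstructability (Eq. 5.5) and observability
# (Eq. 5.9) for one sampling point; alpha_1, alpha_3, k_1, k_3, d_ep, and the
# visibility threshold are placeholder values.
import numpy as np

ALPHA1, ALPHA3 = np.radians(10), np.radians(30)
K1, K3 = 30.0, 30.0
D_EP = 180.0                      # e.g., several times the object distance
GAMMA_THRESHOLD = np.radians(70)

def reconstructability(s, n, c1, c2):
    r1, r2 = c1 - s, c2 - s
    d_max = max(np.linalg.norm(r1), np.linalg.norm(r2))
    u1, u2 = r1 / np.linalg.norm(r1), r2 / np.linalg.norm(r2)
    alpha = np.arccos(np.clip(u1 @ u2, -1, 1))
    gamma_max = max(np.arccos(np.clip(u1 @ n, -1, 1)),
                    np.arccos(np.clip(u2 @ n, -1, 1)))
    w1 = 1.0 / (1.0 + np.exp(-K1 * (alpha - ALPHA1)))
    w2 = 1.0 - min(d_max / D_EP, 1.0)
    w3 = 1.0 - 1.0 / (1.0 + np.exp(-K3 * (alpha - ALPHA3)))
    return w1 * w2 * w3 * np.cos(gamma_max)

def observability(s, n, stations):
    """Eq. (5.9): sum q over all pairs of views in which s is visible (angle-based delta)."""
    def visible(c):
        u = (c - s) / np.linalg.norm(c - s)
        return np.arccos(np.clip(u @ n, -1, 1)) <= GAMMA_THRESHOLD
    m = 0.0
    for j, cj in enumerate(stations):
        for k, ck in enumerate(stations):
            if j != k and visible(cj) and visible(ck):
                m += reconstructability(s, n, cj, ck)
    return m

s = np.array([0.0, 0.0, 0.0]); n = np.array([0.0, 0.0, 1.0])
stations = [np.array([20.0, 0.0, 60.0]), np.array([-20.0, 0.0, 60.0]),
            np.array([0.0, 25.0, 55.0])]
print(observability(s, n, stations))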
Based on the m(s, U ) function, for all views V and all sampling points S, the view
optimization is transformed into an optimization of this function, and the optimization
objective function is defined as [29]


U^* = \arg\min_{U} \; \lambda |P| + \sum_{i=1}^{n} \big( \max(m_{\max} - m(s_i, U), 0) \big)^2, \quad s_i \in S \setminus P, \; n = |S \setminus P| \qquad (5.10)

where U is the selected view subset and P is the set of sampling points in S not observed by the views in U. The purpose of this objective function is to ensure complete observation coverage of all sampling points by minimizing |P|. In addition, through the threshold m_max set for quantizing the observability function, further growth in the number of views is suppressed once m(s, U) reaches this value.
Another objective function [29] defines the concept of redundancy to reflect the importance of a view. The redundancy, or view importance function, can be expressed as

r(v^*) = \min \{ m(s, V) \mid s \in S, \; \delta(s, v^*) = 1 \} \qquad (5.11)

The view importance indicates that v^* provides a relatively larger contribution to the sampling points s with lower observability. The final objective function is to minimize redundancy while maximizing observability, expressed as

U^* = \arg\max_{U \subset V} \sum_{i=1}^{n} m\Big( s_i, \arg\min_{T \subset V} \sum_{j=1}^{f} r(v_j) \Big), \quad s_i \in S, \; |U| = |T|, \; n = |S|, \; v_j \in T, \; f = |T| \qquad (5.12)

Since both maximization and minimization operations are involved, the optimization of the above objective function is carried out in two steps. First, the views are sorted by the view importance function and deleted iteratively from the highest to the lowest value of r(v^*); each deletion requires checking that the observability of every sampling point is still greater than the threshold. The minimum view subset T obtained by deleting views is not yet the optimal subset, so on this basis, combined with the initially selected view set V, the view set with the best observability is further generated.
Then, for each view v_t in T, similar views within its neighborhood are selected from the initially selected view set V according to a similarity measure, and an alternative approximation set C_s(v_t) is constructed. The similarity measure is mainly based on the view pose and is defined as

f(v_b, v_a) = \frac{op_b \cdot op_a}{\| c_b - c_a \| + \varepsilon} \qquad (5.13)

where op_b, op_a and c_b, c_a are the viewing directions and camera positions, respectively; the numerator of f(·) represents the similarity of the viewing directions, the denominator represents the closeness of the view positions, and ε is a tiny constant.
If replacing v_t with a view in C_s(v_t) improves the overall observability of all sampling points, the replacement is kept, while it is also checked that the observability of each sampling point remains greater than the threshold; this continues until all views in T have been tested, yielding a new view set U^*.
This process ensures the overall maximum observability of the sampling points
under the premise of the least view redundancy, that is, it is considered that optimal
view selection has been achieved. Figure 5.40 shows the optimized selection results
based on the primary and secondary selected perspectives. Compared with the 5709
initially selected views, the preferred number of views is 1601, which eliminates
many redundant views. Considering factors such as the endurance of the drone, the
selected views are further divided, and the division results are presented with points
rendered in different colors.
In addition, the optimization process of the view includes the analysis of the
observability of the sampling points. According to the situation reflected by the rough
model of the scene and the relevant conditional rules for the optimal generation of
view, the observability of each sampling point is different. In accordance with the
basic principle of multi-baseline photogrammetry, the more views of sampling points
that can be observed, the better the observability of the corresponding sampling point.
Figure 5.41 shows the results of the observability analysis of the sampling points.
The left picture shows the side view, and the right picture shows the top view.
Regarding the observability analysis, a piecewise qualitative representation on a
quantitative basis is used. On the premise of ensuring the effective observation of the
sampling points in the scene, the observability numerical benchmark m pre is set, and
on this basis, the numerical interval is divided into segments, and the observability
of the corresponding sampling point is marked. In practical applications, the interval

Fig. 5.40 View optimization selection results of the urban scene

of the observability value is represented by a color: 0 ~ m_pre/2 is red, m_pre/2 ~ m_pre is yellow, m_pre ~ m_pre + 10 is green, m_pre + 10 ~ m_pre + 20 is light blue, and > m_pre + 20 is blue; the larger the value is, the better the observability. It can be seen from Fig. 5.41 that most of the sampling points are blue, light blue, or green, indicating that these sampling points have good observability, while some sampling points in the gaps between buildings have poor observability, which is also in line with the basic logical judgment.

4. UAV aerial photography path generation based on optimized views

The view set U ∗ obtained by view optimization includes the exposure station position
and camera pose of each selected view. Corresponding to the actual acquisition
operation, it is required that the UAV can accurately reach each camera station and
adjust the camera to capture images according to the preset view direction. Therefore,
the aerial photography planning of the UAV needs to connect the exposure stations of

Fig. 5.41 Model sampling point observability analysis



each optimized view to form an aerial flight route and simultaneously set the camera
attitude angle of the exposure station.
Taking the exposure stations as waypoints, the UAV route generation is formulated as a static path planning problem. The route must pass through all camera stations in the optimized view set U^*, so a fully connected graph with the camera stations as vertices is constructed, and the problem is further transformed into a traveling salesman problem (TSP).
Obstacles should be avoided during the graph connection process. At the same
time, the connection cost of the graph specifically refers to the power consump-
tion between the waypoints. In addition, the attitude adjustment of the aircraft and
the camera gimbal between adjacent waypoints also consumes energy. Therefore,
the cost function of connecting edges between any two views (v_j^*, v_k^*) is defined as

e(v_j^*, v_k^*) = l(c_j^*, c_k^*) \exp\left( \frac{\theta}{l(c_j^*, c_k^*)} \right) \qquad (5.14)

where l(c_j^*, c_k^*) represents the shortest path length from c_j^* to c_k^* considering obstacles, θ = \arccos\left( op_j^* \cdot op_k^* / (\| op_j^* \| \, \| op_k^* \|) \right) is the angle between the two views, and the exponential term \exp(θ / l(c_j^*, c_k^*)) in e(v_j^*, v_k^*) represents the cost of pose adjustment.
Considering the large number of camera stations, which are distributed in 3D space and between which obstacles must be avoided, a genetic algorithm is adopted in the actual processing to solve the TSP-based flight route generation [36] and keep the processing time short. A minimal sketch of the edge cost evaluation is given below.
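The sketch below evaluates the edge cost of Eq. (5.14) and assembles the cost matrix that a TSP solver (for example, a genetic algorithm) would consume; the obstacle-aware shortest path l(·,·) is approximated here by the straight-line distance, which is a simplifying assumption.

# Illustrative sketch of the edge cost of Eq. (5.14) between two optimized views
# and of building the cost matrix used by a TSP solver; the obstacle-aware
# shortest path l(.,.) is approximated by the straight-line distance here.
import numpy as np

def edge_cost(c_j, op_j, c_k, op_k):
    """e(v_j, v_k) = l * exp(theta / l), with l approximated by Euclidean distance."""
    l = np.linalg.norm(c_j - c_k)
    cos_theta = np.dot(op_j, op_k) / (np.linalg.norm(op_j) * np.linalg.norm(op_k))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return l * np.exp(theta / max(l, 1e-9))

# Hypothetical optimized views: camera stations and viewing directions.
stations = np.array([[0, 0, 60], [30, 5, 58], [55, 20, 62], [20, 40, 65]], float)
directions = np.array([[0, 0, -1], [0.3, 0, -0.95], [0, 0.3, -0.95], [0, 0, -1]], float)

n = len(stations)
cost = np.zeros((n, n))
for j in range(n):
    for k in range(n):
        if j != k:
            cost[j, k] = edge_cost(stations[j], directions[j],
                                   stations[k], directions[k])
# The cost matrix would then be handed to the route solver (e.g., a genetic algorithm).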
Figure 5.42 shows the route results generated based on the optimal selection of
the view in Fig. 5.40. In Fig. 5.40, the preferred views are divided into different
view subsets with different colors. The division is mainly based on the proximity of
the view positions and the approximation of the direction, and views with similar
poses are clustered together. Therefore, the actual route generation is for each subset
of views. At the same time, considering the endurance of the aircraft, it usually
corresponds to the number of waypoints that can be connected in series and then
combined with the division of the view set, and finally, the generation of each UAV
flight route is completed. Therefore, Fig. 5.42 shows each segment of the UAV route
around the observed scene in optimized views photogrammetry, rendered in different
colors to distinguish them.
In addition to dividing the view set by pose similarity, another approach is to divide it hierarchically by view height. The left picture of Fig. 5.43 shows the optimized view set divided into four layers according to the actual spatial height, represented by different colors. The picture on the right shows the corresponding route planning results based on altitude stratification, also rendered in different colors.

Fig. 5.42 Optimized views photogrammetry flight route generation result for urban scene

Fig. 5.43 Hierarchical division of optimized perspectives and hierarchical route planning results

The division of the optimized views and flight routes described above constitutes the technical basis on which optimized views photogrammetry supports multi-UAV collaborative operations.

5.3.2 Accuracy Analysis for Fine Real Scene Modeling

For real scene 3D models, oblique photogrammetry only partially solves the basic problems and cannot provide strong practical support for observability, which is an essential attribute of spatial geographic information data. In contrast, optimized views photogrammetry attempts to simultaneously improve the

display effect and the observability of the model, so that the real scene 3D model restores the physical space more realistically and accurately.
In photogrammetry, the evaluation of the measurement accuracy of data results is
mainly based on checkpoints. Taking the spatial absolute coordinate system as the
reference, the checkpoint verification and comparison represent the exterior coinci-
dent precision of the data results. The disadvantage of using checkpoints for accuracy analysis is that "representing surfaces with points" makes the verification incomplete: if the data results contain no position corresponding to a checkpoint, the accuracy at that position is unknown. Therefore,
compared with sparsely laid checkpoints, the reference ground truth value with more
complete coverage and higher sampling density is a more ideal choice for accuracy
verification. In addition, if the model output by 3D reconstruction is used as the
evaluation object, then the specific indices include completeness and accuracy, and
in correspondence, a strict and reliable comparison reference is needed.
1. Layout of image control points for fine real scene 3D modeling
Aerotriangulation is the core of aerial photogrammetry; it is the triangulation process of determining the 3D coordinates of space points from aerial images. Early aerial triangulation was used to densify ground control points, thereby reducing the workload of field control surveys. In addition to obtaining the space coordinates of pass points, the main uses of image control points include two aspects:
(1) Realizing the conversion between the independent coordinate system in which the measured values are located and other coordinate systems.
(2) Correcting the errors of the measured values based on the high reliability and confidence of the control points.
Considering the position of the image control point in the surveying area, the method
of measuring and setting, and the difference in its role in the aerial triangulation, the
type of image control point is further divided into horizontal and vertical control
points. Based on the horizontal control point and the vertical control point, when
constructing the surveying control network, the conventional process will also be
divided into the horizontal control network and the vertical control network. Simi-
larly, the evaluation of the accuracy of the measurement results is also divided into
horizontal accuracy and vertical accuracy.
The basic principle of laying out control points is to exert an equivalent and consistent control effect over the entire surveying area, so that the error can be distributed evenly across it. The large scale and high resolution of the real scene 3D model create favorable conditions for higher-precision measurement, but it is the image control points that define the best precision achievable by the photogrammetric results.
The advantage of optimized views photogrammetry lies in its ability to perform refined analysis and automatic planning before measurement based on the rough model; the aerial photography planning is completed simultaneously with

the layout plan of the image control points. While providing aerial photography plan-
ning capabilities, optimized views photogrammetry also provides analytical support
for the layout of image control points.
Following the same technical principle as the optimal view selection, points suitable for image control can be automatically selected from the set of candidate sampling points once the layout rules are defined. Based on comprehensive analysis, the recommended selection rules for image control points include the following aspects:
(1) The Poisson disk sampling radius $R_{rdisk}$ is redetermined according to the preset number of image control points and the extent of the survey area, and resampling is performed on the rough model. The disk radius $R_{disk}$ used in the initial sampling is given by the projection width of the camera and the preset overlap ratio. The second sampling radius is then defined as $R_{rdisk} = mn R_{disk}$, where $m$ is the number of routes and $n$ is the number of camera stations in each route section. The meaning of this formula is to set one image control point for the $n$ camera stations (views) of $m$ route sections, and the distribution density of the image control points can be controlled by adjusting the parameters $m$ and $n$. Suppose a point of the second sampling is $s_j^r$; then, the $k$ sampling points closest to $s_j^r$ in the full sampling point set $S$ are candidate image control points, and all candidate image control points constitute the point set $S_{cpc}$.
(2) According to the definition of observability $m(s, U)$ in Eq. (5.9), the generated sampling points carry an observability analysis and are divided into 5 levels represented by different colors, as shown in Fig. 5.41. To ensure that the image control points exert better control over the measurement, points with high observability are preferentially selected from the candidate point set: according to the preset $m_{pre}$, the sampling points with $m(s, U) > m_{pre}$ are selected, that is, those displayed as green, light blue, and blue.
(3) At present, the most stable and reliable equipment with relatively high accuracy for field control measurement is the RTK GPS receiver. During operation, the point location should not be near the façade of an object or on a steep slope. Therefore, the angle $\gamma$ between the normal vector $n$ of the sampling point $s$ and the vertically upward direction determines whether $s$ lies on a horizontal plane or a plane with a small slope, and the threshold is set as $\gamma < 30°$. Points satisfying $\gamma < 30°$ can also be measured with instruments such as a total station, and such sampling points can serve as checkpoints.
(4) Considering the stability of the control, arranging the image control points on the ground is the basic requirement of the field control survey. However, because the rough model is coarse and the ground carries various intricate low objects, the image control points are set at different heights relative to the determined ground elevation, which also corresponds to the view layering in Fig. 5.45. Beyond the ground, it should be ensured that image control points are distributed at each height level. Therefore, when the image control points are selected according to the preset height layers, each selected sampling point is labeled with the height layer to which it belongs.
According to the rules above, the adopted selection strategy proceeds from the ground upward, selecting height layer by layer. In the aerial photogrammetry scenario, the selected points must first satisfy $\gamma < 30°$ from rule (3); they are then sorted by observability $m(s, U)$, and the top three points are selected from the candidate set determined by each resampling point. All points selected by this strategy constitute the set of recommended image control points, whose coordinates are then measured in the field during the survey.
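The rule-based selection described above can be summarized in code. The sketch below is only an illustration: it assumes the dense sampling set $S$, the per-point observability $m(s, U)$, the surface normals, and the height-layer labels have already been computed, and the data structures and helper names are hypothetical rather than taken from an actual implementation.

```python
import numpy as np

def recommend_control_points(resample_pts, candidates, k_nearest=20,
                             m_pre=10.0, max_slope_deg=30.0, top_n=3):
    """Select recommended image control points following rules (1)-(4).

    resample_pts : (M, 3) second-pass Poisson disk samples s_j^r
    candidates   : list of dicts for the dense sampling set S, each with
                   'pos' (3,), 'normal' (3,), 'observability' m(s, U),
                   and 'height_layer' (int) -- all assumed precomputed.
    """
    up = np.array([0.0, 0.0, 1.0])
    cand_pos = np.array([c["pos"] for c in candidates])
    recommended = []
    for s_r in resample_pts:
        # rule (1): k nearest dense samples around each resampled point -> S_cpc
        idx = np.argsort(np.linalg.norm(cand_pos - s_r, axis=1))[:k_nearest]
        pool = [candidates[i] for i in idx]
        # rule (2): keep points with observability above the preset threshold
        pool = [c for c in pool if c["observability"] > m_pre]
        # rule (3): keep near-horizontal points (angle between normal and up < 30 deg)
        pool = [c for c in pool
                if np.degrees(np.arccos(np.clip(np.dot(c["normal"], up), -1, 1)))
                < max_slope_deg]
        # rule (4) + strategy: sort from low to high height layer, then by
        # decreasing observability, and keep the best few per neighbourhood
        pool.sort(key=lambda c: (c["height_layer"], -c["observability"]))
        recommended.extend(pool[:top_n])
    return recommended
```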
Figure 5.44 shows the recommendation of image control points based on the
principles above, in which the brown points represent the second sampling points, and
the pink points are the recommended image control points within the corresponding
range of the second sampling points. Considering that the rough model does not
completely fit the real scene, the point position can be shifted by a certain distance
according to the actual situation. The pink halo circle area in Fig. 5.44 is the image
control offset adjustment range, and the offset distance is less than the baseline width
between camera stations; that is, the image control point to be set can be considered not to have moved out of the overlapping area. Figure 5.45 shows the recommended image control points generated on the surface of the rough model by the above algorithm. The points are distributed uniformly, conform to the 3D geometric structure of the space, and also meet the requirements of planar station setup.
In UAV photogrammetry, the field layout of image control points usually adopts
the method of spraying marks and placing signs, or it can also use the obvious signs
already on site. For image data with a ground resolution greater than 3 cm (GSD
> 3 cm), the conventional method can still meet the pixel-level accuracy require-
ments for “pricking points”. In the optimized views photogrammetry mode, it can
be equipped with a higher-resolution camera and is closer to the scene object, so it
can obtain image data with millimeter-level resolution. Under such conditions, the

Fig. 5.44 Image control point generation based on quadratic Poisson disk sampling

Fig. 5.45 Recommended image control point generation results by the coarse model

characterization quality of the existing layout method no longer meets the accuracy
requirements for extracting image control points. Therefore, image control signs
with smaller sizes and higher precision are used in optimized view photogrammetry
technology, and reflective signs often used in close-range photogrammetry can also
be used in some application scenarios. Figure 5.46 shows the image control point
mark used in the practice of optimized views photography, which is pasted to the
corresponding point for high-precision measurement. Compared with spraying or
laying, this scheme is more suitable for image data with millimeter-level resolution.

2. Optimized views photogrammetry performance upgrade based on high-resolution medium-format cameras

The load capacity of multi-rotor drones is limited, and the quality and specifications of their imaging sensors are correspondingly modest. For example, the camera of the DJI Phantom 4 drone has a focal length of 24 mm and an image resolution of 5472 × 3648, i.e., 20 million effective pixels. When an imaging system of this class is used for high-resolution data acquisition, the corresponding
Fig. 5.46 Optimized views photogrammetry image control signs and layout examples

photographic footprint is very small and the number of camera stations is very large. As a result, the acquisition cost is high and the efficiency is low, and post-processing also suffers from problems such as few local matching features and easy accumulation of errors. In addition, engineering surveying and industrial close-range photogrammetry place very high requirements on image resolution and on the quality of the final data results; in such application fields, low-quality imaging payloads are clearly insufficient.
Based on the aforementioned reasons, under the condition of sufficient UAV
carrying capacity, UAV photogrammetry can be combined with high-resolution,
large-format aerial cameras to achieve more efficient, precise, and accurate data
collection. Compared with other forms of UAV photogrammetry, optimized views
photogrammetry is characterized by being closer to the observed object, and the
object distance and sampling density are determined according to the resolution
and overlap ratio requirements. Therefore, a high-quality camera not only helps to
improve data quality but also provides more choices for the location of drone camera
stations. For scenes or objects for which close-up photography is difficult, effective data acquisition can still be completed.
Two high-resolution UAV aerial cameras adapted to optimized views photogrammetry were used for comparative testing: the DJI P1 full-frame camera and the Rainpoo M10 medium-format camera. The parameters of the two cameras are listed in Table 5.1. The comparison shows that the medium-format camera has a larger, higher-resolution sensor, so that with a longer focal length lens, image data of the same or even higher definition can be obtained.
For optimized views photogrammetry, the advantages of high-resolution cameras
are not only the improvement of resolution but also the reduction of the number of
camera stations for aerial photography through larger-format photographic coverage,
thereby improving the efficiency of aerial photography. For the same scene, opti-
mized view planning was carried out with the corresponding parameters of three
different imaging payloads, including the two cameras mentioned above and the
camera attached to the Phantom 4. The planned routes are shown in Fig. 5.47; Fig. 5.47a–c are the UAV routes generated by the Phantom 4 (P4R), P1, and M10, respectively.
Table 5.2 corresponding to the UAV route shown in Fig. 5.47 contains three main
index parameters for the three imaging payloads, namely, the number of camera
stations, object distance and average sampling distance (ASD). By comparison, it
can be found that when a higher resolution and larger format imaging payload is used,

Table 5.1 High-resolution aerial camera parameters

Camera | Focal length/mm | Sensor format/(mm × mm) | Image resolution/(pixels × pixels) | Effective pixels/10^6 pixels
P1     | 35              | 35.9 × 24               | 8192 × 5460                        | 45
M10    | 60              | 43.8 × 32.9             | 11,648 × 8736                      | 102

Fig. 5.47 Comparison of the optimized view planning with different imaging payloads

aerial photography data collection with higher spatial resolution can be achieved at
a longer object distance and fewer camera stations.
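As a rough plausibility check on these figures, the nominal ground sampling distance can be computed from the sensor parameters in Table 5.1 and the object distances in Table 5.2. The sketch below is only illustrative; the computed nominal values are of the same order as the reported ASDs, with the remaining differences arising because the ASD is averaged over varying object distances along the optimized routes.

```python
def nominal_gsd_mm(sensor_width_mm, image_width_px, focal_mm, object_distance_m):
    """Nominal GSD (mm/pixel) = pixel pitch * object distance / focal length."""
    pixel_pitch_mm = sensor_width_mm / image_width_px
    return pixel_pitch_mm * (object_distance_m * 1000.0) / focal_mm

# values taken from Tables 5.1 and 5.2
print(nominal_gsd_mm(35.9, 8192, 35, 50))     # P1 at 50 m   -> approx. 6.3 mm/pixel
print(nominal_gsd_mm(43.8, 11648, 60, 100))   # M10 at 100 m -> approx. 6.3 mm/pixel
```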
Under the scene conditions above, the 3D reconstruction results of the data
collected by the M10 and P1 high-resolution cameras are compared. As shown in
Fig. 5.48, subfigures of the first row are the processing results of the M10 camera,
and those of the second row are the processing results of the P1 camera. The spatial
resolutions of the two are similar, and the image resolution of M10 is slightly better
than that of P1. Subfigures of the first column show the panorama of the 3D model
of the building, the second column shows parts of the model, and the third column
demonstrates the local details. After comparison, it can be found that the reconstruc-
tion effect of a larger format image is better, and the geometric structure is restored
accurately and has fewer distortion errors.
In addition, Fig. 5.49 shows the visualization results of aerial triangulation using
the images acquired by the two cameras above. The left side is the result of M10,
and the right side is the result of P1, where the orange patches represent the aerial
views. In addition, it also includes object points such as tie points and ground control
points. By comparison, it can be intuitively judged that the number of viewpoints of
M10 is significantly less than that of P1, and the distance from the observation scene
is significantly farther.
Table 5.3 shows the quantization indicator description of the triangulation in
Fig. 5.49, including the ASD, the number of images, the processing time, and the
RMSE of the tie points. The comparison of the corresponding aerial triangulation
indicators of the two payloads shows that the camera with a larger format and higher
imaging resolution can obtain higher spatial resolution images with fewer numbers;
at the same time, the aerial triangulation of the image can also achieve a smaller error
of the tie point. However, if pixels instead of images are used as the computing unit, it

Table 5.2 Comparison of imaging payloads and the optimized views route planning index parameters

Camera | Number of exposure stations | Object distance/m | Averaged sample distance/(mm/pixel)
P4R    | 7203                        | 70                | 19.2
P1     | 4939                        | 50                | 6.18
M10    | 2765                        | 100               | 5.37

Fig. 5.48 Comparison of the optimized views photography and reconstruction model results with high-resolution cameras

Fig. 5.49 Comparison of the aerial triangulation results of the optimized views photography with
high-resolution camera

is not difficult to find that the actual amount of data collected does not decrease, and
the cost of post-processing does not decrease but increases. The aerial triangulation
processing time in Table 5.3 confirms this. Under the same hardware configuration
environment, the aerial triangulation processing time of the data captured by M10 is
almost 50% longer than that of the data captured by P1. Therefore, considering the operation process as a whole, a balance should be sought between data acquisition and data processing.

Table 5.3 Comparison of triangulation indicators of the imaging payloads for the optimized views flight routes

Camera | ASD/(mm/pixel) | Number of images | Processing time | RMS of tie points/pixels
P1     | 6.18           | 9258             | 3 h 19 m 50 s   | 0.52
M10    | 5.37           | 4092             | 4 h 52 m 08 s   | 0.49

3. Verification and analysis of optimized views photogrammetry accuracy


The area where the Huiwen Building of Shenzhen University is located is selected
for accuracy verification. The main structure of the Huiwen Building in the test area
is complex and distributed in a corridor-like structure. In some areas, the buildings
have short spacing, which results in occlusion between different corridor structures.
The average height of the Huiwen Building is approximately 35 m. There is a high-
rise building of about 55 m on the northeast side of the Huiwen Building, and other
areas are covered by dense vegetation. To verify the 3D reconstruction accuracy of
optimized views photogrammetry, 28 control point targets were arranged on the top
and facade of the Huiwen Building, as shown in Fig. 5.50b. The rooftop control point
distribution is shown in Fig. 5.50a.
Using the prior information of the experimental area to generate a rough model
and fully considering its geometric structure, the UAV camera viewpoint and optimal
flight path surrounding the entire experimental area are output. For comparison with traditional oblique photogrammetry, oblique photography data were acquired in the
conventional 5-lens oblique photography operation mode, and the flight height was
fixed at 100 m. A P1 camera with a focal length of 35 mm and an image resolution
of 8192 × 5460 was used as the photographic configuration for data acquisition,
and the final captured images of the optimized views and oblique photography were
4030 and 3620, respectively.
Figure 5.51a, b show the UAV images of high-rise buildings in the test area
collected by the optimized views and oblique photography, respectively. The tradi-
tional oblique photography operation mode adopts a fixed height, which easily leads
to insufficient or missing observations of the lower part of the building. In contrast,
optimized view photogrammetry uses prior information to constrain the generation

Fig. 5.50 Some control point and check point distributions in the experimental area

Fig. 5.51 Comparison of the optimized views and oblique drone images

of UAV viewpoints and can collect images of the bottom of buildings to achieve full
observation of the target.
To verify the overall accuracy of the reconstructed mesh model, the ground point
cloud data of the entire test area were collected as the reference truth value, as shown
in Fig. 5.52. The maximum measuring range of the equipment for data acquisition is
80 m, and the point cloud accuracy is better than 2.4 mm within the range of 20 m.
To analyze the 3D reconstruction accuracy of optimized views photogrammetry,
three uniformly distributed ground control points numbered K02, K04, and K10 in
Fig. 5.50a were used for the aerial triangulation adjustment calculation to realize
the absolute orientation of the 3D model. The remaining points are used as check-
points for accuracy verification of the reconstructed model. The key indicators of AT
processing are summarized in Table 5.4. Among them, efficiency represents the time
consumption of the image matching and the adjustment calculation; the number of
tie points includes the median number of tie points in a single image and the total
number of tie points in all images; completeness represents the number of images

Fig. 5.52 Ground LiDAR point cloud data

successfully oriented; and the accuracy is the reprojection error of tie points. It can be
seen that ➀ the number of oblique images is relatively small; ➁ the total number of
tie points of the optimized views photography is smaller than that of oblique photog-
raphy; and ➂ optimized views photogrammetry achieves all image orientations with
an accuracy of 0.62 pixels, which is better than oblique photography.
Figure 5.53 shows the number of rays of tie points for the aerial triangulation of
optimized views and oblique photography. The number of rays in aerial triangulation
is the number of images the tie point is associated with. As shown in Fig. 5.53,
oblique photogrammetry uniformly captures images in the test area because the
drone captures data along evenly distributed flight lines and the camera is tilted
in the same direction. In contrast, optimized views photogrammetry uses the prior information of the scene to adjust the flight lines and camera directions so as to obtain as many building images as possible, as reflected by the blue area in the middle of Fig. 5.53b. It is precisely because of this image acquisition method that the tie
point in optimized views photogrammetry has many rays on the ground and building
facade, as shown in the red ellipse area in Fig. 5.53a, b.

Table 5.4 Statistical results of key parameters in aerial triangulation

Approaches | Efficiency/min | Tie points (median) | Tie points (total) | Completeness | Accuracy/pixels
OP         | 50.8           | 948                 | 654,468            | 3615/3620    | 0.69
OVP        | 60.1           | 986                 | 558,751            | 4030/4030    | 0.62

Fig. 5.53 Comparison of the number of tie point rays



After the absolute orientation of the model by using the ground control points,
Table 5.5 summarizes the horizontal and elevation residuals of the 25 checkpoints
in the aerial triangulation adjustment, and Fig. 5.54 shows the single-point residual
distribution. Optimized views photography and oblique photography have comparable absolute positioning accuracy. Unlike oblique photogrammetry, which prioritizes the stability of the image connection network, optimized views photogrammetry is more concerned with precise sampling of the subject; nevertheless, its working mode can still establish a stable image connection network and thus ensure the absolute positioning accuracy of the aerial triangulation.
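The residual statistics of this kind (Table 5.5 and Fig. 5.54) follow from the checkpoint coordinates in the usual way. A minimal sketch, assuming the measured and reconstructed checkpoint coordinates are already matched and expressed in the same absolute coordinate system, is:

```python
import numpy as np

def checkpoint_residual_stats(reference_xyz, reconstructed_xyz):
    """Horizontal (XY) and vertical (Z) residual statistics for check points.

    Both inputs are (N, 3) arrays of matched check-point coordinates.
    """
    d = reconstructed_xyz - reference_xyz
    d_xy = np.linalg.norm(d[:, :2], axis=1)   # horizontal residual per point
    d_z = d[:, 2]                             # signed vertical residual
    rmse = lambda v: float(np.sqrt(np.mean(v ** 2)))
    return {
        "min_xy": float(d_xy.min()), "max_xy": float(d_xy.max()), "rmse_xy": rmse(d_xy),
        "min_z": float(d_z.min()),   "max_z": float(d_z.max()),   "rmse_z": rmse(d_z),
    }
```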
To verify the quality of the 3D reconstructed mesh model, Fig. 5.55 shows the
reconstructed model of the main building of Huiwen in the test area and compares
the four local areas of the top, elevation and bottom of the model. It can be seen from
Fig. 5.55a, b that both the optimized views photography and oblique photography
models can reconstruct the complete structure of the main building, and the overall
reconstruction quality is comparable. However, the local detail comparison can be
seen in Fig. 5.55c:

(1) For the area at the bottom of the building, oblique photography is difficult to
capture, and the reconstruction quality is poor, as shown in the comparison
diagram of No. 1 in Fig. 5.55c.
(2) For ancillary facilities on the top of the building, such as air conditioners and power distribution boxes, although oblique photography can capture images, the reconstruction model of oblique photography is incomplete due to the small size of these facilities and occlusion, as shown in comparison diagram No. 2 in Fig. 5.55c.

Table 5.5 Residual statistics for check points after aerial triangulation

Approaches | Min XY/m | Min Z/m | Max XY/m | Max Z/m | RMSE XY/m | RMSE Z/m
OP         | 0.005    | −0.039  | 0.057    | 0.045   | 0.024     | 0.022
OVP        | 0.004    | −0.037  | 0.063    | 0.035   | 0.026     | 0.022

Fig. 5.54 The residual distribution of the check points in aerial triangulation

Fig. 5.55 Comparison of the quality of mesh models

(3) Building facades are a key component that traditional oblique photography needs to address. However, due to the concave shapes of corridors and balconies, the reconstruction model of oblique photography shows large deviations, mainly manifested as inclined walls, as shown in comparison diagrams No. 3 and 4 in Fig. 5.55c. For the above areas, optimized views photogrammetry can accurately collect enough images, and the reconstructed model is correspondingly better.
The model quality of the building facades is further analyzed for optimized views photography and oblique photography. Figure 5.56 shows the distribution of the three selected building facades and their point cloud data. Facade 1 contains many glass windows, and the point cloud is mainly distributed on the outer wall of the building; facade 2 contains a concave balcony corridor, and the point cloud is evenly distributed over this area; facade 3 contains many glass windows and concave balconies, and its structure is more complex. The point clouds of the facades are compared with the reconstructed models, and the resulting error statistics are shown in Table 5.6.
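The facade error statistics reported in Table 5.6 come from comparing the reference point cloud against the reconstructed model. A minimal sketch of such a comparison is given below; it approximates the cloud-to-model distance by nearest-neighbor distances to points densely sampled from the reconstructed mesh, which is a common simplification rather than the exact evaluation procedure used here.

```python
import numpy as np
from scipy.spatial import cKDTree

def facade_error_stats(reference_cloud, model_samples):
    """Max / mean / standard deviation of distances from the reference LiDAR
    points of a facade to the nearest points sampled on the reconstructed model.

    reference_cloud : (N, 3) ground LiDAR points of the facade (reference truth)
    model_samples   : (M, 3) points densely sampled from the reconstructed mesh
    """
    tree = cKDTree(model_samples)
    dist, _ = tree.query(reference_cloud)   # nearest model point per reference point
    return {"max": float(dist.max()),
            "mean": float(dist.mean()),
            "std": float(dist.std())}
```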
Figure 5.57 shows the error distribution of the building facade models, with the error histograms given in the upper left corners. The test results show that, except for the maximum value of facade 3, the reconstruction model of optimized views photogrammetry is better than that of oblique photogrammetry in all three indices. Compared with oblique photogrammetry, optimized views photogrammetry significantly improves the model reconstruction accuracy, by roughly a factor of 3–5, and reduces the insufficient image acquisition in the occluded areas of building facades, as shown by the error distributions in Fig. 5.57b, d, f.

Fig. 5.56 The distribution of building facades and corresponding point clouds

Table 5.6 Statistical results of errors of building facades

Facade | Max value/m (OP) | Max value/m (OVP) | Mean value/m (OP) | Mean value/m (OVP) | Standard deviation (OP) | Standard deviation (OVP)
1      | 1.12             | 1.10              | 0.09              | 0.03               | 0.19                    | 0.12
2      | 2.70             | 2.42              | 0.17              | 0.11               | 0.33                    | 0.25
3      | 2.26             | 2.28              | 0.16              | 0.03               | 0.27                    | 0.12

5.3.3 Multi-UAV Collaboration in Optimized Views Photogrammetry

The comparison with oblique photogrammetry fully verifies the 3D reconstruction accuracy of optimized views photogrammetry. However, as a means of aerial data acquisition, oblique photogrammetry's planar flight mode and five-lens camera configuration serve as the comparison reference: the spatial polyline flight mode and single-lens camera configuration of optimized views photogrammetry mean that its operating efficiency is necessarily lower. The limited battery capacity of multi-rotor UAVs further restricts the flight time during acquisition. On balance, the strengths of optimized views photogrammetry lie in more flexible, closer, and thus more refined data collection.
Since operating efficiency determines cost, task cycle, and other factors, it is always an unavoidable consideration in surveying and mapping engineering practice. Although high efficiency is not the strongest aspect of optimized views photogrammetry, there are still many ways to improve the work

Fig. 5.57 The distribution of errors of the building facades

efficiency of optimized view photogrammetry. The purpose of the aforementioned


medium- and large-format high-resolution cameras is to improve the acquisition
efficiency through hardware integration. Another way to improve efficiency is to
achieve synchronous acquisition through multi-drone collaboration.
1. Collaborative data collection of optimized views photography based on flight
line division
Different from the technical approaches used in applications such as drone light shows, the multi-drone collaboration driven by optimized views photogrammetry does not depend on communication between the UAVs; it is based mainly on segmented planning of the UAV flight routes. During

the operating process, multiple UAVs fly different segments of the route to realize
the synchronous data acquisition of multiple UAVs within the unit operation time.
The basic principle of the optimized views photogrammetry flight route division
is given above. It is mainly based on the approximation of the perspective pose or its
height. The viewpoints are divided into subsets of different areas or different altitudes
through clustering processing. The viewpoints in the subsets constitute the waypoints
of route segments, and after connection, they become the routes corresponding to the
subsets of viewpoints. In the processing operation, it is only necessary to specify the
number of viewpoints contained in each subset, that is, each route, and the endurance
of each sortie of the multi-rotor UAV can basically achieve an accurate correspon-
dence between the number of viewpoints at the same time. Therefore, when the
endurance time parameters of the UAV are given, combined with the overall number
of optimized viewpoints, the specific division form and number of segmented routes
can be determined in a partitioned or hierarchical manner. Based on this information,
the number of sorties that need to be flown can be determined, and then the number
of drones or power supplies that need to be configured can be planned.
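A hedged sketch of this segmentation step is given below. Viewpoints are grouped with a plain k-means clustering on position and (weighted) view direction, with the number of clusters derived from the per-sortie waypoint budget; the feature weighting and the absence of a hard per-cluster capacity constraint are simplifications of the procedure described above, and the function and parameter names are illustrative.

```python
import numpy as np

def divide_viewpoints(positions, directions, waypoints_per_sortie,
                      dir_weight=10.0, iters=50, seed=0):
    """Cluster optimized viewpoints into route segments for multi-UAV flight.

    positions  : (N, 3) exposure-station coordinates
    directions : (N, 3) unit view directions
    waypoints_per_sortie : viewpoints one sortie can cover (endurance budget)
    """
    rng = np.random.default_rng(seed)
    n = len(positions)
    k = int(np.ceil(n / waypoints_per_sortie))                # number of route segments
    feats = np.hstack([positions, dir_weight * directions])   # pose-similarity feature
    centers = feats[rng.choice(n, size=k, replace=False)]
    for _ in range(iters):                                    # plain k-means iterations
        d = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = feats[labels == c].mean(axis=0)
    return labels   # labels[i] = index of the route segment for viewpoint i
```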
The division of the optimized views routes provides the basic premise for multi-UAV collaboration. Beyond the division itself, another prerequisite for collaboration is that the divided route segments do not overlap and that the distance between adjacent segments satisfies the safety distance requirement. Since the viewpoint subsets result from clustering and segmenting the original optimized viewpoints, with no deletion or adjustment, the route division retains the overlap ratio and resolution settings and maintains complete coverage of the scene objects; the observation and acquisition capability of optimized views photogrammetry is therefore not weakened.
The practical application of a large-scale complex building is taken as an example
to illustrate the multi-UAV collaborative mode of optimized view photogrammetry.
The complex covers an area of approximately 0.5 km2 , including the hotel area on
the south side, the exhibition hall area in the middle, and the parking lot on the
north side. The building height of the whole area is 50–60 m. The hotel building
is a special-shaped structure, and the passage between the buildings is relatively
narrow. The pre-test planning is divided according to the functional type of the area,
as shown in Fig. 5.58a. The hotel area is marked as S-1, and the exhibition hall is
divided into four areas: M-1, M-2, M-3, and M-4. The two multi-story parking lots
are marked as N-1 and N-2, and the buildings in each subarea are taken as the specific
objects of planning, which makes the viewpoint optimization and route generation
more targeted. Figure 5.58b is a top-down overview of the optimized view routes
for zoning planning. During the test, based on a rapidly constructed coarse model, two DJI Phantom 4 UAVs carried out collaborative acquisition for each division according to the optimized views routes in Fig. 5.58b. The average
spatial resolution of the image data is 1.84 cm, each route contains 180 viewpoints,
and a total of 13,080 images, including additional shots on the ground, were collected.
The synchronous collection took 6 h in total.

Fig. 5.58 The collaborative survey planning of the optimized views photogrammetry for the
building complex

It is worth noting that the measurement acquisition was supplemented with addi-
tional shots on the ground, which is similar to the collaborative acquisition of the
air and the ground in terms of implementation and operation. Optimized views
photogrammetry can plan the view of the camera station at low altitudes or even
on the ground and can drive the automatic acquisition platform equipment in the
air and on the ground to perform collaborative measurement. Figure 5.59a shows
the aerial triangulation visualization result of 3D reconstruction using this set of
data, in which the orange patches represent the viewpoints. It can be seen that there
are additional ground-shooting viewpoints around the building at the bottom. In the
air triangulation combined adjustment stage, the fusion of air and ground data is
implemented, and this data fusion also corresponds to the collaborative design and
collaborative operation in the planning and implementation stages. The panorama of
the 3D reconstruction result of this dataset is shown in Fig. 5.59b.
This example shows that for such a large building complex with a relatively low overall height, multi-UAV and even air-ground coordination can

Fig. 5.59 Real scene 3D processing results of the building complex by collaborative data collection

achieve more efficient real scene 3D data collection within a relatively short period while ensuring that the final result is more refined and the geometric structure more accurate and complete.
2. The ubiquitous joint application mode of optimized views photogrammetry
Although optimized views photogrammetry does not realize collaborative operation through information interconnection between the acquisition terminals, it can still achieve effective collaborative application, mainly through task allocation over route segments based on good pre-measurement planning. Therefore, using the
unique ability of fine planning and route division, optimized views photogrammetry
technology transforms UAV aerial photography planning into business production,
and the output product is the optimized views flight routes. As the input of aerial
photography implementation or real scene 3D data collection, the optimized views
route can be seamlessly connected with the automatic control of the UAV system,
and the segmented form of the route naturally has the attribute of distribution config-
uration. Therefore, the collaborative function of optimized views photogrammetry
can be further expanded into a new technical model that can support ubiquitous UAV
aerial photography applications.
In actual operation, optimized views photogrammetry connects pre-measurement
planning and measurement implementation in a cloud service mode. As shown in
Fig. 5.60, a PC is used as the planning terminal to plan and design routes and image
control points and upload the generated routes and recommended image control
points to the cloud server. During the surveying, the mobile control device loaded
with the relevant application program downloads the flight route and the coordinates
of the control point from the cloud and controls the UAV system and the measurement
equipment to collect the aerial image and measure the control point.
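To illustrate this planning-to-field hand-off, the snippet below sketches what a segmented route task package exchanged through the cloud service might look like. The structure, field names, endpoint, and coordinate values are entirely hypothetical and are not an actual interface of the system described here.

```python
import json

# Hypothetical task package produced by the planning terminal and downloaded
# by the field control app; structure and field names are illustrative only.
task_package = {
    "project": "real-scene-3d-demo",
    "route_segment": {
        "segment_id": "S-1-03",
        "waypoints": [
            # position in the project coordinate system + gimbal attitude per exposure
            {"x": 500123.4, "y": 2493876.1, "z": 86.5,
             "yaw_deg": 135.0, "pitch_deg": -40.0},
        ],
    },
    "recommended_control_points": [
        {"id": "CP01", "x": 500098.7, "y": 2493901.2, "z": 4.3, "height_layer": 0},
    ],
}

payload = json.dumps(task_package, indent=2)   # uploaded to / fetched from the cloud server
```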
This mode realizes the unbinding of optimized view aerial photography plan-
ning and aerial survey implementation so that the two tasks no longer need to be
completed by the same group of personnel. Using the fast-processing power of the
computer, aerial photography planning and route production are placed in the back-
ground, which is conducive to the operation of large amounts of data and specialized

Fig. 5.60 Application mode of optimized views photogrammetry technology based on cloud
service

and centralized processing. UAV operators and field technicians only need to receive assigned tasks, represented by the segmented UAV routes and image control point data, and rely on the high automation of the UAV systems and surveying equipment, so the specialization requirements for operators and field personnel are further reduced. Even when long-term continuous measurement collection is needed,
it can be combined with UAV automated airports and other autonomous robot launch
and retrieval terminals to achieve fully unmanned control.
Therefore, it can be considered that optimized view photogrammetry relies on
its own strong automation performance and unique cloud service mode to provide
ubiquitous technical support for the implementation of UAV photogrammetry-related
services. Additionally, in the context of the wide demand for real scene 3D data, it has the potential to evolve into an industrialized application model of crowdsourced UAV aerial surveys.
Based on various factors, it is concluded that the ubiquitous support of optimized
views photogrammetry mainly exists in the following aspects:
1) Ubiquitous application
Through its highly automated processing capability, optimized views photogrammetry makes effective use of relatively low-cost system equipment such as MAVs. Therefore, compared with large aircraft platforms, expensive payloads, and the correspondingly high survey investment, optimized views photogrammetry is more flexible, simpler, and easier to implement for a wide range of applications.
2) Ubiquitous integration
At its technical base, optimized views photogrammetry can act directly on the UAV's flight controller. At the same time, its reliance on rough models and
the core functions of object-oriented observation and real-scene 3D modeling can
be organically combined with close-range photogrammetry and reality 3D modeling
software. Optimized views photogrammetry technology connects the front and back
ends and integrates acquisition equipment, planning control technology, and data
post-processing software into a complete fully automated processing technology
chain.
3) Ubiquitous extension
Optimized views photogrammetry technology drives the UAV to obtain a closer and
lower flight altitude, which lays a good foundation for the effective matching and
fusion between the image data obtained in the air and the data collected on the
ground. In addition, based on its high-definition real-scene modeling capability, optimized views photogrammetry can better support the automatic generation of individual 3D building models. Based on the refined and complete coverage of scene objects, the optimized views route is also suitable for monitoring, inspection, and other related applications. The technical core of optimized views photogrammetry thus provides a ready interface, making it easier to extend to related technologies.

4) Ubiquitous business
The ubiquitous characteristics of the aforementioned three aspects inevitably deter-
mine that its business scenarios are also very extensive. Optimized views photogram-
metry is not limited by spatial scale, structural complexity, distribution shape, or other conditions; it can handle everything from scenes such as urban areas, parks, factories, and building complexes down to individual objects such as infrastructure projects, industrial structures, and single buildings. In addition, the business direction of optimized views photogram-
metry includes real scene 3D reconstruction, digitalization of cultural relics protec-
tion, industrial close-range photogrammetry, engineering monitoring and inspection,
etc. As the technology gradually matures, it will expand to more application fields.

5.3.4 Optimized Views Photogrammetry Applications

The analysis above indicates that optimized views photogrammetry is suited to ubiquitous business scenarios. This section selects three representative project cases: 3D reconstruction of urban real scenes, digitization of cultural relics and historical sites, and close-range photogrammetry of large-scale equipment. Facing scene objects with specific requirements and changing environmental conditions, these examples illustrate the good performance of optimized views photogrammetry in engineering practice.
1. 3D reconstruction of real scenes in the urban central district
Qingdao is a very important central city and an international port city on the east
coast of China, with a built-up area of 758.16 km2 . The city launched the “Real
Scene 3D Qingdao” construction project in 2020. It provides the support foundation
of spatiotemporal information for digital city construction.
The area of 2 km2 around May Fourth Square in Qingdao is determined as the core
area of the real scene 3D Qingdao construction. This area is composed of various
functional areas, such as government, business, residence, and recreation. As the
core functional area of the city, this area is not only a key area but also a difficult
area for real-scene 3D projects. Its importance is reflected in the requirements for
the quality of the results, especially the high resolution and high precision of the real
scene model.
As a representative urban scene, this area poses various problems for real scene 3D data collection, of which three difficulties are particularly prominent: the heights of ground objects vary greatly; the buildings are densely distributed with narrow spacing; and there are many special-shaped structures, richly detailed functional components, and non-Lambertian reflective surfaces such as glass curtain walls. Based on its own good technical
capabilities and reasonable design and configuration in the implementation process,

optimized views photogrammetry has effectively overcome the above difficulties and
achieved good results in the real scene 3D data collection of this urban core.
Based on the operational capabilities of MAVs and the multi-drone collaboration capability of optimized views photogrammetry, the 2 km² Qingdao city core surveying area is subdivided as shown in Fig. 5.61. Based mainly on functional type and average building height, the whole area is divided into 6 zones, numbered 1–6, and further subdivided into smaller blocks according to the 3D expanded area; the largest block, 5-6, covers 0.37 km², while the smallest block, 2-8, contains the tallest super high-rise building in the region and covers only 0.02 km². Buildings within each block are of similar height, and each block is used as a specific scene object for optimized view planning.
Using the zoning planning processing mode for different heights of ground
objects, the acquisition-related settings of the UAV are more targeted. Figure 5.62a
is the model of block No. 3, which belongs to the area with a large height difference.
The given model details include the glass curtain wall of the high-rise building,
the special-shaped structure and antenna at the top of the building, the facade of
the middle-rise building, the detached house, and the central air-conditioning equip-
ment. Figure 5.62b shows the reconstruction of the real scene model of the optimized
view collection in the No. 4 block. The block mainly includes flat ground and low
buildings. Such districts are often areas with poor data collection quality for oblique
photography. Figure 5.62b shows several details of the reconstructed model in this

Fig. 5.61 Division of the optimized views of the aerial surveying areas in the urban core

area. The sculptures, houses, and road light poles on the ground have been well recon-
structed and restored. The example proves that optimized views photogrammetry can
better solve the difficult problems in the 3D acquisition of real urban scenes.
Even in the face of complex urban spatial structure scenes, optimized views
photogrammetry can still maximize the maneuverability of UAVs by the data collec-
tion methods of approaching and surrounding objects. Figure 5.63 shows the results
of aerial triangulation and the panorama of the real scene model for sub-block 2, the central business district. The orange patches in Fig. 5.63a represent the views, which closely surround the buildings in the surveying area. As
seen from the panorama of the real scene model shown in Fig. 5.63b, the buildings

Fig. 5.62 The real scene reconstruction model and local details of the optimized views aerial
photography

Fig. 5.62 (continued)

in the area are mostly high-rise buildings. In Fig. 5.63c, the spacing between some
buildings is very narrow, and the optimized views photogrammetry still accomplished
the data collection well.
For collaborative collection, six P4R UAVs were used, the net acquisition time
was 9 h, the number of collected images was 98,373, and the average resolution of

Fig. 5.63 The results of aerial triangulation and the real scene model at the central business district

Fig. 5.64 The panorama of the real scene 3D model of the urban core by optimized views
photogrammetry

the model was 2.55 cm. The final output panorama of the real scene model for the
whole area is shown in Fig. 5.64.
To evaluate the quality of the data results, 265 checkpoints were measured in
the whole area to calculate the horizontal and elevation errors of the model. The horizontal root mean square error is 0.062 m, and the elevation root mean square error is 0.065 m. The error distribution is given in Fig. 5.65.
2. Digitization of cultural relics and ancient buildings
The protection of cultural relics is also one of the very important application scenarios
of real scene 3D. At present, the main technical content is the fine digitization of
cultural relics or ancient buildings. The selected scenes are Guangji Bridge and Guangji Gate Tower in Chaozhou, Guangdong, China, as shown in Fig. 5.66. Guangji
Gate Tower is the east gate tower of Chaozhou Ancient City, a three-story palace-style pavilion. Guangji Bridge is located outside the gate of Guangji Gate Tower. It is one of

Fig. 5.65 The check point error distribution of the real scene 3D model in the urban core

Fig. 5.66 Guangji Gate Tower and Guangji Bridge

the four famous ancient bridges in China. Founded in the Southern Song Dynasty
in 1171 AD, it is the earliest opening and closing style bridge in the world. Guangji
Bridge is composed of stone beam bridges on the west and east sides and a floating
bridge in the middle. There are bridge pavilions on each pier of the Guangji Bridge,
forming a unique pavilion-style architectural style.
Image-based digital 3D reconstruction of the Guangji Bridge is very challenging.
First, the bridge buildings are in a banded spatial distribution. The floating bridge in
the middle section of the Guangji Bridge also has relatively high dynamics, which is
prone to error accumulation along the linear direction of the bridge. Second, the Guangji Bridge is unfavorable for aerial photography collection because of the eaves and corners of its ancient Chinese architectural style, the small spacing between the bridge pavilions, and its unique consecutive-pavilion structure. The Guangji Bridge runs east–west, and the direct sunlight on the bridge body
and the strong reflection on the water surface will also degrade the quality of aerial
photography. In addition, it is also necessary to fuse the image data inside the bridge
pavilion by re-capturing it on the ground.
To ensure the consistency of the overall coordinate system of the subsection area
and the accuracy and reliability of the route planning, 6 image control points are
arranged on both sides of the Guangji Bridge and on the top of the pier connecting
the pontoon. The point distribution is shown in Fig. 5.67. The image control points
are marked in red.
In addition to the fineness of the geometric structure, the digital scene of cultural
relics and monuments also requires a very high spatial resolution of the model.
Considering the structural complexity of the pavilions, a P4R UAV with a small wheelbase is used to increase flight safety while flying at a close viewing distance of 3–5 m to obtain a spatial resolution better than 1 cm.
A pre-flight is carried out to generate a rough model. On this basis, according to the spatial layout and structural characteristics of the Guangji Gate Tower and Guangji Bridge, the optimized views data collection is divided into four sections for aerial photography planning: the Guangji Gate Tower,

Fig. 5.67 Distribution map of surveying image control points of Guangji Gate Tower and Guangji
Bridge

the west girder bridge, the middle floating bridge, and the east girder bridge. The specific flight route generation
is shown in Fig. 5.68.
In the actual acquisition process, it is found that the positioning deviation caused
by the drift of the IMU affects flight safety, so it is necessary to calibrate the IMU
before each sortie takes off. The collection is also implemented via multi-UAV coop-
erative operation. Along the bridge deck, ground supplementary shots were taken
with hand-held drones to integrate the image textures of the city gate interiors under

Fig. 5.68 Guangji Gate Tower and Guangji Bridge section planning of the optimized views route

Fig. 5.69 Real scene 3D digital model of Guangji Gate Tower and Guangji Bridge and some feature
details

the eaves of the bridge pavilion and in the pavilion. The final collected data volume
is 7092 images, and the average spatial resolution is 7.55 mm/pixel. After the aerial
triangulation is completed, the RMS of the horizontal error of the image control
point is 5.1 μm, and the RMS of the vertical error is 7.6 μm. Figure 5.69 shows the
digital real-scene model of the Guangji Gate Tower and Guangji Bridge generated by
the optimized view acquisition and reconstruction. The restoration of architectural
geometric features and the high-resolution presentation of plaques and couplets show
that optimized views photogrammetry can be applied to similar cultural heritage protection applications involving complex structures.
3. Close range photogrammetry of industrial structures
Owing to its high measurement accuracy, passive detection, and surface-covering acquisition, close-range photogrammetry is in particular demand in engineering and industrial fields. A feature of modern engineering and industrial structures is that their spatial scale keeps increasing while the measurement accuracy requirements also keep rising.
Large radio antennas are important devices for long-distance communication
transmission, astronomical observation, or deep space exploration, as shown in
Fig. 5.70a. Due to the correlation between the observed wavelength and the antenna
efficiency, the main reflector measurement accuracy of a radio antenna is a key indi-
cator that affects its receiving performance. Therefore, as an important part of the
production process, main reflector measurement is required in the stages of antenna
installation and inspection, and the measurement accuracy should reach 1/3–1/5 of
the parabolic surface accuracy.
Close-range photogrammetry is one of the main methods to achieve high-precision
measurement of the antenna main reflector. However, close-range photogrammetry

Fig. 5.70 Large radio antenna and manually operated close-range photography for surface
measurement

usually requires experienced technicians to manually operate the camera to take


close-up shots. The range of close-up photography is limited, and global stereoscopic
coverage must be achieved by moving and changing the position of the camera station.
The diameter of large antennas usually ranges from tens to hundreds of meters. At
the same time, considering the influence of gravity and the difference in actual use
status, it is necessary to measure the main reflector of multiple pitch attitudes, such
as 0° and 40°. As shown in Fig. 5.70b, a close-up photography operation method has
been adopted for a large-scale antenna. The crane is used to hoist the gondola, and
the technicians stand in the gondola to manually collect images. The conventional
manual close-range photogrammetry method will face greater difficulties and risks
under this requirement.
The application of UAV aerial photography systems for main reflector measure-
ment is a beneficial attempt in terms of improving efficiency and reducing risks. For
this specific scenario, further breakdown of the collection requirements includes the
following:
(1) Ensuring complete stereoscopic coverage of the paraboloid of the main reflector,
especially the measurement marker.
(2) It is necessary to avoid or eliminate the obstruction and interference caused by
the feed bracket to aerial flight and aerial photography.
(3) It can adapt to the attitude change of the antenna in the pitch direction and make
corresponding adjustments.
(4) Maximize the degree of automation and reduce manual control operations.
(5) It is close to or even better than the measurement accuracy index that can be
achieved by close-range photogrammetry.
Based on the demand analysis above, optimized view photogrammetry can realize
the main reflector measurement based on the UAV platform system.

The difference from the 3D urban real scene is that for the specific object of the
radio antenna, the design model of the structure is introduced as a rough model. The
main reason for this configuration is that entities are usually produced from design
models, which are digital proxies for the corresponding entities. Moreover, compared
with the model generated by pre-flight acquisition, the integrity and accuracy of
the design model are better, which is more conducive to the realization of high-
precision measurement. Furthermore, to account for the influence of the object's surrounding environment on flight safety, it is necessary to integrate the design model with the scene rough model generated by processing, placing the design model accurately at the corresponding position in the scene model
through the control points. Figure 5.71a is the simplified design model of the radio
antenna, and Fig. 5.71b is the effect of the fusion of the design model and the rough
scene model, of which the orange part is the design model corresponding to Fig. 5.71a
in the fused scene.
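Placing the design model at the correct position in the scene rough model through control points is essentially a rigid registration between corresponding points. A minimal sketch using the standard Kabsch (SVD-based) solution is given below, assuming at least three control points identified in both models; it illustrates the idea rather than the exact alignment procedure used in this project.

```python
import numpy as np

def rigid_align(design_pts, scene_pts):
    """Least-squares rigid transform (R, t) mapping design-model control points
    onto their counterparts measured in the scene rough model (Kabsch method).

    design_pts, scene_pts : (N, 3) corresponding control points, N >= 3.
    """
    mu_d, mu_s = design_pts.mean(axis=0), scene_pts.mean(axis=0)
    H = (design_pts - mu_d).T @ (scene_pts - mu_s)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = mu_s - R @ mu_d
    return R, t   # scene_point ~= R @ design_point + t
```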
We integrate the imported design model with the rough model of the scene, take the antenna main reflector as the observation object, and complete the optimized views aerial photography planning in the order of sampling generation, view optimization, and finally observation route generation, as shown in Fig. 5.72a–c, which show observation sampling, view optimization, and route generation, respectively. Similarly,
an observability analysis can be performed on the antenna reflector, as shown in
Fig. 5.72d. In the application scenario of antenna main reflector measurement, the
observability of the same part of the reflector under different state conditions will
change due to the transformation of the antenna attitude. However, if the conditions of
the UAV system, especially the camera gimbal, are limited, the mode strategy of data
acquisition needs to be changed accordingly. The observability analysis can indicate
whether the same part on the antenna reflector can still be effectively observed under
different conditions, thus providing a basis for the adjustment of the acquisition
method, and can also be used as an important tool for the analysis of the observation
accuracy.

Fig. 5.71 Radio antenna design model and scenario environment fusion

Fig. 5.72 The optimized views planning for surface measurement of radio antenna and observ-
ability analysis

Although it is also based on the principle of stereoscopic observation, the technical route of close-range photogrammetry differs from that of real-scene 3D reconstruction. To pursue higher-quality measurement results, close-range photogrammetry uses measurement markers, extracts the marker centers accurately through image processing, and combines precise camera calibration with the photogrammetric solution to improve the measurement accuracy. For surface and volumetric measurements, markers are usually densely attached to the surface of the observation object, the center points of the markers are used as the measurement objects, and the surface or volume results are obtained by further fitting. Therefore, using optimized views photogrammetry for main reflector measurement yields two outputs: a reconstruction result with richer apparent texture details, and a higher-accuracy measurement result of the parabolic surface represented by the marker points.
Figure 5.73a, b show the measurement marks attached to the reflector under natural daylight and under strong reflection at night, respectively, including coded markers for orientation and scaling and dot markers corresponding to the antenna panel adjustment bolts. Although the appearance of the survey markers differs obviously under the two conditions, they can function as control points in both cases. This establishes the foundation for fusing aerial photogrammetry and close-range photogrammetry by means of UAV aerial photography.

Fig. 5.73 Measurement marks on the reflector of the antenna under natural light and highly
reflective conditions

In this application, daytime and nighttime data are collected with two UAV system configurations, which produce the 3D reconstruction result and the close-range photogrammetry result, respectively.
The system configuration for daytime collection is a DJI M300 carrying an M10 camera with 102 million effective pixels and a 60 mm focal length lens. With a viewing distance of 35 m, corresponding to an average spatial resolution of 1.35 mm, the optimized views aerial photography planning is aimed entirely at the antenna reflector. A total of 103 images were collected, and the accuracy of the reconstruction results, expressed as the reprojection error, was 0.34 px. Figure 5.74 shows the 3D reconstructed antenna reflector: Fig. 5.74a is the overall reconstruction result, in which the shape of the reflector is regular, and Fig. 5.74b shows details of the reflector from four angles. The reconstructed reflective surfaces have flat geometry and high-resolution textures.
The configuration for night acquisition is a DJI M600 carrying a Canon 5Ds camera with 50.6 megapixels and a 28 mm focal length lens. Since reflective markers are photographed at night, a ring flash is mounted. The planned object distance is 24 m, corresponding to an average spatial resolution of 3.55 mm. A total of 110 images were collected, and the practical measurement included 88 coded marks and 366 non-coded marks. The final measurement results evaluated against the reference values are listed in Table 5.7; the RMS error was 0.53 mm.
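As a rough consistency check, the average spatial resolution quoted for each configuration can be reproduced from the simple pinhole relation GSD = object distance × pixel pitch / focal length. The sketch below is only an illustration of that relation; the pixel pitch of roughly 4.14 μm assumed for the Canon 5Ds sensor is not stated in the text, and the function name is purely illustrative.

```python
def ground_sample_distance(object_distance_m, pixel_pitch_um, focal_length_mm):
    """Approximate spatial resolution (mm per pixel) of a pinhole camera."""
    # Convert everything to millimetres before applying GSD = d * p / f.
    d_mm = object_distance_m * 1000.0
    p_mm = pixel_pitch_um / 1000.0
    return d_mm * p_mm / focal_length_mm

# Night configuration: 24 m object distance, 28 mm lens,
# assumed ~4.14 um pixel pitch for the Canon 5Ds sensor.
print(ground_sample_distance(24, 4.14, 28))   # ~3.55 mm, matching the stated resolution
```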
Figure 5.75 shows the implementation and results of the optimized views photogrammetry measurement of the antenna reflector. Figure 5.75a shows the locations of the marker points on the reflector and the camera stations of the optimized views, in which the green points are the measurement markers and the white patches are the camera stations. For ease of comparison, only part of the flight is displayed, so the camera stations shown lie to one side of the antenna. Figure 5.75b corresponds to the data in

Fig. 5.74 Reconstruction results of the antenna reflector

Table 5.7 Statistical results of radio antenna reflector measurements

Statistics             dx/mm    dy/mm    dz/mm    Mag/mm
Min.                   −0.51    −0.81    −1.88    −2.08
Max.                    0.71     0.63     1.15     1.38
Average                 0.1     −0.01     0.06     0.06
StdDev from average     0.16     0.15     0.47     0.53
StdDev from zero        0.19     0.15     0.48     0.53
RMS                     0.19     0.15     0.47     0.53

Table 5.7, which shows the distribution of the measurement accuracy over the antenna reflector. The blocks shown in blue mainly correspond to camera stations or parts of the antenna reflector with insufficient views. The results show that optimized views photogrammetry can fully meet the accuracy requirements of close-range photogrammetry of large industrial structures while maintaining flight safety and consistent resolution.
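For reference, the statistics reported in Table 5.7 (minimum, maximum, mean, standard deviation about the mean, standard deviation about zero, and RMS) follow directly from the per-point residuals. The sketch below is a generic illustration of those definitions; the residual values in the usage line are arbitrary placeholders, not the measured data behind the table.

```python
import math

def residual_stats(residuals):
    """Summary statistics of a list of residuals (mm), as used in Table 5.7."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Standard deviation about the mean value.
    std_from_mean = math.sqrt(sum((r - mean) ** 2 for r in residuals) / n)
    # Standard deviation about zero, identical to the RMS of the residuals.
    std_from_zero = math.sqrt(sum(r ** 2 for r in residuals) / n)
    return {
        "min": min(residuals),
        "max": max(residuals),
        "average": mean,
        "stddev_from_average": std_from_mean,
        "stddev_from_zero": std_from_zero,
        "rms": std_from_zero,
    }

# Usage with arbitrary illustrative residuals (not the survey data):
print(residual_stats([-0.4, 0.1, 0.3, -0.2, 0.5]))
```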

5.4 Summary

The UAV system is an information acquisition platform with good maneuverability that can carry a variety of sensor payloads. For 3D measurement applications, LiDAR and visual cameras are the most suitable and frequently used sensors. With the UAV as the carrier, UAV laser measurement and UAV photogrammetry technology systems and their corresponding processing methods are formed.

Fig. 5.75 Visualization of optimized views photogrammetry antenna surface measurement

The UAV LiDAR system deeply integrates laser scanning equipment, an INS, and a GNSS system through high-precision space–time synchronization. The processing of the LiDAR point cloud data collected by the UAV LiDAR system involves multiple aspects, including point cloud filtering, point cloud intensity correction, point cloud registration, and point cloud target extraction. The technical methods introduced for these processing phases range from classical methods, such as statistical filtering, to emerging technologies, such as feature extraction based on deep learning. Through application practice in multiple business fields, such as terrain mapping, power inspection, land object classification, and urban change monitoring, the specific technical routes and method adjustments of UAV laser measurement under various scene conditions are introduced and verified.
Optimized views photogrammetry is an original UAV photogrammetry technology. Its main feature is that the optimization normally performed in the post-processing stage is moved forward to the planning stage, and high-quality aerial image acquisition is achieved through refined pre-measurement analysis. The process of optimized views photogrammetry is divided into four main stages: acquisition and construction of the rough model, generation of the primary selection of views based on observation sampling, optimal selection of views based on observability analysis, and planning of the aerial photography flight route based on the optimized views. The final output is the flight routes that drive the UAV aerial photography system to realize fit-for-purpose or high-precision photogrammetry applications. Compared with oblique photogrammetry of the same scene object, optimized views photogrammetry achieves equivalent absolute accuracy and higher relative accuracy. In addition, for large scene objects, the partitioning derived from route planning enables multi-UAV collaborative mapping and further forms a new technology application mode

for ubiquitous UAV photogrammetry. Through three specific cases, 3D reconstruction of urban areas, digitization of cultural relics and ancient buildings, and close-range photogrammetry of industrial structures, it is verified that optimized views photogrammetry can deal with complex scene objects and provide high-precision data processing results. At the same time, it can effectively integrate the technical advantages of aerial photogrammetry and close-range photogrammetry to achieve high-precision measurement applications for large-scale industrial or engineering scene objects.

References

1. Li X, Liu C, Wang Z, et al (2020) Airborne LiDAR: state-of-the-art of system design, technology


and application. Meas Sci Technol 32(3):032002.
2. Stanley M H, Laefer D F (2021) Metrics for aerial, urban LiDAR point clouds. ISPRS J
Photogramm Remote Sens 175:268–281.
3. Kalogerakis E, Simari P, Nowrouzezahrai D, et al (2007) Robust statistical estimation of
curvature on discretized surfaces//Proceedings of Symposium on Geometry Processing, Spain.
4. Besl P J, McKay N D (1992) A method for registration of 3D shapes. IEEE Trans Pattern Anal
Mach Intell 14(2):239–256.
5. Zhang Y, Zhai F, Cai S, et al (2021) Leaf feature extraction and 3D reconstruction based on
point cloud data. Chin Measur Test Technol 47(8):6–12.
6. Axelsson P (1999) Processing of laser scanner data—algorithms and applications. ISPRS J
Photogramm Remote Sens 54(2–3):138–147.
7. Wen P, Zhao F, Wu X, et al (2020) Progressive building contours are extracted from raw LiDAR
point cloud data. Bull Surv Map (9):80–84.
8. Su H, Maji S, Kalogerakis E, et al (2015) Multiview convolutional neural networks for 3D shape
recognition//Proceedings of IEEE international conference on computer vision, Santiago.
9. Qi C R, Su H, Mo K, et al (2017) PointNet: Deep learning on point sets for 3D classification and
segmentation//Proceedings of the IEEE conference on computer vision and pattern recognition,
Hawaii.
10. Maturana D, Scherer S (2015) VoxNet: A 3D convolutional neural network for real-time object
recognition//Proceedings of 2015 IEEE/RSJ International Conference on Intelligent Robots
and Systems, Hamburg.
11. Wang P, Liu Y, Guo Y, et al (2017) O-CNN: Octree-based convolutional neural networks for
3D shape analysis. ACM Trans Graph 36(4):1–11.
12. Qi C R, Su H, Mo K, et al (2017) PointNet: Deep learning on point sets for 3D classification and
segmentation//Proceedings of the IEEE conference on computer vision and pattern recognition,
Honolulu HI.
13. Qi C R, Yi L, Su H, et al (2017) PointNet++: Deep hierarchical feature learning on point sets
in a metric space//Proceedings of Advances in neural information processing systems, Long
Beach CA.
14. Ravanbakhsh S, Schneider J, Poczos B (2016) Deep learning with sets and point clouds. arXiv
preprint arXiv:1611.04500.
15. Klokov R, Lempitsky V (2017) Escape from cells: Deep KD-networks for the recognition of
3D point cloud models//Proceedings of the IEEE international conference on computer vision,
Venice.
16. Nadgowda S, Jayachandran P, Verma A (2013) I2Map: Cloud disaster recovery based on
image-instance mapping//Proceedings of ACM/IFIP/USENIX International Conference on
Distributed Systems Platforms and Open Distributed Processing, Beijing.

17. De Geus D, Meletis P, Dubbelman G (2019) Panoptic segmentation with a joint semantic and
instance segmentation network//Computing Research Repository (CoRR).
18. Wang C, Shu Q, Wang X, et al (2019) A random forest classifier based on pixel comparison
features for urban LiDAR data. ISPRS J Photogramm Remote Sens 148:75–86.
19. Lepetit V, Fua P (2006) Keypoint recognition using randomized trees. IEEE Trans Pattern Anal
Mach Intell 28(9):1465–1479.
20. Shotton J, Sharp T, Kipman A, et al (2013) Real-time human pose recognition in parts from
single depth images. Commun ACM 56(1):116–124.
21. Chehata N, Guo L, Mallet C (2009) Airborne LiDAR feature selection for urban classification
using random forests. ISPRS Archives 38:207–212.
22. Zhao Y, Wu B, Wu J, et al (2020) Mapping 3D visibility in an urban street environment from
mobile LiDAR point clouds. GISci Remote Sens 57(6):797–812.
23. Sun W, Ren L, Peng Z, et al (2021) 3D modeling exploration of large-scale city (Shenzhen)
based on tilt photography and LiDAR. Bull Surv Map (S1):10–15.
24. Rato D, Santos V (2021) LiDAR based detection of road boundaries using the density of
accumulated point clouds and their gradients. Rob Auton Syst 138(3):103714.
25. Mao B, Li B (2020) City object detection from airborne LiDAR data with OpenStreetMap-
tagged superpixels. Concurr Comput 32(23):e6026.
26. Shapovalov R, Velizhev E, Barinova O (2010) Nonassociative Markov networks for 3D
point cloud classification//International Archives of the Photogrammetry, Remote Sensing and
Spatial Information Sciences XXXVIII, part 3A, p 103–108.
27. Liu Y, Fan B, Wang L, et al (2018) Semantic labeling in very high resolution images via a
self-cascaded convolutional neural network. ISPRS J Photogramm Remote Sens 145:78–95.
28. Niemeyer J, Rottensteiner F, Soergel U (2014) Contextual classification of LiDAR data and
building object detection in urban areas. ISPRS J Photogramm Remote Sens 87(1):152–165.
29. Zhou X, Xie K, Huang K, et al (2020) Offsite aerial path planning for efficient urban scene
reconstruction. ACM Trans Graph 39(6):1–16.
30. Xiang T, Xia G, Zhang L (2019) Mini-unmanned aerial vehicle-based remote sensing:
Techniques, applications, and prospects. IEEE Trans Geosci Remote Sens 7(3):29–63.
31. Zhang H, Yao Y, Xie K, et al (2021) Continuous aerial path planning for 3D urban scene
reconstruction. ACM Trans Graph 40(6):225–240.
32. Liu Y, Cui R, Xie K, et al (2021) Aerial path planning for online real-time exploration and offline
high-quality reconstruction of large-scale urban scenes. ACM Trans Graph 40(6):240–256.
33. Koch T, Körner M, Fraundorfer F (2019) Automatic and semantically aware 3D UAV flight
planning for image-based 3D reconstruction. Remote Sens 11(13):1550.
34. Corsini M, Cignoni P, Scopigno R (2012) Efficient and flexible sampling with blue noise
properties of triangular meshes. IEEE T Vis Comput Graph 18(6):914–924.
35. Zhang J, Hu A (2007) Method and precision analysis of multi-baseline photogrammetry.
Geomatics and Information Science of Wuhan University 32(10):847–851.
36. Helsgaun K (2015) Solving the equality generalized traveling salesman problem using the
Lin-Kernighan-Helsgaun algorithm. Math Program Comput 7(3):269–87.
Chapter 6
Coastal Zone Surveying

6.1 Overview

The coastal zone is an important geographical space foundation to support the devel-
opment of the regional marine economy, with abundant natural resources and unique
ecological advantages [1]. Under the continuous influence of sea-land interactions
and human activities, the natural environment in coastal areas usually undergoes
complex dynamic changes. At present, worldwide coastal countries with advanced
marine environment detection technology can conduct large-scale geospatial envi-
ronmental surveys within the sea areas under their jurisdiction, and the survey results
can strongly support and serve their own social and economic development and
natural resource protection [2]. Taking China’s coast as an example, the total length
of the coastline is approximately 32,000 km, of which the mainland coastline is more
than 18,000 km long, and the island coastline is approximately 14,000 km. The coastal
zone is one of the most important and active areas for national defense and civil
construction. In the vast coastal zone, the areas with a water depth of less than 50 m cover as much as 500,000 km², of which the sea area with a water transparency better than 5 m is no less than 200,000 km². Surveying and mapping such a vast area under limited space–time conditions [3] using only conventional acoustic detection methods is almost impossible given the overwhelming workload. In addition, in the nearshore and intertidal zones, the periodic influence of ocean tides brings problems such as shallow water, sediment deposition, and unfavorable operating environments with many reefs, which prevent conventional bathymetry technology from completing the related operations accurately and efficiently. Surveying vessels and personnel entering these areas also face great danger, so there have long been gaps in the spatial and geographical environment data of near-coastal areas, making it difficult to meet the actual needs of coastal spatial planning and the development of marine ecological civilization at this stage [4].
Therefore, the development of flexible and effective intelligent coastal zone detec-
tion technology is an urgent task in the current coastal zone surveying and mapping


research field, and it is also the main development trend of multiscale geographical
environment monitoring technology in coastal space in the future.
As conventional land-based technologies can hardly be adapted to the special geographical environment of coastal areas, remote sensing based on mobile platforms can achieve non-contact collection of spatial and temporal geographic data, thereby effectively avoiding complex environmental influences. Spatial detection technology uses various mobile carriers, such as ships, aircraft, and satellites, to provide dynamic and continuous information collection for coastal areas [5]. Multisensor integration and multisource data fusion provide a wealth of data, which greatly expands the range of information sources and the temporal and spatial resolutions available for different targets and their states. This is of great practical significance for meeting the geographic information collection requirements of different application scenarios in coastal areas.
In recent years, with the rapid development of cross-interface and multi-platform
coastal geographical environment detection technology, the application of cross-
platform sea-land integrated detection equipment can better solve the related prob-
lems in the process of obtaining topographic and geomorphological data of coastal
zones, tidal flats, islands, reefs, and their nearshore shallow waters. It provides a
more efficient, flexible, and accurate technical means to collect geospatial environ-
mental information and has obvious advantages in describing the characteristics of
seabed landforms in shallow sea areas and building a 3D seabed topographic model.
In addition, intelligent big data processing and analysis solutions based on dynamic
models, crowdsourcing networks, and deep learning also provide an entirely new
perspective for scientific research in the fields of marine economic development and
marine environmental protection, which is the concentrated embodiment of modern
advanced sea-land integrated surveying and mapping technology. This chapter starts
with the development of coastal zone dynamic measurement technology and its
related equipment. First, it expounds on the ship-borne integrated dynamic detection
technology and system integration and analyzes the water-land integration detec-
tion technology and system integration technology scheme. Then, the basic principle
and system development process of airborne bathymetric LiDAR measurement tech-
nology are introduced, and the data processing method and practical effect of airborne
bathymetry LiDAR systems are discussed and analyzed. Next, spaceborne InSAR
coastal surface subsidence measurement technology is systematically introduced.
Finally, the tide level correction technology involved in coastal zone surveying is
explored and studied. The content introduced in this chapter can provide an effective
reference for the dynamic monitoring of changes in the geospatial environment in
coastal areas.

6.2 Shipborne Water-Shore Integrated Surveying

6.2.1 Water-Shore Integrated Surveying Technique

Shipborne multi-sensor water-shore integrated surveying is a technology developed in recent years that achieves seamless water-shore surveying by integrating an underwater multi-beam bathymetry system, an above-water laser scanning system, a panoramic image acquisition system, and a POS. By unifying the surveying coordinate system, the terrain alignment problems caused by performing land and underwater surveys separately can be avoided, while the operation efficiency and surveying accuracy meet the corresponding specification requirements. Its main surveying techniques include the following three aspects.
1. Dynamic measurement technology based on mobile carriers
Fast measurement technology based on the mobile platform is the basic requirement
and significant advantage of the water-shore integrated measurement system. By
using the POS equipped on the platform, the real-time position and attitude of the
sensor can be quickly determined. The whole measurement process usually involves
multiple coordinate systems, including the survey vessel coordinate system, the
sensor coordinate system, the station center coordinate system, and the geodetic coor-
dinate system. According to the transformation matrix from the sensor coordinate
system to the survey vessel coordinate system, the survey vessel coordinate system
to the station center coordinate system, and the station center coordinate system to
the geodetic coordinate system, the point coordinates of the sensor are reduced to the
geodetic coordinates, and finally, the coordinate value of the measurement point in
the geodetic coordinate system is obtained. Using the POS, the external orientation
and attitude parameters required for measurement can be obtained without relying
on ground control points. The geographic coordinate reference data can be directly
provided for the integrated measurement system, and with precise time parameters,
mapping without ground control points can be achieved.
2. Multi-sensor measurement data fusion technology
In the design of the water-shore integrated measurement system, the 3D laser scanner
and the multi-beam echo sounder can be used to collect and measure the overwater
and underwater spatial topographic information, respectively. The high-precision
POS can provide positioning information, time information, attitude information,
and heading information for laser scanners, multi-beam echo sounders, and other
measurement sensors. The time synchronization control module can provide a unified
time synchronization reference for integrated measurement data. These data include
sonar bathymetry data, video data, LiDAR point cloud data, attribute data, etc.
Due to different formats, different types, and different geographic references, in the
processing and management of these data, a unified geographic coordinate system
and time stamp can be established for transformation and integration according to
different purposes and data types. The position and attitude information of each sensor

in the geographic coordinate system can be determined during operation for subse-
quent spatial registration, and various data according to different requirements can
be fused, so effective storage, management, and service of multi-source information
in the spatial database can be achieved.
Since the multi-beam opening angle is generally within 160° and the shipborne 3D laser scanner cannot penetrate the water medium, installing the multi-beam transducer in the normal way leaves a measurement "blind zone" in the water-shore integrated measurement results. To address this "blind zone", the intertidal zone is measured above water at low tide and the underwater topography is measured at high tide; the underwater and above-water results are then mosaicked together, which greatly reduces the blind zone in the intertidal zone measurement.
3. Conversion from unified geographic coordinates to the coastal zone coordinate system
Based on high-precision satellite navigation and positioning technology, the water-
shore integrated measurement system can directly obtain digital geodetic products.
The topographic map of China’s coastal zone adopts the 1985 national height datum,
and the water depth and intertidal zone height are based on the theoretical chart
datum. Therefore, to convert the results of the water-shore integrated measurement
system into the topographic map of the coastal zone, it is necessary to convert the
results of the geodetic coordinate system to the coastal zone coordinate system. The
national height datum in 1985 was mainly based on the multi-year mean sea level
derived from the tidal data of the Qingdao Tide Gauge Station from 1952 to 1979,
and the mean sea level was taken as the zero point. The theoretical chart datum is
calibrated according to the local mean sea level. At present, China uses the harmonic constants of 13 tidal constituents at local tide gauge stations to calculate the difference between the theoretical chart datum and the mean sea level. Therefore, it is necessary to determine the geodetic heights of both the geoid and the chart datum in the survey area. By using GNSS to observe the benchmarks laid out at the tide gauge stations in the survey area, relatively accurate geodetic heights can be obtained, and the geodetic height of the chart datum is then derived from the geodetic height of the mean sea level. At the same time, according to the topographic relief of the measured sea area, the chart datum and the height datum of the tide gauge station can be obtained, together with the conversion relationship between data referenced to the chart datum and data referenced to the leveling elevation of the tide gauge station, so that the position of the theoretical chart datum within the national height datum can be determined. Therefore, if a large-area water-shore integrated survey is carried out, it is necessary to establish the mutual conversion relationships among the geodetic height of the geoid, the geodetic height of the mean sea level, the continuous chart datum model, and the sea surface topographic model (Fig. 6.1).
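As a simple illustration of these vertical datum relationships, the sketch below converts a GNSS ellipsoidal height into a height above the 1985 national height datum and into a height above the chart datum. It assumes that mean sea level is taken as the zero of the height datum, and the geoid undulation and the separation between mean sea level and the chart datum are assumed inputs (in practice they come from a geoid model and tidal harmonic analysis); all names and numbers are illustrative.

```python
def to_vertical_datums(h_ellipsoid, geoid_undulation, msl_minus_chart_datum):
    """Convert an ellipsoidal height into height-datum and chart-datum heights.

    h_ellipsoid           : GNSS height above the reference ellipsoid (m)
    geoid_undulation      : geoid height above the ellipsoid at this point (m)
    msl_minus_chart_datum : separation between mean sea level and the
                            (lower) theoretical chart datum (m)
    """
    # Height above the height datum, taking mean sea level as its zero point.
    h_1985 = h_ellipsoid - geoid_undulation
    # Height above the chart datum (negative values are depths below it).
    h_chart = h_1985 + msl_minus_chart_datum
    return h_1985, h_chart

# Example with assumed values: ellipsoidal height -12.7 m, geoid undulation
# -2.5 m, chart datum 1.8 m below mean sea level.
print(to_vertical_datums(-12.7, -2.5, 1.8))   # (-10.2, -8.4)
```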

Fig. 6.1 Vertical datum in the coastal zone

6.2.2 Development of a Water-Shore Integrated Measurement System

The water-shore integrated measurement system integrates multiple sensors, such as a multi-beam bathymetry system, a laser scanning system, and a POS, and realizes rapid mobile measurement of water-shore terrain. With the support of high-precision GNSS positioning technology, the simultaneous measurement of overwater and underwater terrain is achieved with shipborne survey equipment. This avoids the inconsistency of results caused by measuring above and below water separately, improves the operation efficiency, and at the same time ensures that the measurement accuracy meets the corresponding specification requirements.
However, the integration of multiple sensors makes the data acquisition and
processing of the entire measurement system more difficult. This is mainly reflected
in the different acquisition frequencies of sensors, different installation positions,
and unsynchronized alignment of acquisition time and space, which may lead to a
decrease in the accuracy of the entire measurement system due to the low perfor-
mance of a certain sensor. Therefore, the selection of various measurement sensors
of the integrated measurement system should be matched, and the measurement
accuracy of all sensors should be set to a reasonable order of magnitude to ensure
the overall accuracy of the system. At the same time, a sophisticated control system
should be equipped to ensure that multiple sensors work together in time and space
and to ensure that measurement data can be effectively fused and processed.

1. 3D Laser surveying above water


The topographic survey above water uses the shipborne 3D laser scanning system to survey while under way and uses high-density point cloud data to characterize the topographic relief above water. The 3D laser scanner rapidly scans the measurement target during the movement, directly obtains the angle and distance between the emitted laser point and the surface of the measured target, and automatically stores and processes these observations. Since the mobile 3D laser scanning system is moving while scanning, the scanned data must be reduced to a common instant during processing to obtain the 3D coordinates of the target points corresponding to the scanner position at that instant.
2. Underwater multi-beam measurement
The principle of single-beam sounding is to use a transducer to transmit short pulsed sound waves. When the pulsed sound waves reach the seabed, they are reflected, and the returning echoes are received by the transducer. The water depth is obtained from the two-way travel time and the average sound speed:

$$D_{tr} = \frac{1}{2} C t \qquad (6.1)$$
where Dtr is the distance between the transducer and the seabed, C is the average
sound speed of the water body, and t is the two-way ranging time of the sound wave.
In Eq. (6.1), Dtr represents the instantaneous water depth from the transducer to
the seabed, and the transducer draft correction value ∆Dd and tide level correction
value ∆Dt need to be added; that is, the actual chart water depth D is:

$$D = D_{tr} + \Delta D_d + \Delta D_t \qquad (6.2)$$
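A minimal numerical sketch of Eqs. (6.1) and (6.2): given the two-way travel time, the mean sound speed, and the draft and tide corrections, the chart depth follows directly. All numbers below are illustrative assumptions.

```python
def single_beam_chart_depth(t_two_way, c_mean, draft_corr, tide_corr):
    """Chart depth from a single-beam echo sounder, following Eqs. (6.1)-(6.2)."""
    d_transducer = 0.5 * c_mean * t_two_way     # depth below the transducer
    return d_transducer + draft_corr + tide_corr

# Illustrative values: 0.02 s two-way time, 1500 m/s mean sound speed,
# 0.6 m transducer draft correction, -0.4 m tide-level correction.
print(single_beam_chart_depth(0.02, 1500.0, 0.6, -0.4))   # 15.2 m
```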

For multi-beam bathymetry, two sets of transducer arrays with orthogonal transmit and receive directivities are used to obtain a series of narrow beams distributed perpendicular to the heading. The multi-beam transducer emits tens or even hundreds of narrow beams perpendicular to the navigation direction, forming a fan. In the acoustic fan, only the middle beam is emitted perpendicular to the water surface, while the outer beams on both sides form a certain angle of incidence with the vertical plane.

As shown in Fig. 6.2, taking a single-plane transducer with 16 beams and a beam angle of 2° × 2° as an example, the working principle of multi-beam bathymetry is briefly described. The multi-beam transducer emits a 2° × 44° fan-shaped pulsed sound wave downward and then receives the bottom-reflected echoes in the form of a 20° × 2° strip formed by 16 receiving beam angles, thus forming 16 beams of 2° × 2°. The incident angle θ of each beam must be considered when calculating the spatial position of each beam measuring point. To a first-order approximation that ignores beam ray bending, the water depth Dtr below the transducer at each beam measuring point and the horizontal position X from the center point can be expressed as

$$D_{tr} = \frac{1}{2} C t \cos\theta + \Delta D_d + \Delta D_t \qquad (6.3)$$

$$X = \frac{1}{2} C t \sin\theta \qquad (6.4)$$
where C is the average sound speed, t is the two-way travel time, and θ is the angle of the receiving beam from the vertical. By determining this angle for each beam, the coordinates of each beam measurement point can be obtained.
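The per-beam geometry of Eqs. (6.3) and (6.4) can be sketched as follows; the sound speed, travel times, beam angles, and corrections are all assumed example values, and ray bending is ignored as in the text.

```python
import math

def beam_footprint(t_two_way, theta_deg, c_mean=1500.0, draft_corr=0.0, tide_corr=0.0):
    """Depth below the transducer and across-track offset of one beam,
    following Eqs. (6.3)-(6.4) with ray bending neglected."""
    theta = math.radians(theta_deg)
    slant = 0.5 * c_mean * t_two_way            # one-way slant range
    depth = slant * math.cos(theta) + draft_corr + tide_corr
    across = slant * math.sin(theta)
    return depth, across

# A fan of beams between -44 deg and +44 deg from nadir, all with an assumed
# 0.02 s two-way time (purely for illustration).
for angle in range(-44, 45, 22):
    print(angle, beam_footprint(0.02, angle, draft_corr=0.6, tide_corr=-0.4))
```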
3. Unification of coordinate transformation above water and underwater
In the shipborne water-shore integrated survey system, five coordinate systems are involved in the whole measurement process: the survey vessel (carrier) coordinate system, the laser scanner coordinate system, the multi-beam echo sounder coordinate system, the station center coordinate system, and the geodetic coordinate system. Among them, the laser scanner and multi-beam echo sounder coordinate systems are the sensor coordinate systems (Fig. 6.3).

Fig. 6.2 Schematic diagram of the multi-beam bathymetry system



Fig. 6.3 The coordinate systems of the shipborne water-shore integrated system

The origin of the hull (carrier) coordinate system is located at the center of mass of the IMU: the Yi axis points in the forward direction, the Xi axis is perpendicular to the Yi axis and points to the right of the forward direction, and the Zi axis is perpendicular to both the Xi and Yi axes and points upward, forming a right-handed coordinate system. The laser scanner coordinate system is a right-handed coordinate system referenced to the vertical line and the scanning plane: the laser emission reference point is the origin, the vertical upward direction is the Zs axis, the carrier advancing direction is the Ys axis, and the Xs axis is perpendicular to the Ys axis and points to the right of the forward direction. The Xs O Zs plane is then the vertical laser scanning plane, and four feature points with known coordinate values at the bottom of the instrument are provided. The multi-beam echo sounder coordinate system takes the center of the receiving array of the transducer as the origin, the forward direction of the transducer as the Ym axis, the direction along the receiving array plane perpendicular to the Ym axis and pointing to the right as the Xm axis, and the direction through the origin perpendicular to the receiving array toward the connecting flange as the Zm axis, forming a right-handed coordinate system. The origin of the station center coordinate system is located at the phase center of the GNSS antenna: the Yp axis points to the local north meridian, the Xp axis is perpendicular to the Yp axis and points east, and the Zp axis is perpendicular to the Xp O Yp plane, forming a right-handed coordinate system. The geodetic coordinate system is the Earth-centered, Earth-fixed coordinate system.
According to the coordinate matching model, the coordinates of the measuring
point in the geodetic coordinate system are obtained. The point coordinates of the
sensor are reduced to geodetic coordinates through the conversion from the sensor
coordinate system to the carrier coordinate system, the conversion from the carrier
coordinate system to the station center coordinate system, and the conversion from
the station center coordinate system to the geodetic coordinate system.
The six parameters of the sensor-to-carrier coordinate system transformation are
l x , l y , l z , ω, ϕ, and κ. In an ideal installation situation, the sensor coordinate system
should be parallel to the three axes of the carrier coordinate system. The deviation
between the axes caused by the non-parallel is the installation error, which can be
obtained by the method of sensor calibration. That is, the calibration parameters,
namely, roll deviation angle, pitch deviation angle, and heading (yaw) deviation
angle, are represented by ω, ϕ and κ, respectively. The translation parameter is the
origin difference between the sensor coordinate system and the carrier coordinate
system, that is, the coordinate value of the origin of the sensor coordinate system in
the carrier coordinate system, which can be measured by a steel ruler or indirectly
calculated after the instrument is installed. Therefore, the sensor coordinate system
can be directly converted into the carrier coordinate system through the above param-
eters. Let the coordinates of a scanning point in the sensor coordinate system be $[x\ y\ z]_s^{\mathrm{T}}$ and its coordinates in the carrier coordinate system be $[x\ y\ z]_b^{\mathrm{T}}$; then
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix}_b = R_s^b \begin{bmatrix} x \\ y \\ z \end{bmatrix}_s + \begin{bmatrix} l_x \\ l_y \\ l_z \end{bmatrix}_b \qquad (6.5)$$

where
$$R_s^b = \left[ R_z(\varphi) R_x(\omega) R_y(\kappa) \right]^{\mathrm{T}} = R_z(-\kappa) R_x(-\omega) R_y(-\varphi) = \begin{bmatrix} \cos\varphi\cos\kappa - \sin\varphi\sin\omega\sin\kappa & \cos\omega\sin\kappa & \sin\varphi\cos\kappa + \cos\varphi\sin\omega\sin\kappa \\ -\cos\varphi\sin\kappa - \sin\varphi\sin\omega\cos\kappa & \cos\omega\cos\kappa & -\sin\varphi\sin\kappa + \cos\varphi\sin\omega\cos\kappa \\ -\sin\varphi\cos\omega & -\sin\omega & \cos\varphi\cos\omega \end{bmatrix} \qquad (6.6)$$

Next is the transformation from the carrier coordinate system to the station center coordinate system. The IMU measures the real-time attitude of the hull, including the roll, pitch, and heading angles; these three attitude angles are the Euler angles between the station center coordinate system and the hull coordinate system. The roll angle is the angle between the X-axis and the horizontal direction, with clockwise taken as positive. The point position observed in the carrier coordinate system can be converted into the horizontal station center coordinate system using the attitude angles obtained in real time. Let the roll, pitch, and heading angles be r, p, and y, respectively, and let the coordinates of the scanning point in the station center coordinate system be $[x\ y\ z]_l^{\mathrm{T}}$. When the coordinates in the carrier coordinate system are converted to the station center coordinate system, it is necessary to rotate y around the Z-axis first, then rotate p around the X-axis, and finally rotate r around the Y-axis. Then
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix}_l = R_b^l \begin{bmatrix} x \\ y \\ z \end{bmatrix}_b \qquad (6.7)$$

where

$$R_b^l = \left[ R_y(r) R_x(p) R_z(y) \right]^{\mathrm{T}} = R_z(-y) R_x(-p) R_y(-r) = \begin{bmatrix} \cos y \cos r + \sin y \sin p \sin r & \sin y \cos p & \sin r \cos y - \cos r \sin y \sin p \\ -\cos r \sin y + \sin p \sin r \cos y & \cos y \cos p & -\sin y \sin r - \cos y \sin p \cos r \\ -\sin r \cos p & \sin p & \cos p \cos r \end{bmatrix} \qquad (6.8)$$

The conversion from the station center coordinate system to the geodetic coordi-
nate system. The longitude and latitude of the origin of the station center coordinate
system in the WGS-84 coordinate system are L and B, respectively. Let the coor-
dinates of the scanning point in the geodetic coordinate system be [x y z]Te . When
converting the coordinates in the station center coordinate system to the coordinates

in the geodetic coordinate system, it is necessary to rotate −π/2 + B around the X-axis first, then rotate −π/2 − L around the Z-axis, and finally translate the origin of the station center coordinate system to the origin of the WGS-84 coordinate system. Then there is Eq. (6.9).
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix}_e = R_l^e \begin{bmatrix} x \\ y \\ z \end{bmatrix}_p + \begin{bmatrix} x \\ y \\ z \end{bmatrix}_{oe} \qquad (6.9)$$

where
$$R_l^e = R_z\!\left(-\frac{\pi}{2} - L\right) R_x\!\left(-\frac{\pi}{2} + B\right) = \begin{bmatrix} -\sin L & -\sin B \cos L & \cos B \cos L \\ \cos L & -\sin B \sin L & \cos B \sin L \\ 0 & \cos B & \sin B \end{bmatrix} \qquad (6.10)$$

and $[x\ y\ z]_{oe}^{\mathrm{T}}$ is the space Cartesian coordinate of the origin of the station center coordinate system in the geodetic coordinate system. Combining the above transformations yields
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix}_e = \begin{bmatrix} x \\ y \\ z \end{bmatrix}_{oe} + R_l^e R_b^l \left( R_s^b \begin{bmatrix} x \\ y \\ z \end{bmatrix}_s + \begin{bmatrix} p_x \\ p_y \\ p_z \end{bmatrix}_b \right) \qquad (6.11)$$

The above-water and underwater measurement results can be converted between different coordinate systems through the above formulas, and the water-shore measurement results, supported by high-precision GNSS positioning technology, are consistent with the WGS-84 datum.
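The chain of transformations in Eqs. (6.5)–(6.11) can be sketched compactly with a rotation-matrix helper. The sketch below follows the general idea of composing sensor-to-carrier, carrier-to-local, and local-to-ECEF rotations; the exact rotation orders, sign conventions, and all numeric values are illustrative assumptions rather than the book's calibrated parameters.

```python
import numpy as np

def rot_x(a):
    return np.array([[1, 0, 0],
                     [0, np.cos(a), -np.sin(a)],
                     [0, np.sin(a),  np.cos(a)]])

def rot_y(a):
    return np.array([[ np.cos(a), 0, np.sin(a)],
                     [0, 1, 0],
                     [-np.sin(a), 0, np.cos(a)]])

def rot_z(a):
    return np.array([[np.cos(a), -np.sin(a), 0],
                     [np.sin(a),  np.cos(a), 0],
                     [0, 0, 1]])

def georeference(p_sensor, R_sb, lever_arm_b, roll, pitch, yaw, lat, lon, origin_ecef):
    """Map a point from sensor coordinates to ECEF, in the spirit of Eq. (6.11)."""
    # Sensor -> carrier: boresight rotation plus lever-arm offset (Eq. 6.5).
    p_b = R_sb @ p_sensor + lever_arm_b
    # Carrier -> local (station centre) frame from the IMU attitude (Eq. 6.7);
    # the rotation order used here is an assumed convention.
    R_bl = rot_y(roll) @ rot_x(pitch) @ rot_z(yaw)
    p_l = R_bl @ p_b
    # Local -> ECEF using the station longitude L and latitude B (Eq. 6.10).
    R_le = rot_z(-np.pi / 2 - lon) @ rot_x(-np.pi / 2 + lat)
    return origin_ecef + R_le @ p_l

# Illustrative call with dummy values (identity boresight, zero lever arm).
p = georeference(np.array([1.0, 2.0, -5.0]), np.eye(3), np.zeros(3),
                 0.01, -0.02, 1.3, np.radians(22.5), np.radians(113.9),
                 np.array([-2.4e6, 5.4e6, 2.4e6]))
print(p)
```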
4. Shipborne water-shore measurement data processing
The shipborne water-shore measurement data processing mainly includes POS data calculation, above-water laser data post-processing, and underwater multi-beam data post-processing. After the field data collection is completed, the calibration parameters of the hull are estimated first. Then, the high-precision POS trajectory is solved, and the underwater multi-beam data and the above-water laser point cloud data are processed separately. In multi-beam post-processing, dedicated software is used to parse the original data format and extract the bathymetric, image, position, and attitude data. The data are then edited through steps such as outlier elimination, data filtering, sound velocity profile correction, tide level correction, data merging, and data smoothing. Combined with the position and attitude provided by the POS and the pre-calibrated coordinate transformation matrix, the underwater terrain point cloud with absolute coordinates is finally generated. The above-water data acquisition system extracts the distance, horizontal angle, and vertical angle of the massive laser point cloud in real time. The position and attitude provided by the POS and the pre-calibrated coordinate transformation matrix are

post-processed by the software to generate a 3D point cloud with absolute coordinates, and the high-precision terrain point cloud above water is created after point cloud filtering and thinning.

6.2.3 Application of the Integrated Water-Shore Measurement System

1. Integrated topographic surveying of island reef and water-shore

At present, the overwater and underwater geospatial information data of islands and
reefs are mainly obtained by separate measurements of above-water and underwater
topography. Overwater topographic surveying on islands and reefs is performed
with the help of 3D laser scanners. The underwater terrain measurement is mainly
completed by bathymetry methods such as single-beam and multi-beam echo
sounders. This separate measurement method of above-water and underwater is time-
consuming, labor-intensive, and inefficient. Moreover, the measurement accuracy of
overwater and underwater is not uniform, and it is difficult to unify the measurement
results in the coordinate system.
Using the shipborne water-shore integrated measurement system equipped with
sensors such as high-definition digital cameras, 3D laser scanners, and multi-beam
sonar, geographic information measurements were carried out on an island and
surrounding sea areas to establish the point cloud model and acquire measurable
360° real scene images. The island topography and underwater terrain information
can be obtained (Fig. 6.4).
The survey vessel circles the island at a speed of less than 15 km/h. During the survey, the measurement information referenced to each subsystem's coordinate system is collected in real time. The LiDAR point cloud and image data are stored and managed using the Beijing 54 coordinate system as the plane reference of the above-water and underwater point clouds and panoramic images. The chart datum is the vertical reference of the underwater point clouds, and the 1985 national height datum is the vertical reference of the above-water point clouds and panoramic images. In the mapping process, the 3D coordinates of the point cloud are interpolated and filtered to generate underwater topographic maps at different scales, as well as topographic maps and matching panoramic images above water, as shown in Fig. 6.5. Because there are many aquaculture areas around the island, the terrain of the intertidal zone near the island could not be measured obliquely during the survey. As a result, data in the shallow water area are missing, and a blind area remains after the fusion of the above-water and underwater terrain.
In the accuracy evaluation, the above-water part is verified with RTK positioning results: the main corner points of the laser point cloud at ranges of 150–300 m are marked and re-measured by RTK. The positioning error is shown in Fig. 6.6. The horizontal mean square error is 0.16 m, and the vertical mean square error is 0.13 m. The error mainly comes from two aspects: the accuracy error of

Fig. 6.4 Registration and fusion of laser point cloud and panoramic image

Fig. 6.5 Topographic map of the island and water-shore

the laser point cloud itself, and the error of the RTK positioning used to verify the main corner points. The corner points measured by RTK cannot be matched exactly with the corner positions in the laser point cloud, so a certain human error is introduced. Therefore, when verifying the accuracy of the laser point cloud, suitable verification points are selected to reduce the influence of human error.
The underwater part is verified by single-beam bathymetric results. Single beam
sounding data were collected around the island, and a network was formed. The
multi-beam sounding results are shown in Fig. 6.7a, and the single beam sounding
results are shown in Fig. 6.7b.

Fig. 6.6 The positioning accuracy of the laser point cloud

Fig. 6.7 The multi-beam and single beam sounding result

From 2701 points measured by both single-beam and multi-beam sounding, it was found that the distribution of the depth differences between the two methods conformed to a Gaussian probability distribution with a standard deviation of 0.23 m, which is comparable to the sounding error of the two devices and meets the accuracy requirements of underwater topographic maps [6–8].
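A comparison of this kind reduces to pairing nearby soundings from the two data sets and examining the statistics of their depth differences. The sketch below outlines one way to do it; the matching tolerance, variable names, and randomly generated inputs are assumptions for illustration only, not the survey data.

```python
import numpy as np

def depth_difference_stats(pts_a, pts_b, max_dist=1.0):
    """Compare two sounding data sets given as (x, y, depth) arrays.

    For every point in pts_a, the nearest point in pts_b within max_dist metres
    is taken as its repeated observation, and the depth differences are summarised.
    """
    diffs = []
    for x, y, d in pts_a:
        planar = np.hypot(pts_b[:, 0] - x, pts_b[:, 1] - y)
        j = np.argmin(planar)
        if planar[j] <= max_dist:
            diffs.append(d - pts_b[j, 2])
    diffs = np.asarray(diffs)
    return diffs.mean(), diffs.std(), len(diffs)

# Illustrative use with small random data sets (not survey data).
rng = np.random.default_rng(0)
a = np.column_stack([rng.uniform(0, 100, 50), rng.uniform(0, 100, 50), rng.uniform(5, 20, 50)])
b = a + rng.normal(0, [0.3, 0.3, 0.2], a.shape)
print(depth_difference_stats(a, b))
```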
2. The positioning measurement of the underwater structure
In recent years, with the country's continuous development and utilization of marine resources, the reconstruction and expansion of coastal marine infrastructure have also been in full swing. During the demolition of the old wharf at berths #1–#4 of the Haixing Wharf in the Mawan Port Area of Shenzhen Port, the original wharf piles needed to be measured precisely. To prevent the reserved piles deep below the bottom from affecting the newly built piles during the construction of the wharf, the head and tail positions and slopes of the reserved piles must be calculated accurately. At present, measurement technology at home and abroad cannot directly measure the straight pile body below the shallow stratum.
Therefore, when calculating the pile head position and slope of the reserved pile
part, it is necessary to calculate the pile direction and the pile head position below

the ground surface based on the precise measurement results of the pile above the
ground surface.
In this case, the shipborne integrated overwater and underwater measurement system was used to conduct a high-precision survey of berths #1–#4 of Mawan Port, and the point cloud images of the water diversion surfaces of the underwater and above-water piles were measured using a multi-beam sounder and a laser scanner, respectively. During the operation, the high and low tide level control measurement
method is adopted so that there is a common mosaicking part in the point cloud
images of the underwater and above-water piles. Based on the absolute coordinates
of the high-precision water surface point cloud, feature stitching of the point cloud
of the water diversion surface of the underwater and overwater pile is performed
to obtain the high-precision absolute position of the pile. Then, according to the
absolute position of the water diversion surface and the size of the pile, the point
position and slope of the part below the mud surface of the retaining pile are accurately
calculated. Through the outer precision verification of 10 feature points at the wharf,
the horizontal absolute accuracy of the pile positioning is 9 cm, and the vertical
absolute accuracy is 8 cm. Finally, comparing the measurement results with the
design results of the new piles, the results show that there is a collision between the
old and new piles in berth #2, and there are 10 new and old piles in berth #4 with a
horizontal interval of less than 10 cm. Due to the limitation of measurement accuracy,
these 10 new and old piles may also collide during the construction process.
First, the various pieces of equipment must be fixed to the survey vessel while ensuring that all measurement units are rigidly connected. To guarantee the accuracy of the multi-beam system and the POS, time and space synchronization between the two systems must be ensured. Time synchronization can usually be resolved during hardware integration. Spatial synchronization is affected by installation errors: there is an installation error angle between the multi-beam transducer and the POS, and the error angles (roll, pitch, and heading) must be calibrated accurately to ensure accuracy. The installation error angles are estimated by mosaicking the multi-beam underwater point clouds obtained from multiple survey lines and evaluating them by least squares. Specifically, roll and pitch errors are estimated from reciprocal (round-trip) survey lines, heading (azimuth) installation errors are estimated from partially overlapping lines run in the same direction, and the time offset between the POS and the multi-beam system is estimated from lines run at different speeds.
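As a concrete illustration of one of these steps, the roll component of the installation error can be estimated from a pair of reciprocal lines over flat seabed: a roll bias tilts the swath in opposite directions on the two headings, so the depth difference at a given across-track distance reveals the bias. The sketch below is a simplified patch-test-style estimate under that flat-seabed assumption, not the full least-squares procedure described in the text; all numbers are illustrative.

```python
import math

def roll_bias_from_reciprocal_lines(depth_line1, depth_line2, across_track_dist):
    """Estimate the roll installation bias (degrees) over a flat seabed.

    depth_line1 / depth_line2 : depths of the same across-track point measured
                                on the two reciprocal headings (m)
    across_track_dist         : horizontal distance of that point from nadir (m)
    """
    delta_z = depth_line1 - depth_line2
    # A roll bias r tilts the swath by +r on one heading and -r on the other,
    # giving delta_z ~= 2 * y * tan(r) at across-track distance y.
    return math.degrees(math.atan2(delta_z, 2.0 * across_track_dist))

# Illustrative values: a 0.35 m depth discrepancy at 40 m off nadir.
print(roll_bias_from_reciprocal_lines(20.35, 20.00, 40.0))   # about 0.25 deg
```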
Since the multi-beam swath coverage angle is 165° and the shipborne 3D laser scanner cannot penetrate the water layer, installing the multi-beam transducer in the normal way leads to a measurement "blind zone" in the integrated water-shore measurement results. For this "blind zone", there are generally two operating principles: the first is to use tide level control, measuring the intertidal zone above water at low tide and the underwater topography at high tide and then stitching the results, which can greatly reduce the "blind zone" of the intertidal zone measurement; the second is to tilt the multi-beam transducer by 30° toward the horizontal to measure the underwater terrain obliquely and combine it with the laser scanner performing the overwater terrain measurement, to achieve seamless coverage

of the overwater and underwater point clouds, which effectively solves the "blind zone" measurement problem. According to the characteristics of the semidiurnal tide in Shenzhen Mawan, a spring tide day of the month is selected. The vessel approaches the object and measures during high slack tide, when the point cloud of the piles under water is obtained by the multi-beam sounder; panoramic measurements are performed during low tide, when the point cloud of the piles above water is obtained by the shipborne laser. Since the total length of berths #1 to #4 is 800 m, three passes are carried out for each operation, and the comparison and fusion of the three measurement results verify the measurement accuracy and ensure reliability.
After the field data collection is completed, the calibration parameters of the
multi-beam system need to be estimated first, then the high-precision POS trajectory
is solved, and the underwater multi-beam and above-water laser point cloud data
are processed. In the post-processing of multi-beam data, the original data format
is analyzed, and the sounding data, image data, position data, and attitude data are
extracted. Then, the data are edited, such as difference elimination, data filtering,
sound velocity profile correction, tide level correction, data merging, and smoothing.
Combined with the position and attitude provided by the POS and the pre-calibrated
coordinate transformation matrix, the underwater sounding point cloud with absolute
coordinates is finally generated. The above-water acquisition system extracts the distance, horizontal angle, and vertical angle of the massive laser point cloud in real time. The position and attitude provided by the POS and the pre-
calibrated coordinate transformation matrix are post-processed by the software to
generate a 3D point cloud with absolute coordinates. The final measurement results
are the integrated laser point cloud and underwater terrain point cloud under the
WGS-84 coordinate system, and then the measurement results under the WGS-84
coordinate system are converted into the Shenzhen independent coordinate system.
In the accuracy evaluation, the above-water part is verified with RTK positioning results: 10 feature corner points of the laser point cloud in berths #1–#4 are marked and re-measured by RTK to verify the external accuracy of the laser point cloud (Table 6.1).
During the operation of the shipborne water-shore integrated measurement
system, the actual measurement accuracy of underwater multi-beams is not as good
as that of the overwater laser measurement due to the problems of sound velocity
correction, attitude calibration and equipment installation. Therefore, the laser point
cloud over water is used as the base, and the underwater multi-beam point cloud
is stitched according to the elevation value. According to the stitching result of the
berth point cloud and the morphological characteristics of the old pile, the absolute
position of the central axis of the water diversion surface of the pile is extracted.
Then, the course of the entire pile and its underwater absolute position are derived from the cross-sectional dimensions of the old pile. Finally, the positions of the new and old piles are compared by overlaying the design position map of the new piles on the underwater position map of the old piles, and possible collisions between new and old piles are analyzed from the combined position map of the berth (Figs. 6.8 and 6.9).

Table 6.1 The accuracy check of point cloud and RTK coordinates

Point number          Dx/m     Dy/m     Dz/m
Szu01                 0.09     −0.15    −0.01
Szu02                 0.08     −0.02    −0.02
Szu03                 0.06     −0.08    −0.03
Szu04                 −0.04    −0.03    −0.03
Szu05                 0.07     0.06     0.23
Szu06                 −0.06    0.06     −0.04
Szu07                 0.03     0.04     0.02
Szu08                 −0.02    0.08     0.02
Szu09                 −0.03    0.05     0.01
Szu10                 −0.07    0.03     0.06
Mean                  0.01     0.01     0.02
Standard deviation    0.06     0.07     0.08

Fig. 6.8 The rendering of the overwater and underwater stitching of pier piles

Fig. 6.9 The effect picture of the new and old pile overlaid

6.3 Airborne Laser Bathymetric Surveying

6.3.1 Airborne Laser Bathymetry Technology

Water has a narrow transmission window in the visible light band, namely, the blue-green band with wavelengths between 470 and 680 nm. Using this band, it is possible not only to detect land targets but also to penetrate a certain depth of water, achieving synchronous non-contact detection of above-water and underwater targets. On this basis, by deploying laser scanning technology on an aviation platform, integrated water and land detection with bathymetric LiDAR can be achieved [9].
At present, the signal receiving methods used by ALB systems mainly include
digital full-waveform echo signals and photon counting. Among them, the laser
bathymetry system using the photon counting signal acquisition method does not
need a large laser transmission power, so it can effectively reduce the overall load
and power consumption of the system. The ALB system using full-waveform digital
signals can completely record the response characteristics of the laser signal in the
propagation optical path, which has important application value for target detection
and environmental parameter inversion in complex propagation environments [10]
(Fig. 6.10).

Fig. 6.10 Principles of water depth detection for the ALB system [11]

The principle of ALB technology is to record, through the timing unit, the times at which the two echo signals are received, calculate the time difference between the water surface echo and the water bottom echo, and convert the travel times into the propagation slant ranges of the laser in the air and seawater media:
$$L_{air} = \frac{c t_1}{2} \qquad (6.12)$$

$$L_{wt} = \frac{c n_a t_2}{2 n_w} \qquad (6.13)$$

where L_air is the propagation slant range of the laser in the atmosphere, L_wt is the slant range of the blue-green light propagating in the water body, t_1 is the two-way travel time of the water surface echo, t_2 is the time interval between the sea surface echo and the water bottom echo in the waveform, α and b are the incidence angle and refraction angle of the laser at the water surface, respectively, c is the propagation speed of light in air, and n_a and n_w are the refractive indices of the laser propagating in air and seawater, respectively; their values vary with the specific conditions of the medium environment.
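Combining Eqs. (6.12) and (6.13) with Snell's law gives a simple water depth estimate once the beam's incidence angle is known. The sketch below is a minimal illustration under the usual flat-water-surface assumption; the refractive indices, timing values, and function name are examples only.

```python
import math

def alb_water_depth(t2, incidence_deg, c=3.0e8, n_air=1.000293, n_water=1.34):
    """Water depth from the surface-to-bottom echo interval t2 (Eqs. 6.12-6.13).

    The refraction angle follows Snell's law, n_air*sin(a) = n_water*sin(b),
    and the depth is the vertical component of the in-water slant range.
    """
    a = math.radians(incidence_deg)
    b = math.asin(n_air * math.sin(a) / n_water)        # refraction angle
    slant_in_water = c * n_air * t2 / (2.0 * n_water)    # Eq. (6.13)
    return slant_in_water * math.cos(b)

# Example: 180 ns between surface and bottom echoes at a 20 deg incidence angle.
print(alb_water_depth(180e-9, 20.0))   # about 19.5 m under these assumptions
```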
To ensure that the laser irradiation flux meets the sounding requirements and to improve the integrated detection capability of the ALB system, the system as a whole should have the following characteristics:
(1) It can dynamically extend the receivable intensity range of echo signals from a wide variety of targets.
(2) The output can provide full-waveform raw echo data, recording the continuously changing signal intensity during laser beam propagation together with the corresponding scanning direction data.
(3) The signal acquisition strategy should facilitate the classification of water and land echo signals and the correction of the laser propagation direction.
(4) It can provide an effective correction method for the dynamic water surface.
The features above are also the key technical issues for the design and develop-
ment of ALB systems. In addition, the current mainstream trend of system equip-
ment research and development mainly focuses on the multi-mode integration direc-
tion represented by multi-band sensors and the lightweight direction of single-band
compact structure design. Representative commercial ALB systems include the
SHOALS series [12] (since upgraded to CZMIL Nova [13]) produced by Optech in Canada,
the HawkEye series [14] produced by AHAB in Sweden, the LADS Mk3 system produced by
Fugro in the Netherlands [15], and the VQ-820-G and VQ-880-G systems produced by
Riegl in Austria [16].
At present, ALB technology is gradually maturing. However, the complex structure of the
system itself, high development costs, commercial protection of the technology, and other
factors restrict the popularization and application of ALB technology. Considering the
practical application requirements in coastal shallow-water areas, it is of great significance
to develop a low-cost, lightweight, and high-efficiency single-band bathymetric LiDAR system
to improve the applicability of this technology.

6.3.2 Development of Airborne Laser Bathymetry Equipment

The ALB system is a marine remote sensing detection system composed of a variety of
equipment whose subsystems are coordinated, controlled, and interact with one another.
The complete ALB system can be divided into an airborne part and a
ground part. The main task of the airborne part is to complete specific scanning detec-
tion and data collection, while the ground part mainly provides support work such
as ground differential positioning information and data post-processing. The main
functional modules of airborne bathymetric LiDAR include the aviation platform,
navigation and positioning system, attitude measurement unit, laser scanning detec-
tion system, synchronous control device, computer control and recording part. The
ground part mainly includes ground reference stations and data processing platforms.
Each functional module is combined with each other through different connection
methods to form an ALB system.
The technical team of the Key Laboratory of Geographical Environment Monitoring in the
Greater Bay Area (Ministry of Natural Resources) at Shenzhen University, after theoretical
demonstration and system design, repeatedly adjusted and upgraded the technical performance
of the single-band blue-green laser bathymetry equipment, and a practical airborne
full-waveform blue-green laser bathymetry system, iGreena, was completed in 2019. This
section takes that system as an example to describe system development.
1. Optical and mechanical scanning structure design
The system adopts the coaxial strategy of the optical path to avoid mutual interference
between the transmitted signal and the echo signal. By adopting a compact and
efficient optomechanical structure, the emitting efficiency of the detection laser is
optimized while reducing the complexity of the structure. As shown in Fig. 6.11 [11],
by punching holes in the corner reflector, the emitted signal directly passes through
the wedge-shaped objective after being split. The precisely machined wedge-shaped
objective lens maintains a high-precision emission zenith angle when the laser
exits. When the echo signal returns to the system, it is received by the large-aperture
objective lens and enters the signal acquisition channel through the corner reflector.
In addition, this system is different from the structural design of the reflective
telescope. It utilizes the refraction effect of the objective lens and adopts a rotating
wedge-shaped scanning prism so that the laser emitted by the system during static
scanning forms a complete conical scanning field in space. The laser exit angle is
always kept fixed, which effectively simplifies the calculation process and reduces
the error of the scanning system. When the system is loaded on the flying platform,
the flight height of the platform and its running direction can be used to form a
scanning zone on the surface to achieve laser scanning detection of the target area.

Fig. 6.11 Optical coaxial and isometric scanning structure [11]: 1. Laser transmitter; 2. Beam
splitter; 3. Optical corner reflectors; 4. Rotary scanning angular encoder; 5. Rotary scanning
motor; 6. Scanning prism; 7. Receiving channel for laser emission signal; 8. Receiving channel
for target echo signal; 9. Scanning nadir angle; 10. Rotation axis
In the ALB system, the laser incident zenith angle has a great influence on the accu-
racy of the sounding, and when the incident angle is 20°, the sounding deviation error
caused by the scanning angle is the smallest [17]. Therefore, the optical-mechanical
structure design of the system should consider the target detection effect. In addition,
the ALB system is a complex system composed of multiple system units [18], and a
more compact optical structure is of great significance to realize the miniaturization
and light weight of the system.
2. Signal receiving system and signal amplification channel
In accordance with the single-band, small-spot (beam divergence angle of approximately
0.438 mrad) laser detection characteristics of the system, the receiving optical path was
designed during system development to limit the source direction and wavelength range of
the laser echoes, thereby weakening the interference of non-target echo signals such as
multi-path echo energy and background light. The receiving structure is shown in Fig. 6.12 [11].

Fig. 6.12 Design of the optical structure of the receiving system [11] 1. Land channel; 2. Beam
splitter; 3. Ocean channel; 4. Narrow-pass filter; 5. Eyepiece; 6. Aperture; 7. Objective; 8. Multipath
light

1) Reception limitation of the echo light signal


A large-aperture objective lens is selected for receiving the laser return signal to ensure
full acquisition of the reflected energy of the target.
Set the field diaphragm at the focal position of the objective lens. After comprehen-
sively considering factors such as the signal-to-noise ratio of photoelectric detec-
tion, the focal length of the telescope objective lens, and the assembly tolerance of
optical components, the diameter of the aperture of the field diaphragm is designed
to be 0.3 mm. This ensures that the target echo signal after passing through the
field diaphragm mainly comes from the field of view direction of the objective lens.
Finally, the received light filtered by the diaphragm is projected on the surface of the
eyepiece and is refracted by the eyepiece to return to the propagation direction of the
beam center.
After the beam is refracted by the eyepiece, it passes through an ultranarrow-band filter
matched to the wavelength of the laser emitted by the system; the filter selects the beam
by working wavelength to eliminate the interference of non-target echo signals. In practice,
considering the machining accuracy of the components, a filter with a bandpass width of
0.5 nm is selected. A filter of this width effectively rejects non-target return light
outside the center wavelength ± 0.25 nm, as shown in Fig. 6.13.
To eliminate the flare and polarized light interference in the field of view of the
receiver, before the laser echo signal enters the receiving device, a polarization control
device is also added to further analyze and denoise the incident signal, suppress the
influence of strong ambient light on the quality of the echo signal, and improve the
system’s response to the received target echo signal (Fig. 6.14).

Fig. 6.13 Signal comparison before and after using the diaphragm to limit the surrounding
multipath echo energy (Oscilloscope)

Fig. 6.14 Transmittance curve of the narrow-pass filter

The system uses a high-power, short-wavelength 532 nm pulsed laser for detection.
When flying at a low altitude, the echoes from targets close to the flight platform and from
highly reflective particles in the inhomogeneous medium have higher intensities
than the light reflected from the more distant water surface and shoreline land surface.
The high-intensity near-field signal not only has a great impact on the
selection and processing of target waveforms but also may have a destructive effect
on the photomultiplier tube (PMT) detector in a high-gain state (Fig. 6.15).
To solve the problems above, high-precision time synchronization technology can
be combined with PMT variable gain detection technology. When the high-energy
laser pulse is close to the carrier platform, the system PMT works in a low-gain
state, and when the laser pulse is about to reach the water surface/land, it works in
a high-gain state. Therefore, the distance gate can be set according to the altitude
of the flight route to avoid the influence of the near-field interference signal on the
system.
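As a simplified, hypothetical illustration of this range-gating idea (the 100 m margin and the
function name are assumptions, not iGreena parameters), the gate opening time can be derived
from the planned flight altitude as follows.

C = 299792458.0  # speed of light (m/s)

def gate_delay(flight_height_m, margin_m=100.0):
    """Time after pulse emission at which the PMT is switched to high gain.

    The gate opens shortly before the expected water-surface return, so near-field
    echoes are received while the detector is still in the low-gain state.
    """
    return 2.0 * (flight_height_m - margin_m) / C

print(gate_delay(300.0))  # about 1.33 microseconds for a 300 m flight height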

Fig. 6.15 Propagation with ambient light and atmospheric scattering

2) Echo signal enhancement and dynamic expansion of the detected target range
Aiming at the complexity of the target echo, a dual-channel optical path splitting
design is adopted, and digital technology based on echo multi-level enhancement is
developed. The online storage and processing of digital signals is realized through
circuit design, which can perform multi-stage analog amplification for weak echo
signals with small reflectivity or serious signal attenuation, dynamically expand the
detection range of the system, and ensure the effectiveness of the system detection.
Users are provided with digitized full-waveform echo data containing rich spatial
feature information.
After the target return light reaches the laser bathymetry system, its energy is
usually very weak. In addition to the need to minimize the interference of noise
signals, it is necessary to amplify the signals received by the system in a certain way
to ensure the accuracy and effectiveness of the system in detecting different targets.
During the development of this system, the photoelectric conversion circuit is used
to convert the photocurrent signal into a voltage signal through an impedance trans-
former network, and the voltage is amplified by four independent cascaded amplifier
circuits. The four-way amplifier circuit adopts a symmetrical circuit structure design
to reduce the phase delay of the analog signal between channels. At the same time, the
cross-increasing intensity gain design is adopted to ensure the connectivity between
channels and the dynamic expansion of the detection target range.
Figure 6.16a shows the echo signal received at a water depth of approximately 12.7 m,
and Fig. 6.16b shows the echo signal received at a water depth of approxi-
mately 4.5 m. Starting from the original echo signal of channel 4, an analog amplified
signal with increasing intensity is obtained through multi-stage amplification in the
system. When amplifying to the 1st and 2nd channels, the detected signal part in the
channel is in a saturated state, and the weak echoes from the bottom are effectively
amplified. The above are the relevant key technologies and their solutions when the
system collects laser echo signals. The system was tested by the measurement and
inspection agency, and the relevant performance indicators of the whole system are
as follows (Table 6.2).

Fig. 6.16 Multistage amplification of the echo signal in the water area for different depths [11]

Table 6.2 The related parameters of the equipment

Parameter              Equipment completion metrics
Wavelength             532 nm
Laser frequency        10–700 kHz
Scanning frequency     52 Hz
Scan angle             ± 20°
Flight height          100–500 m
Elevation accuracy     \sqrt{0.23^2 + (0.013 d)^2} m (d: water depth)
Horizontal accuracy    0.22 m + 5% of depth
Weight                 ≤ 53 kg

6.3.3 Airborne Laser Bathymetry Data Processing

The experimental system adopts data acquisition and processing technology for
full-waveform, single-band laser echo signals. The echo waveform intuitively
reflects the attributes of ground objects; combined with the synchronously acquired
position and attitude data of the system, it is of great significance for studying the
spatial distribution of specific attributes in target areas, such as changes in the
intensity of the corresponding echo signals and the optical characteristics of ground objects.
The airborne laser underwater topographic survey data processing workflow is shown
in Fig. 6.17.
The blue-green LiDAR underwater detection technology adopts the full waveform
analysis method. In addition to the water surface and bottom echoes, other echo
signals are often obtained in waveform data processing, including echoes
from the water scattering layer, real non-target echoes such as fish, and
noise points generated by miscalculation. Therefore, a reasonable processing
method should be chosen for point cloud data based on product requirements. The
system adopts the method of structured light splitting to realize the reception and
recording of echo signals from land and water. Through waveform data processing,
the laser propagation distance is calculated, and the refraction correction of the spatial
position of the underwater target is calculated. On this basis, after system calibration
and time alignment and fusion of each part of the data, the LiDAR point cloud data
are generated according to the spatial position calculation method.
1. Multi-system data fusion
The ALB system is a comprehensive mobile measurement system composed of
various data acquisition units, such as laser scanning, satellite positioning, inertial
navigation, and flight records. The system can record its overall motion state during
the flight scanning process. The modules can refer to one another through the infor-
mation provided in the data files to achieve unified coordination within the system.

Fig. 6.17 The airborne laser underwater topographic survey data processing workflow

A large amount of observation data is obtained in the meantime, including LiDAR
echo data and scanning status, IMU motion status data, observation data of GNSS
base stations and rover stations, and water environment characteristics. Therefore,
the integrity of the process should be ensured before data processing. Based on this,
the data obtained by each acquisition unit are processed to remove the erroneous data

contained therein, which provides an important early guarantee for the processing
and analysis of the later observation data.
The airborne LiDAR bathymetry scanning system is composed of multiple
measurement systems. After waveform data processing and modular data acqui-
sition, the basic data obtained by the ALB system include the slant range, scanning
angle of the laser, the real-time spatial coordinates of the GNSS antenna phase center
obtained by the system through differential positioning or precise single-point posi-
tioning, and the real-time attitude data obtained by the inertial navigation unit. In
addition, the ALB system will be systematically calibrated to determine installation
deviations between modules before the scanning bathymetry is performed.
ALB waveform data: The information is obtained by analyzing the full waveform
data, such as the echo waveform, angle encoder, and UTC synchronization time.
The distance/water depth information from the laser emission center to the target
is obtained by detecting the echo waveform and peak value, and the UTC time is
converted into GNSS time.
POS data processing: The POS file is solved to obtain the position, attitude, GNSS
time and other information at any time of navigation, and the timestamp of the camera
aerial photo is obtained at the same time.
The waveform data obtained by the LiDAR and the navigation position and attitude
information are registered according to the GNSS time and linearly interpolated to
obtain the position and attitude information at the corresponding moment of the arbi-
trary waveform. In this way, the fusion of observation data from different systems is
achieved, and accurate observation information is provided for subsequent waveform
data processing and point cloud calculation.
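A minimal sketch of this time-based fusion step is given below; it linearly interpolates
hypothetical POS records to the GNSS time of each waveform with numpy, interpolating attitude
angles component-wise for simplicity and ignoring angle wrap-around.

import numpy as np

def interpolate_pos_to_waveforms(wf_time, pos_time, pos_xyz, pos_att):
    """Linearly interpolate position and attitude to waveform GNSS times.

    wf_time : (M,) GNSS time of each laser waveform
    pos_time: (N,) GNSS time of each POS record (must be increasing)
    pos_xyz : (N, 3) antenna phase-center coordinates
    pos_att : (N, 3) roll, pitch, heading in degrees
    """
    xyz = np.column_stack([np.interp(wf_time, pos_time, pos_xyz[:, k]) for k in range(3)])
    att = np.column_stack([np.interp(wf_time, pos_time, pos_att[:, k]) for k in range(3)])
    return xyz, att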
2. Noise suppression for the effective waveforms
Affected by the uncertainty of the system and the medium environment, bathymetric
lasers usually produce a large random noise response during the propagation process.
Therefore, in addition to the effective echo of the target in the echo signal received
by the system, it also contains a large amount of noise interference generated inside
and outside the system, such as the noise current of the laser transmitter and receiver,
the interference from the transmission medium and other light sources during the
propagation process. It is manifested in the abnormal fluctuation of the local echo
intensity in the waveform. According to the response of the target to the laser, the
laser echo signal can be simplified to the following form:

p_r(t) = p_t(t) w_t(t) + n(t)   (6.14)

where p_r is the power of the received echo, p_t is the response function of the trans-
mitted pulse in the received echo, w_t is the response of the backscattering cross section
of the target at a small distance, and n(t) represents the random interference during
the propagation of the sounding laser, which is usually represented as Gaussian white
noise.
Because the causes of random noise are complex, it is usually difficult to express
random noise according to certain models and mathematical methods. When the

response power of random noise is too large, it will seriously affect the waveform
data analysis and the accurate calculation of the reflection position, and some valid
information may even be submerged and cannot be recovered. High-quality echo
waveform sequences will help improve the accuracy and reliability of waveform
processing and analysis results. To improve the signal-to-noise ratio of the echo
waveform, the noise input can be reduced by improving the related performance of
the detector, such as improving the current response and using a narrower filter device.
However, it is difficult to directly modify the system equipment in actual measure-
ments. Therefore, it is necessary to use appropriate filtering methods to eliminate
or reduce the adverse effects of random noise in the echo signal before analyzing
the laser full-waveform data. The commonly used methods include Gaussian low-
pass filtering, wavelet noise suppression and low-pass filtering that preserves signal
moments. For the problems of noise and data quality contained in the waveform
signal, data indicators with quantitative characteristics can be used for comparison.
By converting the processed waveform characteristics into specific digital parameters
or quantitative standards, the quality of the signal is evaluated. The commonly used
quantitative evaluation indicators include the MSE, RMSE, and peak signal-to-noise
ratio (PSNR).
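As a hedged illustration of the smoothing and quality-evaluation step, the following sketch
applies a Gaussian low-pass filter to a digitized waveform and reports the PSNR of the smoothed
signal relative to the raw one; the filter width is an assumption.

import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_and_psnr(waveform, sigma=2.0):
    """Gaussian low-pass filtering of a digitized echo waveform plus a PSNR figure."""
    smoothed = gaussian_filter1d(waveform.astype(float), sigma=sigma)
    mse = np.mean((waveform - smoothed) ** 2)
    psnr = 10.0 * np.log10(waveform.max() ** 2 / mse) if mse > 0 else np.inf
    return smoothed, psnr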
3. Waveform data preprocessing
Restricted by the flight altitude of the aircraft and water depth, the echo signal
received by the system contains a large amount of data redundancy in addition to the
intensity change when propagating in the water. Therefore, before echo waveform
data processing, it is necessary to perform preliminary processing on the existing data.
Figure 6.18 shows the received echo signals and their effective waveform positions
under different environmental background conditions. Figure 6.18a and c are the
original waveforms of the echo strength and background noise for different water
bottoms. The red signal segments in (b) and (d) are the effective parts of the original
waveforms, whose span defines the effective waveform length [19]. Usually, the length of the
signal recorded by the device is no shorter than the effective waveform length that
contains the echo characteristics. Therefore, it is necessary to determine the effective
part of the echo signal first to reduce data redundancy while ensuring that the relevant
processing effects focus on the signal parts that reflect the response characteristics
of the target area’s water surface, water body, and water bottom. The effective part
of the initial signal is intercepted as the main part of data processing, which reduces
the processing length of the waveform signal, thereby improving the data processing
efficiency and reducing the interference of non-target signals.
Because the valid part of the waveform represents the main part of the energy
change in the complete echo signal, the major component of the rest is the high-
frequency signal containing random noise interference. The statistical characteristics
of the signal can be determined by using this feature through the method of drawing
an echo waveform histogram, as shown in Fig. 6.19.
Since the interference of random noise is mainly concentrated in the area with
small vibration amplitude, after fitting the histogram data with a Gaussian function,
the expected value is selected as the mean value of the random noise in the echo data.

Fig. 6.18 Diagram of the original echo signal and its effective part [19]

Fig. 6.19 Histogram of echo waveform intensity

Meanwhile, the standard deviation of the Gaussian function represents the standard
deviation of the random noise in the echo waveform. In Fig. 6.19, the result of the
Gaussian fitting to the histogram is as follows:
f_i(a) = K \exp\left[\frac{-(a - a_m)^2}{2\sigma^2}\right], \quad f \sim [K, a_m, \sigma]   (6.15)

where [K, a_m, σ] = [36.877, 184.4, 2.14]. Using three times the standard deviation
as the reference, the effective waveform selection threshold can be calculated; that
is, when the echo strength p(t) ≥ (184.4 + 3 × 2.14) = 190.8, it is considered to
belong to the effective part of the echo waveform.
Since the front and rear parts of the signal segment do not contain the target
reflection signal, its components are mainly the background noise of the laser pulse
during the propagation process. Therefore, a common simplification scheme is to
select the frame data before and after the waveform data for direct statistics and use
the expected and standard deviation of the obtained data as the mean and standard
deviation of the background noise. Generally, m discrete waveform data that are
independent of the reflected echo intensity of the target can be selected for statistical
analysis.

u_N = \frac{1}{m}\sum_{i=1}^{m} y_i   (6.16)

\sigma_n = \sqrt{\frac{1}{m-1}\sum_{i=1}^{m} (y_i - u_N)^2}   (6.17)

Using Eqs. (6.16) and (6.17), the statistical characteristics of the background noise
in the echo signal can be approximated. Since the background noise is mainly random,
it is usually considered to be white noise conforming to the Gaussian distribution.
Generally, three times the standard deviation of the background noise is selected
as the basis for calculating whether the signal contains an effective reflected echo
signal. When the absolute value of the signal amplitude exceeds the confidence range
of random noise, it is retained; otherwise, it is considered noise interference and
excluded. After screening the entire waveform, the retained echo waveform signal
is the valid part of the waveform.
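A minimal numpy sketch of Eqs. (6.16)–(6.17) and the 3σ screening rule follows; the number of
leading and trailing samples used for the noise estimate is an assumption.

import numpy as np

def effective_part(waveform, n_noise=50):
    """Estimate background noise from the leading/trailing samples (Eqs. 6.16-6.17)
    and keep only the samples exceeding the 3-sigma threshold."""
    noise = np.concatenate([waveform[:n_noise], waveform[-n_noise:]]).astype(float)
    u_n = noise.mean()                       # Eq. (6.16)
    sigma_n = noise.std(ddof=1)              # Eq. (6.17)
    mask = waveform > u_n + 3.0 * sigma_n    # 3-sigma screening of the effective part
    return mask, u_n, sigma_n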
4. Waveform data processing and target echo detection
The noise-smoothed signal contains the two reflections of the laser on the water
surface and the bottom, as well as the energy changes in the process of penetrating the
water body. The ultimate purpose of waveform processing is to detect the time posi-
tion of the reflection, identify the type of echo, etc. However, due to the uncertainty
of the laser scanning environment, there is no single echo data processing algorithm
suitable for all detection environments in theory. The recording and processing of full
waveform data is the key to ALB technology. It is of great significance to improve
the quality of system data processing to adopt waveform data processing methods
suitable for specific water quality, seabed topographic changing complexity and
other conditions. The current ALB full waveform data processing methods mainly
include waveform deconvolution, echo detection, and waveform decomposition [20].
Conventional waveform processing methods mainly include three types:
(1) Echo detection. Starting from the morphological characteristics of the received
waveform to find the position of the sudden change in energy as the time posi-
tion of the target reflection, such as the maximum peak method, zero-crossing
method, average square difference function method, etc.
(2) Deconvolution. This method is usually applied to image or signal restoration.
Wang et al. [21] compared several conventional methods with the processing
results and found that the Richardson-Lucy deconvolution method has a higher
detection rate and lower error. The disadvantage of deconvolution is that the
anti-noise ability of the method is weak, and the ringing effect is prone to occur
in the processing, resulting in a miscalculation of the time position of the echo.
(3) Mathematical approximation. The reflected signal usually exhibits a morpholog-
ical feature that approximates a Gaussian function. The commonly used Gaus-
sian decomposition methods include hierarchical Gaussian function waveform
fitting based on nonlinear least squares, the Gauss–Newton algorithm, and the
EM algorithm. The processing results obtained by Gaussian decomposition are
more in line with the application requirements of multiple disciplines.
At present, due to commercial confidentiality, most ALB system manufacturers
have not fully disclosed their full waveform data processing methods. For the current
application of ALB technology in the marine field, there are the following problems
in waveform data processing.
The ALB system achieves data acquisition by discrete sampling of the received
echo signal strength according to a fixed frequency, but there is considerable redun-
dancy in the actual obtained full waveform data. During the operation, it is necessary
to lock the effective part of the echo to reduce miscalculation and improve the calcu-
lation efficiency. In addition, the noise interference contained in the waveform data
also adversely affects the accurate determination of the reflection time, so it is better
to perform smoothing filtering according to the echo characteristics before analyzing
the original waveform data.
Directly using discrete sampling points usually cannot accurately determine the
time position of the sudden change of echo energy, and there is a certain error
between the simple interpolation result and the actual reflection time. The math-
ematical simulation method can fit the echo waveform as a continuous curve and
then accurately obtain the time position and the time interval between the peaks
through peak detection.
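The following sketch illustrates this idea in a simplified way: discrete peaks are located
first, and each peak time is then refined by a local parabolic fit to approximate the
continuous-curve position. The 0.8 ns default sample interval corresponds to a 1.25 GHz
digitizer and, like the threshold argument, is an assumption rather than a fixed system value.

import numpy as np
from scipy.signal import find_peaks

def subsample_peak_times(waveform, threshold, sample_interval_ns=0.8):
    """Locate echo peaks and refine each time position by parabolic interpolation."""
    peaks, _ = find_peaks(waveform, height=threshold, distance=5)
    times = []
    for p in peaks:
        if 0 < p < len(waveform) - 1:
            y0, y1, y2 = waveform[p - 1], waveform[p], waveform[p + 1]
            denom = y0 - 2.0 * y1 + y2
            delta = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
            times.append((p + delta) * sample_interval_ns)
    return np.array(times)  # peak times in ns; differences give the echo time intervals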
The echo signal is not a regular Gaussian function. When the energy of the reflected
signal on the bottom of the sea is low or the water depth is shallow, the echo signal
often overlaps, causing the position of the detected bottom wave crest to shift. The
least ideal situation is that the bottom reflection signal is completely embedded in the
echo signal, which complicates the estimation of water depth. Therefore, a reasonable
component screening mechanism is of great significance.
In view of the above problems existing in full waveform processing, the waveform
decomposition method is introduced by taking the waveform processing method of
Gaussian half-wavelength progressive decomposition based on a time sequence as an
example. The Gaussian half-wavelength progressive decomposition method based on
the time sequence can not only obtain the characteristic parameters of each reflection
component but also effectively overcome the inaccuracy of waveform signal fitting
and peak detection, the unreasonable decomposition of reflection components, peak
position offset, and miscalculation caused by superposition.
Full waveform recording of echo signals is the basis of ALB technology. The
full waveform data reflect the spatial and physical properties of the target through
the distribution characteristics and related parameters of the echo energy. Therefore,
extracting the characteristics of the reflected target from the waveform data is the main
purpose of airborne LiDAR measurement. Waveform decomposition is an important
method for target feature extraction. The common feature of the decomposition
method is that the analysis of the reflected echo is realized by decomposing the orig-
inal waveform into waveform components in different spatial domains. At present,
Gaussian decomposition has been widely used in full waveform data processing and
analysis. The echo waveform of the LARSEM500 ALB system was simulated using
the exponentially corrected Gaussian function:

p_n(t) = y_{EMG}(t) + y_g(t) + n(t) = f_1(t) f_2(t) + y_g(t) + n(t)   (6.18)

where y_g(t) means that the bottom reflection echo can be simulated with a Gaussian
waveform, but under different bottom sediments or complex underwater terrain
conditions, there may be multiple y_g(t) cases. y_{EMG}(t) is the water surface reflection
and water delay, f_1(t) is the water surface pulse echo, which can be represented by
a Gaussian function, and f_2(t) means that the attenuation of the water body can be
simulated by an exponential function. n(t) is random noise, usually expressed as
Gaussian white noise.
According to the analysis above, the echo waveform can be considered the super-
position of the echo signals obtained from different reflection sections under the same
spot and Gaussian white noise. The returned echo pulse energy basically follows the
Gaussian distribution. Full waveform data can be represented as follows:

P(t) = \sum_{i=1}^{m} f_i(t) + n(t) = \sum_{i=1}^{m} A_i \exp\left[\frac{-(t - u_i)^2}{2\sigma_i^2}\right] + n(t), \quad n(t) \sim N(0, \sigma^2)   (6.19)

where f_i(t) is the time-domain function of each component of the full waveform data, n(t) is
Gaussian white noise, and the three parameters [A_i, u_i, σ_i] of each Gaussian function are
unknown, representing the amplitude, time position, and waveform width of the returned waveform
of each reflection section, respectively. The essence of Gaussian function decomposition is to
determine the parameter values of each echo component.
1) Determination of the initial value of the Gaussian component parameters
The initial value estimation of the parameters in the waveform decomposition is the
basis of the whole waveform analysis. Its main purpose is to determine the number,
time position, echo intensity, and waveform width of the echo components as accu-
rately as possible, namely, [A_{i0}, u_{i0}, σ_{i0}], where i represents the order of the compo-
nents. The precise initial value of the parameters plays an important role in ensuring
the waveform decomposition accuracy and improving the data processing effect.
Therefore, accurate initial value estimation can not only speed up the convergence
process of the system and improve the efficiency of the decomposition operation but
also avoid the situation in which the algorithm falls into the local optimal solution
[19].
2) Waveform component parameter optimization
In general, it is very unlikely to directly obtain higher-precision waveform compo-
nents through initial value estimation. The parameter optimization method is an
algorithm that adjusts the parameters of the components according to a certain stan-
dard based on the first decomposition of the waveform until the fitting accuracy
of the waveform is satisfied. In the single waveform decomposition, the parameter
optimization of the echo component is the key to ensuring measurement accuracy
and is an important part of waveform decomposition. The optimization method with
good performance is of great significance to improve the detection accuracy and accu-
rately reflect the spatial and physical characteristics of the target. The main difference
between the common laser full-waveform data decomposition and its parameter opti-
mization methods lies in the different understanding angles of echo waveforms, so
the corresponding decomposition methods and parameter optimization methods are
different. The most commonly used parameter optimization methods are the least
squares method and the probability distribution method. The least squares method
intuitively expresses the full waveform data as a time domain function, and the most
commonly used method is the nonlinear damped least squares method (Levenberg–
Marquardt, LM). The probability distribution method understands the waveform data
6.3 Airborne Laser Bathymetric Surveying 447

as the probability density of photons received by the sensor over a period, and the
moment of greater intensity corresponds to the greater probability of receiving the
echo energy, which is a representative expectation–maximization method and Monte
Carlo method (reversible jump Markov chain Monte Carlo, RJMCMC).
3) Selection of reflected echo components
The laser echo waveform is not a regular Gaussian function curve and usually exhibits
a “tailing” feature that decays with time. Due to the influence of background noise and
energy loss during the propagation of the laser pulse, the time position of the reflected
echo component is often miscalculated. Therefore, it is necessary to properly filter,
combine, and reduce the decomposed results to obtain a more accurate ranging result
reflecting the real laser pulse.
Considering the attenuation phenomenon of the echo pulse energy and the super-
position of weak waveforms, the position where the partially reflected echo occurs
should be the moment when the echo energy suddenly changes during the energy
attenuation process. Therefore, based on Gaussian half-wavelength progressive
decomposition, the method of obtaining the second derivative of the component peak
sequence is used to extract the reflection components superimposed in the “tailing”
part. In addition, by setting the following constraints on this basis, the pre-selected
components are further screened to obtain the final result (Fig. 6.20).
5. Underwater laser point cloud solution

When the laser propagates from the air to below the water surface, affected by the
refraction at the incident position of the air–water interface, the laser propagation
direction changes from O A to O A' . In addition, the propagation speed of the laser
in water is also different from that in air. Therefore, the calculation of the underwater
point cloud position must consider the refraction process of the laser in the water.

Fig. 6.20 The “tailing” of the waveform in the two echoes [19]

The following introduces the specific calculation method of underwater point cloud
refraction correction according to the idea of vector calculation.
According to Snell’s law:

\frac{\sin\alpha}{\sin\beta} = \frac{n_1}{n_2} = n_w   (6.20)

\frac{c}{v} = \frac{n_1}{n_2} = n_w   (6.21)

where α and β are the incident angle and refraction angle in the wave state, respec-
tively, n_1 and n_2 are the refractive indices of the laser in water and air, v is the
propagation speed of the laser in water, and c is the propagation speed of the laser
in air.
Compared with the influence of the change in the incident direction, the differential
change in the refractive index itself within the inhomogeneous medium has less
influence on the result, so the subtle spatial changes in n_1 and n_2 are not considered
here to simplify the analysis. The propagation change of the laser in seawater is
mainly reflected in the change in propagation speed and propagation direction. If the
time interval between the water surface reflection and the water bottom reflection
is obtained through waveform processing, let the underwater slant range without
refractive index correction be |F 1 |, and the corresponding slant range after refractive
index correction is |F 2 |. According to the positional relationship in Fig. 6.21, the
calculation formulas of the two are:

|F_1| = |OA| = ct   (6.22)

|F_2| = |OA'| = vt   (6.23)

The above relationship can be expressed as:

|F_1| = n_w |F_2|   (6.24)

The angle of incidence can be obtained by calculating the angle between the
normal vector and the incident ray. When the laser pulse enters the seawater from
the air, the fluctuation of the sea surface will cause a change in the normal vector of
the seawater surface, which will affect the propagation direction of the laser pulse in
the seawater and cause the displacement deviation of the seabed laser point. Here,
the normal vector of the fluctuating incident surface is denoted v, and the normal vector
corresponding to the calm sea surface is n. For the convenience of calculation, set n = [0, 0, 1].

\cos\alpha = \frac{-n \cdot L}{|n||L|}   (6.25)

Fig. 6.21 Refraction of laser under sea surface fluctuation conditions

\cos\beta = \sqrt{1 - \sin^2\beta} = \sqrt{1 - \frac{1 - \cos^2\alpha}{n_w^2}}   (6.26)

A diagram of the laser refraction process is shown in Fig. 6.21. If F_2 = f_1 + f_2 and
L = l_1 + l_2, there is the following relationship:

\begin{cases}
l_1 = L + n|L|\cos\alpha \\
l_2 = -n|L|\cos\alpha \\
f_1 = \dfrac{|F_2|}{|L| n_w} L \\
f_2 = -n|F_2|\sqrt{1 - \dfrac{1 - \cos^2\alpha}{n_w^2}}
\end{cases}   (6.27)

F_2 = \left(\frac{L}{|L| n_w} - n\sqrt{1 - \frac{1 - \cos^2\alpha}{n_w^2}}\right)|F_2|   (6.28)

When the laser pulse enters the seawater from the air, the fluctuation of the sea
surface will cause a change in the normal vector of the seawater surface, which
will affect the propagation direction of the laser pulse in the seawater and cause the
displacement deviation of the seabed laser point. Therefore, the key to accurately
obtaining the incident direction of the target beam is to determine the degree of
inclination of the local water surface at the incident position in 3D space, calculate
the normal vector v of the incident position of the fluctuating water surface, and use
this instead of n in the formula to calculate F 2 . Then, according to the spatial position
corresponding to the incident on the water surface, the position of the underwater
detection point is obtained.
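A hedged numpy sketch of this refraction correction is given below; it follows the
tangential/normal decomposition idea of Eq. (6.27), computing the incidence and refraction
angles from Eqs. (6.25)–(6.26) and scaling the refracted direction by |F_2| from Eq. (6.23).
It is an illustrative formulation rather than a literal transcription of Eq. (6.28), and the
example values are hypothetical.

import numpy as np

C = 299792458.0  # speed of light in air (m/s)

def refracted_underwater_vector(L_vec, n_vec, t_wt, n_w=1.34):
    """Refraction-corrected underwater vector F2.

    L_vec: laser direction vector at the water surface (pointing downward)
    n_vec: local water-surface normal (pointing upward)
    t_wt : travel time assigned to the underwater path, as used in Eq. (6.23)
    n_w  : relative refractive index of water (Eqs. 6.20-6.21)
    """
    L_unit = L_vec / np.linalg.norm(L_vec)
    n_unit = n_vec / np.linalg.norm(n_vec)
    cos_a = -np.dot(n_unit, L_unit)                      # Eq. (6.25)
    cos_b = np.sqrt(1.0 - (1.0 - cos_a ** 2) / n_w ** 2) # Eq. (6.26)
    tangential = L_unit + n_unit * cos_a                 # direction of l1 in Eq. (6.27)
    f1_dir = tangential / n_w                            # scaled by sin(beta)/sin(alpha) = 1/n_w
    f2_dir = -n_unit * cos_b                             # normal component of the refracted ray
    F2_len = (C / n_w) * t_wt                            # |F2| = v * t, Eq. (6.23)
    return F2_len * (f1_dir + f2_dir)

# hypothetical example: 20 degree incidence on a calm sea surface (n = [0, 0, 1])
L_vec = np.array([np.sin(np.radians(20.0)), 0.0, -np.cos(np.radians(20.0))])
print(refracted_underwater_vector(L_vec, np.array([0.0, 0.0, 1.0]), t_wt=4.5e-8))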

6. Sea surface fluctuation correction


When solving the area scanning point cloud for the first time, because there are no
water surface fluctuation data, usually only n = [0, 0, 1] can be assumed when the
first solution is performed according to the laser point cloud calculation model above.
However, under the action of sea surface waves, the incident surface of the laser is
usually not flat. Therefore, the point cloud data of the local wave surface are used to
obtain the incident position of the laser beam and its adjacent points. Based on this,
a tangent plane in the sense of least squares is simulated to reflect the inclination
characteristics of the sea surface in a small area, which is the core of the surface
wave state simulation. The tiny plane can be represented as follows:

M(n, l) = \arg\min_{n,\,l} \sum_{i=1}^{k} (n \cdot p_i - l)^2   (6.29)

where n is the normal vector, l is the distance from the plane M to the coordinate origin,
and p_i are the neighboring water-surface points. To weaken the influence of the point cloud position error, the center of
gravity position of the neighboring point cloud is used as the tangent position of
the corresponding reflection plane. The principal component analysis (PCA) method
[23] is used to estimate the incident normal vector n of the wave surface, and then
it is imported into Eq. (6.29) to realize the correction of the incident direction of the
underwater laser point.
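A minimal sketch of the PCA-based normal estimation described above: the normal is taken as
the eigenvector of the neighborhood covariance matrix with the smallest eigenvalue, flipped so
that it points upward (numpy only; the helper name is ours).

import numpy as np

def local_surface_normal(points):
    """Estimate the local sea-surface normal from neighboring water-surface points (N x 3)."""
    centroid = points.mean(axis=0)           # gravity center used as the tangent position
    cov = np.cov((points - centroid).T)      # 3 x 3 covariance matrix of the neighborhood
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    normal = eigvecs[:, 0]                   # eigenvector of the smallest eigenvalue
    if normal[2] < 0:                        # keep the normal pointing upward
        normal = -normal
    return normal, centroid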

6.3.4 Airborne LiDAR Bathymetry Application

To verify the actual scanning detection capability and data processing effect of the
system, in January 2020, the R&D technical team conducted a flight scanning test of
the whole system on Wuzhizhou Island in the eastern sea of Hainan Island. Several
experimental tests have been carried out on the detection capability of the system
under the conditions of different water environments near the seashore. The system
is an important breakthrough in the field of domestic research and development of
this technology.
1. The system and the survey area overview
The experimental area is located on Wuzhizhou Island in the southeast of Hainan
Island. The eastern, southern and western coasts of the island are dominated by
bedrock-type coasts, with deep water and rapid changes in underwater topography,
and the bottom of the seabed is mostly reefs. The northern part is a sandy coast with
shallow water and gentle changes in underwater topography, and the bottom is fine
sand with a reflectivity greater than 15%. After the water depth survey, it was found
that the shallow water area in the southern part of the island is narrow and long, the
average width from the coast is approximately 50 m, and the water depth within the
range greater than 50 m changes greatly. The west is second, with an average width
of about 80 m. The average width in the north is 250 m, and the widest point is
approximately 430 m (Fig. 6.22).
The fixed-wing aircraft Cessna 208 was used in the experiment as the opera-
tion platform, and the effect after system installation is shown in Fig. 6.23. Before
the flight test, the system was installed, tested, and calibrated. The iGreena system
adopts the circular scanning method of the rotating objective lens, and the sampling
frequency of the echo signal is 1.25 GHz, which effectively improves the stability of
the system scanning process and the detection accuracy of the full waveform signal.
In addition, the system adopts multi-stage amplification technology of a single echo
signal, which can achieve the dynamic expansion of target detection in the coastal
zone and improve the detection performance of the system for targets with different
reflectivity.

Fig. 6.22 The situation of Wuzhizhou and surrounding waters

Fig. 6.23 The experimental platform and system installation



Fig. 6.24 The single beam sounding point distribution in the testing area

2. Comparison of detection effects


To verify the actual accuracy of the system, an HD370 full-digital frequency conver-
sion sounder from Hi-Target was used to acquire single-beam soundings during the test.
The experiment compares the coincident depth points between the main survey line
and the inspection line of the single-beam survey. The average water depth of the two test
areas is in the range of 1–12 m, and the comparison result is less than the depth
discrepancy limit of 0.3 m at the coincident point in the specification. The results
were uniformly reduced to the mean sea level within the testing area at that time. The
distribution of water depth points obtained by single beam measurement is shown in
Fig. 6.24.
The underwater terrain surface model is established based on the underwater
terrain detection results of the ALB system, and the corresponding plane positions
of the single beam sounding points are extracted as check points for comparison.
Figure 6.25 shows the scatter plot of the single beam sounding value and the ALB
system sounding result.
From the statistical results of the depth deviation between the underwater point
cloud and the single beam sounding point, it can be seen that the results of the
system water depth detection are similar to those obtained by the acoustic detec-
tion method, and the RMSE of the two deviations is 0.174 m. The results meet
the minimum bathymetry standards of the International Hydrographic Organization
(IHO) for safety-of-navigation hydrographic surveys at Order 1a.
By comparing the data in the figure, the water depth values obtained by this system
after data processing are in good agreement with the single-beam water depth values at
the corresponding positions; this consistency fully meets the basic requirements of
spatial environment detection in nearshore areas. After mosaicking the flight data and correcting the system deviation,
the overall underwater terrain model of the scanning detection area can be further
obtained.

Fig. 6.25 Bathymetry comparison between iGreena and the single-beam echo sounder

The analysis above verifies the functional characteristics of the airborne laser
bathymetry system during flight scanning operations, and its final detection results
effectively cover the spatial range of land, water surface, water body and water
bottom. The detection ability is good, and the result is as expected. Figure 6.26 shows
an example of water-land integrated point clouds of Wuzhizhou Island obtained by
the ALB system.
Based on the extensive needs of near shore engineering and economic develop-
ment in coastal areas, the continuous development and improvement of airborne
laser bathymetry technology can effectively improve the ability to collect geospatial
environmental information in the area. In addition, system research based on light
weight and intelligence is still an important direction of the current development
technology in this field.

6.4 Coastal Surface Subsidence InSAR Measurement

6.4.1 Research Status of InSAR Technology

Land subsidence, land collapse and sinkholes have occurred in the coastal area in
recent years, which may be caused by human activities, such as land reclamation
and excavation. These disasters have caused great economic and social losses. Monitoring
land deformation and issuing alarms or alerts to the government and the public are
crucial for protecting human lives and property. Subtle land subsidence produces
precursors such as cracks and fissures on walls and buildings, and such deformation
can be captured by satellite SAR.
Fig. 6.26 Water-land integrated point clouds of Wuzhizhou Island obtained by the ALB system

Synthetic aperture radar interferometry (InSAR) has already been proven to be
a powerful technique for deformation monitoring in recent decades [24]. At the
early stage, two-pass or three-pass InSAR processing was prevalent [25]. In this
stage, attempts to achieve accurate results mostly focused on spatial filtering and 2D
unwrapping algorithms [26]. Time-dependent noise could not be addressed due to the
lack of time-series datasets in most areas. With an increasing number of SAR satellites
launched in the twenty-first century, a large amount of SAR data has become available,
and an increasing number of studies have turned to using more data (i.e., time-series
images) to obtain more precise deformation. Many multi-temporal InSAR (MT-
InSAR) methods, which aim to take advantage of multiple repeat-orbit observations
to derive long-term land deformation over a large coverage area, have been proposed
and have evolved. MT-InSAR has gradually become a routine investigation method, especially
in urban areas. The InSAR technique obtains relative measurements, and errors can
be sourced from many factors, such as orbit errors, atmospheric effects, unwrapping
errors and systematic noise.
At present, the developed methods for integrating time-series SAR images can be
mainly divided into two branches [27]: small baseline subset (SBAS) [28, 29] and
permanent scatterers (PS)-InSAR [30, 31]. These two main methods are still very
popular due to their publicly accessible open-source codes (e.g., Stanford method for
persistent scatterers, StaMPS) [32]. Although both methods can deal with the time-
series SAR dataset, their theoretical foundations differ greatly from each other. We
normally divide the SAR scatterers into PS and distributed scatterers (DS) according
to their statistical characteristics. SBAS treats PS and DS equally. A certain number
of multi-looked interferograms are generated, with temporal and spatial baselines
smaller than given thresholds. Despite the great interest, the loss of spatial reso-
lution and boxcar window within SBAS limits the application at the finer scale.
The contamination of nearby pixels with different backscattering properties may
bias the optimal value to varying degrees, according to the scene complexity [33],
and possibly bear a higher risk of bias influence [34]. PS-InSAR preserves spatial
resolutions for PS targets, on which the phase noise is generally small. Therefore,
PS-InSAR is more popular in urban environments where precise measurements are
expected [35]. Nevertheless, the sparse distribution of permanent measurement points
over natural landscapes presents the main challenge for unwrapping and atmospheric
correction in InSAR time-series analysis [36].
Most recent advances in the InSAR community focus on the integration of PS
and DS to derive reliable and dense time-series deformation results [37, 38]. Early
efforts can be referred to [39, 40], which attempted to estimate the accurate inter-
ferometric phase on DSs from interferogram stacks by exploring the target statistics
and modeling the decorrelation characteristics. Following previous studies [39], we
use the term phase linking to refer to the estimation process of consistent time series
phases from interferogram stacks. SqueeSAR built a complete framework to inte-
grate DS and PS in time-series processing, including the statistical test for statistically
homogeneous pixels (SHP) identification and phase triangulation algorithm (PTA)
algorithm for consistent phase estimation [41]. After that, many works came out to
further refine the steps in the framework [42, 43]. Several algorithms were developed
to improve the speed and accuracy of SHP identification [44]. Robust estimation
tools were implemented to ensure the consistent quality of the coherence matrix
under different statistical distributions [45]. A sequential estimator was proposed
to efficiently process the large number of SAR images [46]. PCA was applied to
treat the multi-mechanism phenomenon in SAR signals. The eigendecomposition-
based maximum-likelihood estimator of interferometric phase (EMI) was proposed
to avoid the drawbacks of the PTA iterative method when facing a nonpositive coher-
ence matrix [41, 47]. The bootstrapping method was proposed to balance the bias in
covariance estimation for low-coherence points [42].

6.4.2 Sequential InSAR Processing Technology

The radar backscatter of one pixel can be regarded as the summation of backscattering
from all scatterers within the image pixel. According to the different scattering types,
image pixels can be mainly classified as PS pixels and DS pixels. For the PS pixels,
the radar reflection is dominated by one stable scatterer and therefore has a very
small phase variance. Conversely, the DS pixels have no dominant scatterer and the
phase varies in a random manner. However, if the scatterers inside the DS pixel have
similar scattering mechanisms, the DS phase quality can be improved by the well-
known phase linking process. Traditional PS time-series processing algorithms can
then be adopted to jointly process DS and PS pixels. Here, we divided the existing
TS-InSAR algorithms into two groups: one group using PS only and the other using
both PS and DS.
1. MT-InSAR with PS only
The MT-InSAR using PS only is also called PS-InSAR. Many InSAR algorithms
belong to this group, such as StaMPS, spatio-temporal unwrapping network (STUN),
interferometric point target analysis (IPTA), persistent scatterer pair (PSP) and quasi-
persistent scatterers (QPS). Here, we took StaMPS as an example to describe the
general procedure of PS-InSAR.
In StaMPS, a threshold on the amplitude dispersion, combined with phase stability
estimates, is first used to select PS candidates. Pixels with low
amplitude dispersion have higher temporal coherence. According to this feature, the
number of candidate pixels for phase analysis can be reduced, and the point selec-
tion efficiency can be improved [48]. The amplitude dispersion value is defined as
follows:
D_a = \frac{\sigma}{\mu}   (6.30)

where σ and μ are the standard deviation and mean of the amplitude, respectively.
Pixels whose amplitude dispersion values are lower than a certain threshold are selected as
PS candidate pixels for phase analysis.
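A minimal numpy sketch of Eq. (6.30) applied to a stack of co-registered SLC amplitudes is
shown below; the 0.25 threshold is a commonly used value and is given here only as an assumption.

import numpy as np

def ps_candidates(amplitude_stack, threshold=0.25):
    """Select PS candidate pixels by amplitude dispersion (Eq. 6.30).

    amplitude_stack: (n_images, rows, cols) SLC amplitude time series
    """
    mean = amplitude_stack.mean(axis=0)
    std = amplitude_stack.std(axis=0)
    d_a = np.where(mean > 0, std / mean, np.inf)   # amplitude dispersion D_a per pixel
    return d_a, d_a < threshold                    # dispersion map and candidate mask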
The PS candidate pixel phase analysis model is as follows:

ψ = W(φ_d + φ_a + φ_o + φ_θ + φ_n)   (6.31)

The phase ψ after topographic removal of the interferogram includes the defor-
mation phase φd , the atmospheric effect φa , the orbital error φo , the viewing angle
error φθ and the random noise φn . W (·) is the wrapping operator, used to account
for the original phase truncation within 2π periods. The PS pixel is characterized by
a small φn dominated by the remaining four phase components φd , φa , φo , and φθ .
StaMPS first estimates φn by subtracting W (φd + φa + φo + φθ ) from ψ. Then, use
the initial value of φn to weight each pixel to re-estimate W (φd + φa + φo + φθ ).
After subtracting the new value of W (φd + φa + φo + φθ ), φn can be recalculated.
StaMPS repeats this cycle until φn converges [48].
The distribution of the Gamma index can be used for the selection of PS points
after the phase analysis iteratively converges. The strategy of PS point selection is to
find the gamma threshold of each PS candidate point. If the gamma index obtained
by convergence is above the gamma threshold, the candidate point can be included
in the PS point, and if the gamma index is lower than the threshold, it belongs to the
non-PS point.
For the selected PS point, the phase wraps around the 2π interval. It is neces-
sary to perform phase unwrapping to obtain continuous phase values. The unwrap-
ping algorithm can use the space–time 3D unwrapping method or the traditional 2D
unwrapping method. After phase unwrapping, the spatial viewing angle error needs
to be re-estimated. This estimation is different from the parameter space search of PS
selection points, and the spatial angle error phase is no longer a nonlinear winding
value after unwrapping, so it can be estimated by a linear system [49].
\begin{bmatrix}
1 & B_{\perp,1} & t_1 \\
1 & B_{\perp,2} & t_2 \\
\vdots & \vdots & \vdots \\
1 & B_{\perp,i} & t_i \\
\vdots & \vdots & \vdots \\
1 & B_{\perp,N} & t_N
\end{bmatrix}
\begin{bmatrix} m \\ K \\ v \end{bmatrix}
=
\begin{bmatrix} \psi_1 \\ \psi_2 \\ \vdots \\ \psi_i \\ \vdots \\ \psi_N \end{bmatrix}   (6.32)
where [ψ_1, ψ_2, ···, ψ_i, ···, ψ_N] are the unwrapped phases of the PS point, m is the
overall offset of the system, K relates the unwrapped phase to the perpendicular spatial
baseline, v is the co-estimated linear rate, and t_i is the time interval. K B_{⊥,i} is the
viewing angle error phase. Since m is present in every interferogram, it represents the
atmospheric and orbital error of the main image. The atmospheric and orbital error of the
auxiliary images is continuous in space and discontinuous in time, so it can be eliminated
by temporal high-pass filtering and spatial low-pass filtering.
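As a hedged illustration of Eq. (6.32), the sketch below estimates, for a single PS target,
the offset m, the viewing angle error coefficient K, and the linear rate v from the unwrapped
phases by ordinary least squares.

import numpy as np

def estimate_offset_dem_error_rate(psi, b_perp, t):
    """Solve Eq. (6.32) by least squares for one PS target.

    psi   : (N,) unwrapped interferometric phases
    b_perp: (N,) perpendicular baselines
    t     : (N,) temporal baselines
    Returns (m, K, v).
    """
    A = np.column_stack([np.ones_like(t), b_perp, t])
    x, *_ = np.linalg.lstsq(A, psi, rcond=None)
    return x  # [m, K, v]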
2. MT-InSAR with PS and DS
The SAR observation on one scatterer can be treated as a random complex variable.
According to the central limit theorem, the combination of a large number of indepen-
dent scatterers tends toward a normal distribution. Therefore, the SAR observation
of the pixel with one scattering mechanism follows the circular complex Gaussian
(CCG) distribution. Normally, when the statistical properties of a random variable
are known, it is possible to make statistical inferences about the parameters of the
variable based on observations. Phase linking is just a process of statistical inference
that attempts to estimate the time-series consistent interferometric phases of a DS
pixel from its SAR observation samples.
Three steps are required before the phase linking. First, identify the SHP for
each pixel. Second, select the potential DS with an SHP number larger than a given
threshold. Third, calculate the sample complex coherence matrix using the identified
SHPs. After that, phase linking can be implemented to calculate the consistent phase
series from the sample coherence matrix.
According to the CCG assumption, the probability density function (PDF) of the
time series data vector z of a pixel conditioned on the coherence matrix is

f(z) = \pi^{-p} (\det G)^{-1} \exp\left(-z^H \Theta G^{-1} \Theta^H z\right)   (6.33)

where

p ∈ ℕ: number of SLC images;
z ∈ ℂ^(p×1): the random SLC observation vector along time for a pixel;
G ∈ ℝ^(p×p): the real-valued coherence matrix of the interferograms for a pixel;
θ ∈ ℝ^(p×1): the consistent phase series for a pixel;
Θ ∈ ℂ^(p×p): the complex diagonal matrix containing the consistent phase series θ, with Θ = diag(e^{iθ}).
Under the one scattering mechanism assumption, the complex coherence matrix Γ ∈ ℂ^(p×p) is given by

$$ \Gamma = \Theta G \Theta^{H} \qquad (6.34) $$

If G is known, the maximum likelihood estimation of the phase is given by

$$ \hat{\theta} = \arg\max_{\theta}\left[\left(e^{i\theta}\right)^{H}\left(-G^{-1}\circ\hat{\Gamma}\right)e^{i\theta}\right] \qquad (6.35) $$

where Γ̂ is the sample complex coherence matrix, given by

$$ \hat{\Gamma} = E\left[\mathbf{z}\mathbf{z}^{H}\right] \approx \frac{1}{N}\sum_{i=1}^{N}\mathbf{z}_{i}\mathbf{z}_{i}^{H} \qquad (6.36) $$

where N is the number of adjacent pixels in the homogeneous patch.


As the true real-valued coherence matrix G is unknown, the absolute values of the sample complex coherence matrix |Γ̂| are normally used in its place. If the sample coherence matrix (SCM) is not positive definite, some additional work on the SCM is needed, such as inserting a damping factor [45] or including calibration parameters [47].
The commonly used point selection criterion in phase linking methods is the
posterior coherence. The form of posterior coherence γ is given as [41, 47]
$$ \gamma = \frac{2}{p^{2}-p}\,\operatorname{Re}\left(\sum_{i=1}^{p}\sum_{k=i+1}^{p} e^{i\phi_{ik}}\, e^{-i(\theta_{i}-\theta_{k})}\right) \qquad (6.37) $$

where φik is the interferometric phase contained in the sample coherence matrix Γ̂.
The potential DS pixels exhibiting high coherence can then be treated as PS
pixels. Their original interferometric phases are replaced by the estimated consistent
phase series. Traditional PS time-series processing algorithms can then be adopted
to jointly process DS and PS pixels.
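A minimal Python sketch of the phase-linking steps described above is given below, assuming the SHP pixels of a DS candidate are already available as a complex array. It builds the sample coherence matrix of Eq. (6.36), estimates the consistent phase series by numerically maximising the objective of Eq. (6.35) (with the first phase fixed to zero and |Γ̂| standing in for G), and evaluates the posterior coherence of Eq. (6.37). Function names and the use of a general-purpose optimiser are illustrative choices, not the algorithm of any particular software package.

```python
import numpy as np
from scipy.optimize import minimize

def sample_coherence(slc_samples):
    """slc_samples: complex array (N_shp, p); rows are SHP pixels, columns are epochs."""
    num = slc_samples.T @ slc_samples.conj()                 # Eq. (6.36), unnormalised
    power = np.sum(np.abs(slc_samples) ** 2, axis=0)
    return num / np.sqrt(np.outer(power, power))             # normalised sample coherence matrix

def phase_link(gamma_hat):
    """Estimate the consistent phase series theta by maximising Eq. (6.35)."""
    p = gamma_hat.shape[0]
    G_inv = np.linalg.pinv(np.abs(gamma_hat))                # |Gamma_hat| used in place of the true G
    M = -G_inv * gamma_hat                                   # Hadamard product in Eq. (6.35)

    def neg_objective(theta_tail):
        theta = np.concatenate(([0.0], theta_tail))          # first phase fixed to zero
        e = np.exp(1j * theta)
        return -np.real(e.conj() @ M @ e)

    res = minimize(neg_objective, x0=np.zeros(p - 1), method="BFGS")
    return np.concatenate(([0.0], res.x))

def posterior_coherence(gamma_hat, theta):
    """Eq. (6.37): goodness of fit of the linked phase series."""
    p = gamma_hat.shape[0]
    acc = 0.0
    for i in range(p):
        for k in range(i + 1, p):
            phi_ik = np.angle(gamma_hat[i, k])
            acc += np.real(np.exp(1j * (phi_ik - (theta[i] - theta[k]))))
    return 2.0 / (p * p - p) * acc
```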

6.4.3 The Interpretation of Sequential InSAR Results

The final product derived from MT-InSAR processing is the long time-series
displacements of the selected PS targets. After PS processing, major parts of the
phase errors should be removed, while a small amount of noise can still remain. Conventionally, the linear displacement rate is used to detect whether an area is subsiding or uplifting. When only the linear rate is used, the residual noise has little impact, as it can be considered white noise.
However, apart from the linear rate, the intrinsic displacement variation can
provide more information. For example, it tells whether there is a seasonal or periodic
component or whether it contains an abrupt change. Therefore, we propose applying
a post-processing flow to decompose these components. The processing steps are
listed in Fig. 6.27 and described in the following sections. For each PS target, the post-processing starts from the original InSAR-derived time-series displacements, denoted as x0(t).

1) Interpolation and linear fit

With the InSAR-derived time-series displacements, the linear rate is estimated by first-order polynomial fitting. Although spaceborne SAR images are usually acquired at a regular revisit time (12 days for Sentinel-1A), there might be missing acquisitions, leading to an irregular temporal sampling rate. Therefore, we used linear interpolation to resample the time-series signals to a regular rate. This time-series signal is denoted as x(t), where t = 1, 2, · · ·, N, and N is the number of acquisitions at a regular period.
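A minimal Python sketch of this step is shown below, assuming the acquisition times are given in days and the displacements in millimetres; the 12-day resampling interval mirrors the Sentinel-1A revisit mentioned above, and all names are illustrative.

```python
import numpy as np

def resample_and_fit(t_days, disp_mm, step_days=12.0):
    """t_days: acquisition epochs in days; disp_mm: LOS displacements at those epochs."""
    t_days = np.asarray(t_days, dtype=float)
    disp_mm = np.asarray(disp_mm, dtype=float)
    t_reg = np.arange(t_days[0], t_days[-1] + step_days, step_days)
    x = np.interp(t_reg, t_days, disp_mm)                 # linear interpolation to a regular grid
    rate_per_day, _offset = np.polyfit(t_reg, x, deg=1)   # first-order polynomial fit
    return t_reg, x, rate_per_day * 365.25                # linear rate in mm/year
```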
2) Temperature correlation based on EMD and STL
In this section, the empirical mode decomposition (EMD) and seasonal-trend decom-
position using LOESS (STL) are combined to decompose the seasonal variation from the original displacements.

Fig. 6.27 Flowchart of postprocessing steps on the InSAR-derived displacements

In the EMD algorithm, the local minimums
and maximums of the signal are selected and fitted by using spline interpolation,
forming upper and lower envelopes, which are denoted as u(t) and l(t), respectively.
Then, the first intrinsic mode function (IMF) is generated by subtracting the mean
envelope between the upper and lower envelopes from the original signal [29]. This
process is iterated until the residual is monotonic, and the generated IMFs ci(t) can be added up to restore the original signal, as in Eqs. (6.38) and (6.39).

$$ c_{i}(t) = c_{i-1}(t) - \text{env}_{i-1}(t) = c_{i-1}(t) - \frac{u_{i-1}(t) + l_{i-1}(t)}{2} \qquad (6.38) $$

$$ x(t) = \sum_{i=1}^{n} c_{i}(t) + r_{n}(t) \qquad (6.39) $$

where n is the total number of IMFs and r n (t) is the residual component from EMD.
From the above deduction, we can see that the mean envelope can be a good
representative of the original InSAR-derived displacements, as it contains the main
characteristics of the original signal while excluding primary noise. Therefore, STL
decomposition is applied to the mean envelope to derive the seasonal components,
which are denoted as s(t) [31]. We infer that this seasonal component is correlated
with the temperature variation. It should be noted that STL decomposition requires the period as an input.
For different targets, the decomposed seasonal component may or may not be correlated with the air temperature variation. To evaluate this correlation, we calculated the Pearson correlation coefficient R between the seasonal component s(t) and an a priori temperature variation T(t), which is the daily mean temperature downloaded from the National Oceanic and Atmospheric Administration (NOAA), as shown in Fig. 6.28 [30].
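The sketch below illustrates this idea under simplifying assumptions: the mean envelope is taken from a single spline-envelope pass (rather than a full EMD), STL from the statsmodels package supplies the seasonal component s(t), and the Pearson coefficient R is computed against a temperature series sampled on the same daily grid. The one-year period and all names are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema
from scipy.stats import pearsonr
from statsmodels.tsa.seasonal import STL

def mean_envelope(x):
    """Mean of the upper and lower spline envelopes, as used in Eq. (6.38)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    imax = argrelextrema(x, np.greater)[0]          # local maxima
    imin = argrelextrema(x, np.less)[0]             # local minima
    upper = CubicSpline(imax, x[imax])(t)           # upper envelope u(t)
    lower = CubicSpline(imin, x[imin])(t)           # lower envelope l(t)
    return (upper + lower) / 2.0

def seasonal_temperature_correlation(x, temperature, period=365):
    """Seasonal component of the mean envelope (via STL) and its Pearson correlation R
    with an air temperature series on the same daily grid."""
    seasonal = STL(mean_envelope(x), period=period, robust=True).fit().seasonal
    r, _ = pearsonr(seasonal, np.asarray(temperature, dtype=float))
    return seasonal, r
```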

Fig. 6.28 Air temperature variation at two meteorological stations in HKIA and MIA

3) Irregular variation detected by ADF test
Time-series signals can be broadly classified as stationary or nonstationary. A stationary time series shows no obvious periodic or trend variation, while a nonstationary time series contains trend or periodic signals. The ADF test is a unit root test for stationarity [32, 33]. The null hypothesis of this test is that a unit root is present. The ADF test outputs a p-value indicating how strongly the null hypothesis can be rejected. In general, a p-value of less than 5% means that the null hypothesis can be rejected, i.e., the time series is stationary.
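A minimal sketch of this check using the adfuller routine from statsmodels is shown below; the 5% threshold follows the rule of thumb stated above.

```python
from statsmodels.tsa.stattools import adfuller

def is_stationary(residual, alpha=0.05):
    """ADF unit-root test on the residual signal; returns (stationary?, p-value)."""
    adf_stat, p_value, *_ = adfuller(residual)   # null hypothesis: a unit root is present
    return p_value < alpha, p_value              # True means the null is rejected (stationary)
```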

6.4.4 Coastal InSAR Monitoring Application

1. The Hong Kong-Zhuhai-Macao Bridge (HZMB)

Starting from a man-made island (IS4) east of Hong Kong International Airport
(HKIA), the Hong Kong-Zhuhai-Macao Bridge (HZMB) crosses the Pearl River
Estuary (PRE) and connects to the man-made island in Zhuhai and Macao (IS1).
The main bridge of the HZMB comprises three channel bridges, namely, the Jiuzhou
Bridge (JZB), Jianghai Bridge (JHB) and Qingzhou Bridge (QZB) from west to east,
allowing ship travel, and blocked bridges between these three channel bridges. In
addition, the main bridge is connected to an underwater channel of 6.7 km via the
west man-made island (IS2) and is connected to the Hong-Kong Bridge (HKB) via
the east man-made island (IS3). The whole length is 55 km, including the main bridge
of 29.6 km. It is 41.6 km between Terminals in Hong Kong and Zhuhai-Macao.
Construction of the HZMB started on December 15, 2009 and was completed on
July 7, 2017. The average depth of water in this region is approximately 37 m and
is up to 44 m at some points near the man-made islands. To construct the man-made islands, clays were first removed from the seabed, and then many concrete cylinders were placed on the solid sea base (61 cylinders for the west island and 59 for the east island). Twisted H-shaped concrete blocks are placed around the man-made islands to protect them from sea waves. The sea bottom was excavated in a bowl shape, the tunnel was built in it, and the tunnel was then buried by gravels and clays. This means that the tunnel rests directly on the sea bottom except at the transitions near the islands.
In this study, we collected 86 Sentinel-1A acquisitions (in the IW mode) spanning
from January 6, 2018, to November 27, 2020. This dataset is processed using the
PS technique with the use of GAMMA [27] and StaMPS [28]. InSAR processing,
including swath and burst selection, coregistration, image cropping and interferogram
formation, is applied to 86 acquisitions of Sentinel-1A radar images with the usage
of GAMMA. PS processing, including PS selection and unwrapping, is realized by
using StaMPS software. The linear displacement rate along the line of sight (LOS)
direction and its standard deviation are shown in Fig. 6.29a and b, respectively.
Close-ups of the white boxes in Fig. 6.29a are demonstrated in Fig. 6.31.
One sudden change in the standard deviation shown in Fig. 6.29b occurs at the
western end of the QZB. This can be caused by an accumulation of unwrapping
error due to the long linear spatial distribution of the bridge. It would be advisable to install stable GPS receivers on the west and east man-made islands to correct the PS-InSAR-derived displacements; unfortunately, no such corrections were available in this study.

Fig. 6.29 PS-InSAR results of the HZMB bridge
Figure 6.30 demonstrates the displacement map of the main HZMB. Figure 6.30a–
c are three close-ups showing displacements of the JZB, the JHB, and the QZB.
The displacement rate ranges between −4 mm/year and + 4 mm/year across these
three bridges, indicating their stability. A displacement rate of 4–8 mm/year is only
observed at several short segments east of the JHB.
Different from the stable bridges, larger subsidence rates are identified on the man-
made islands. The road bridge connecting Zhuhai city and the Passenger Terminal
suffers a subsidence of 4–8 mm/year, as shown in Fig. 6.31a. Figure 6.31b shows that
the northern cloister of the Zhuhai Passenger Terminal (located on IS1) subsides at
a yearly rate of 4–12 mm, while its eastern side subsides at a yearly rate of 0–4 mm.
Figure 6.31c demonstrates an uneven subsidence of targets on IS4. Some targets on
the eastern part subside at a rate of 8–16 mm/year, but the main building on the island
located on the west side appears to be stable. Figure 6.31d shows the displacements
of targets on IS2. The periphery of IS2 has a slight subsidence rate of 4–8 mm/year,
while the middle part is quite stable. Meanwhile, Fig. 6.31d shows that IS3 suffers a
relatively larger subsidence rate, which ranges from 12–16 mm/year. This subsidence
mainly occurs at the northeastern and southern peripheries of the island. Moreover, a subsidence rate of 12–18 mm/year can also be observed at the eastern end of the central road on the island.

Fig. 6.30 The main HZMB and three bridges: a JZB, b JHB and c QZB

Fig. 6.31 Several key points on the HZMB bridge which show large displacement velocities
For every PS target, the decomposition of seasonal change and its correlation with
temperature change is carried out. The correlation coefficient map of the whole scene
is shown in Fig. 6.32a. Figure 6.32b shows that a positive correlation occurs on the
west side of the building roof and a negative correlation occurs on the east side of the
roof. On the other hand, the typical pattern of temperature correlation of the bridge
segment connecting to Hong Kong is shown in Fig. 6.32c. Along this segment, there
is a high-frequency alternation between positive and neutral correlations. Different from this bridge segment, this high-frequency alternating correlation is not obviously observed in the main HZMB.

Fig. 6.32 The correlation map between temperature and the seasonal component of time-series displacements
The decomposed seasonal trend and linear trend are removed from the original
time series, and then we apply the ADF test to the residual signals. We chose three
examples and plotted the original signals and the residuals after removing linear
and seasonal components in Fig. 6.33. For point A1, which is located at the northern
cloister of the Zhuhai Passenger Terminal, there is a jump in the time-series displace-
ment around March 2020. This jump can also be observed at A2, which is located
southeast of the Macao Passenger Terminal. For point A3, a change from continuous subsidence to uplift occurs around August 2019.
This study attempts to analyze PS-InSAR derived long-term displacements by
using time-series analysis. First, during PS-InSAR processing, we found that VV
polarization is superior to VH polarization in terms of selecting more PS targets.
However, VH polarization is more reliable than VV polarization in suppressing the
side lobe effect. In city areas, there may be many strong scatterers that shadow
adjacent targets. Whether this would affect PS selection and the following processes
should be further studied in the future.

Fig. 6.33 The displacement of three selected points

Second, we derive a displacement map covering the HZMB and man-made islands.
The results show that the main HZMB is quite stable, while the periphery of IS2 subsides slightly and IS3 subsides faster than IS2, with larger subsiding rates of 13 and 17 mm/year observed. The subsidence of the periphery of the man-made islands, such as at P7 shown in Fig. 6.31c, may be caused by the twisted H-shaped concrete blocks attached to the islands to protect them from sea waves. These blocks could become heavier as they soak in seawater or become covered with marine organisms and microorganisms over time. On the other hand, the northern cloister of the Zhuhai Passenger Terminal is also suffering a slight subsidence of approximately 10 mm/year.
The thermal expansion effect can be obviously observed in the temperature corre-
lation map, as shown in Fig. 6.32. Different targets can react differently to the same temperature variation, which has a period of one year. Figure 6.32b shows the roof of
the Zhuhai Passenger Terminal. The temperature correlation ranges from positive to
negative in the near and far ranges. This can be explained by the horizontal expansion
of the roof structure, which is similar to the thermal effect of the spans of bridges
[21]. When the temperature increases in the summer, the targets on the near-range
side move toward the satellite, while those on the far-range side move far away from
the satellite. The same mechanism can occur on the road and spans of bridges. This
is because the road and the bridge span contain concrete blocks, each of which has a
horizontal expansion when the air temperature increases. A low temperature correlation indicates targets that barely react to the temperature variation. Heuristically, the correlation is considered obvious when the temperature correlation R is above 0.6 or below −0.6. However, no obvious temperature correlation is observed in the
main HZMB.
2. Subsidence monitoring in Shenzhen city
Shenzhen, located at the entrance of the PRE and experiencing the fastest modernization and urbanization in China, is selected as the study area. Land subsidence, land collapse and sinkholes have occurred in its urban areas in recent years, which may be caused by human activities such as land reclamation and excavation. These disasters have caused great economic and social losses to the nation and its people. Monitoring land deformation and publishing alarms or alerts to the government and the public are therefore crucial to protecting human lives and property. Subtle land subsidence produces precursors such as cracks and fissures on walls and buildings, and this subsidence can be captured by satellite SAR.
In this study, we collected 119 acquisitions of Sentinel-1A data from one path,
and three bursts were selected (one burst from the northern frame and two bursts
from the southern frame). This study area fully covers nine administrative districts
of Shenzhen: Baoan (BA), Guangming (GM), Nanshan (NS), Longhua (LH), Futian
(FT), Longgang (LG), Luohu (LHU), Yantian (YT) and Pingshan (PS). One district, Dapeng (DP), is not fully covered and is therefore not included in this study. Western Shenzhen is
highly populated with intensive skyscrapers and old residential flats and apartments,
which are prone to deformations or even cracks. The dataset spans from March
12, 2017, to March 15, 2021. The common image is selected as the one acquired
on January 1, 2019. The three bursts are concatenated together before PS-InSAR
processing.
In this study, the Sentinel-1A dataset was processed with the use of GAMMA and
StaMPS, as shown in Fig. 6.34. After PS-InSAR processing, we obtained the time-
series deformations on these targets. This processing flow consists of the following
steps: ➀ extract bursts from swaths to form single look complex (SLC) images;
➁ mosaic the master acquisition and generate the look-up table, which is used
for georeferencing radar images or transforming radar images to geolocations; ➂
generate simulated radar images using an external DEM; ➃ coregister all acquisi-
tions according to the master acquisition assisted by the simulated radar image; ➄ generate differential interferograms between each acquisition and the master acquisition; and ➅ PS processing, including PS selection, PS weeding, phase correction, unwrapping and removal of spatially correlated errors.

Fig. 6.34 The processing flow of PS-InSAR
The InSAR crowdsourcing annotation platform (www.insarworld.com) is used to
assist in data validation, as shown in Fig. 6.35. The brief concept of this framework is
to publish hazard areas selected from PS-InSAR results to the public on the Internet.
Each hazard area counts as one task that can be responded to by Internet volunteers, who are asked to take photos in the hazard area and upload them to the system. The uploaded photos are checked by professionals working in geologic hazard detection. With confirmation from the backend administrator and the professionals, the task is completed, and the volunteers receive a financial reward.
The post-analysis step is to decompose the primary trend of the target movement
as shown in Fig. 6.36. First, the time-series displacements are interpolated using one
day as the sampling interval. Then, we applied EMD to the interpolated time-series
displacements. The EMD algorithm decomposes the time-series signal into IMFs, each obtained by subtracting the average of the upper and lower envelopes. The decomposition is carried out iteratively: at each step, it generates one IMF and one residual signal, which is the difference between the previous residual signal and this IMF. The iteration stops when three stopping criteria, each with its own threshold, are satisfied.

Fig. 6.35 The interface and published task areas demonstrated in the InSAR crowdsourcing
annotation system

Fig. 6.36 Flowchart of InSAR processing and analysis for monitoring subsidence in Shenzhen

During the operation, 135 task areas were selected and published in all nine
districts of Shenzhen. However, no volunteers responded to the tasks in Pingshan
district, which is the district furthest from downtown; hence, advertisements about these tasks may not have reached citizens in that district. In the other eight districts, 119 task areas were responded to by Internet volunteers, and 1853 photos were uploaded in total. Among these photos, 1742 were determined to be useful after inspection by the backend administrator and professionals. The uploaded
photos show real-life scenes that can be associated with land subsidence, such as
cracks on walls or roads (36%), fissures between buildings and land surface (27%),
looseness of pavement (13%), road collapse (18%) and construction sites (6%).
The PS-InSAR-derived linear deformation rate result shows that most parts of the
study area have a small displacement rate between −2 and 2 mm/year. The largest
subsiding area is located northeast of the Yantian port in the Yantian district. Other
areas subsiding at a rate larger than 4 mm/year are distributed sporadically.
From the uploaded photos, we selected eight examples that are located in the
eight districts (except Pingshan) of Shenzhen. Figure 6.37 shows the selected photos
with close-up images of the PS-InSAR-derived linear displacement rates. A large
subsidence at a rate of more than 8 mm/year occurred in the task areas of GM001.
Correspondingly, a collapse of the pavement in this area was reported by volunteers.
In the other two areas, FT001 and LG003, displacement rates between −6 and −4 mm/year are observed. In these areas, uploaded photos record crevasses of the concrete pavement.
Slighter subsiding rates ranging from 2 to 4 mm/year are noted in NS004, LH020
and YT022. In NS004 and LH020, subsidence is detected in two residence areas,
where the buildings are dense but of low heights. Construction of high buildings had
been ongoing near the location indicated by red boxes in Fig. 6.37. Crevasses were
found on the concrete steps, as shown in Fig. 6.37b, and large cracks were observed
between the steps and walls next to them, as seen in Fig. 6.37d. In YT022, the PS
targets are just on the side of buildings whose construction was completed at the end
of 2016. Loose wall bricks were noticed by volunteers and recorded in the photo
shown in Fig. 6.37f. The last example is an area in LHU007, which is quite stable
with displacement rates of less than 2 mm/year. A vertical fissure was observed on
a wall in this area, as demonstrated in Fig. 6.37h.
Figure 6.38 shows the time-series displacements of the locations demonstrated in
Fig. 6.37, in which the risk areas are highlighted by red boxes. The displacements of
the PS targets within the red boxes are plotted as the colorized scatter points in Fig. 6.38. The displacements of the targets are averaged and interpolated using cubic inter-
polation as the black lines. PS-InSAR-derived displacements on PS targets reflect
their slight movements. However, due to atmospheric effects, orbit errors and other
error sources, final time-series displacements may contain other seasonal or periodic
phase components, which are wrongly interpreted as slight movements of the targets.
In Fig. 6.38, the upper panels demonstrate time-series displacements, and the lower
panels demonstrate all the IMFs for the time-series signals of BA009, NS004 and
FT001 as examples. As we can see, the residual of EMD decomposition represents
the trend of the original signals, while IMFs contain higher frequency signals. The
frequency of the signal decreases in higher-order IMFs. The last IMF contains some slowly varying components of the original signals.
According to the purple lines in Fig. 6.38, (a) BA009 and (c) GM001 have
similar trends, which consist of a stable period followed by a subsiding period and
then a stable period again. The subsiding period of BA009 starts in approximately
November 2017 and ends in approximately January 2020, lasting approximately two
years. The accumulative subsidence is approximately 8 mm in the subsidence period; thus, the subsidence rate is approximately 4 mm/year. GM001 behaves similarly, except that its subsiding trend starts approximately half a year later than that of BA009.

Fig. 6.37 Uploaded photos taken by online volunteers in the eight districts and PS-InSAR-derived displacement rates overlaid on a Google Earth satellite map

The trending signals in (b) NS004 and (d) LH020 are similar. They both
consist of a subsiding period and a slight uplifting period. The accumulative subsi-
dence reaches 8–10 mm, while the uplift is less than 2 mm. The subsiding period lasts
for at least two and a half years, which makes the subsiding rate slightly less than
4 mm/year. On the other hand, a slight uplift followed by subsidence occurs in both
(e) FT001 and (g) LG003. The accumulative uplift reaches approximately 5 mm,
and the accumulative subsidence is almost 10 mm. The subsiding periods both start between May and December 2018. (f) YT022 and (h) LHU007
are relatively stable, as the trending lines are linear and have a very small gradient.
Generally, causes of land deformation in urban areas are mainly attributed to
human activities, including groundwater withdrawal, mining activities, building
construction and loading of various infrastructures, when natural disasters are not a factor. In Shenzhen, overdraft of groundwater has caused fissures, land subsidence and land collapses. Since these risks were identified, groundwater withdrawal has been limited in an area of 1384.6 km² and banned in an area of 612.18 km². Meanwhile, above the land surface, the loading of surface infrastructures, such as buildings, highways, and flyovers, can cause land subsidence. Land subsidence in turn affects the health of these infrastructures, which may lead to cracks and fissures on walls and even the collapse of road surfaces. Stress imposed on the foundation by buildings is additive, which means that the construction and existence of buildings exacerbate land settlement. The loading effect of buildings on the foundation increases with the height and density of the buildings.

Fig. 6.38 Time-series displacements and the EMD decomposition of the eight examples demonstrated in Fig. 6.37
In this study, we found that collapses are captured in some uploaded photos, and the locations of these photos confirm the intensive land displacements detected by InSAR, such as at BA009 and GM001. The time-series displacement curves of BA009, NS004, GM001 and LH020 follow a Poisson curve, which has been reported to describe the post-construction settlement of soft soil. The InSAR displacement curve in GM001 confirms the Poisson curve, and coincidentally, building construction north of this area is found in Google Earth to have been completed around the end of 2016, not long before the start of our Sentinel-1A dataset. It should be noted that the settlement of newly constructed buildings adds stress to the underground foundation and can lead to fissures or cracks on nearby old buildings. Whether this settlement is within the tolerance of the building itself and dangerous to nearby buildings should be further investigated and analyzed in the future.

6.5 Coastal Tide Correction

6.5.1 Research Status of Tidal Correction

Bathymetry is the central task of hydrographic surveys, and it is also the most
effective way to make nautical charts and obtain changes in seabed topography.
Bathymetry can provide high-precision channel bathymetry maps and underwater
topographic maps. Accurate underwater terrain data are widely used in coastal engi-
neering, marine military, and marine scientific research. Therefore, high-precision
bathymetric terrain survey technology is an important research direction of marine
geodesy. At present, there are many research methods for bathymetry, including ALB
and satellite remote sensing sounding inversion. Regardless of the method used, most bathymetric results are referenced to the instantaneous sea surface at the time of measurement and therefore include water level effects caused by tides at different times. Different from pelagic surveying, in offshore areas, especially
in coastal areas, due to the shallow water depth, the tide becomes the main factor
restricting the accuracy of underwater topographic measurements. In addition, due
to the shallow water effect, the tidal range in coastal waters is also much larger than that in oceanic waters. Therefore, in high-precision coastal bathymetry operations, water level correction has become an indispensable component of the operation.
The instantaneous sea surface is affected by the tidal effect and has dynamic
characteristics. To obtain a stable underwater topographic depth value during the
hydrographic survey, the influence of the tide must be eliminated. The process of
normalizing the instantaneous sounding value of the sea area to a stable reference
plane of known depth is called water level correction. Since the accuracy of water
level correction directly affects the accuracy of the final map, all coastal countries in
the world attach great importance to water level correction. Due to the different sea
conditions and technical requirements of different countries, the water level correc-
tion methods used are also different. However, the basic principle is the same: temporal interpolation and regional spatial interpolation based on the water level observations or tide forecast values of the tide gauge stations in the sea area.
The discrete tide zoning (DTZ) method was carried out in manual operation mode in its early stage. This method assumes that tidal waves propagate uniformly between the stations, that is, the co-tidal time and tidal height change in proportion to the distance between the tide gauge stations. The method sets several geographic polygons in the sounding area and uses the same water level sequence in each polygon for correction. This results in discontinuous “jumps” at the boundaries of adjacent polygons. Therefore, the zoning
method is only suitable for areas where the tidal time difference between stations is
small and the tidal range varies uniformly. However, the actual water level includes
depth reference information, tide level information and residual water level infor-
mation. The zoning method assumes that the temporal and spatial variation in the
depth reference and residual water level in the region is consistent with the tidal
level variation. In fact, due to different acting forces, the tidal level and the residual
water level change unequally, which leads to inevitable errors in the application of
the zoning method.
The US Coast Survey designed the tidal constituent and residual interpolation
(TCARI) method for water level correction in offshore areas. In TCARI, the station
tide level, residual water level and reference deviation need to be calculated separately
from the coastal tide gauge stations. Then, according to the tidal model or harmonic
constant datum published in the sounding area, the tidal level, residual water level
and reference deviation of the station are allocated to the sounding point by the
corresponding interpolation method to correct the sounding water level error. TCARI
requires a high-precision tidal model or a harmonic constant datum in the bathymetric
area. However, in most actual bathymetry, data from long-term tide gauge stations
cannot be directly obtained, and the harmonic constants extracted from temporary
tide gauge stations along the coast cannot meet the accuracy requirements of TCARI.
Most of the current water level interpolation methods are based on geometric inter-
polation. When the number of tide gauge stations in the survey area is insufficient,
the spatial pattern of water level changes in the survey area may not conform to the
geometric trend. In addition, since dynamic water level changes are mainly caused by
astronomical tides, some tidal correction algorithms use tidal simulations to obtain
spatiotemporal tidal models under the condition of insufficient tide gauges [50, 51].

However, in some cases, the residual water level effect of short-term random meteo-
rological factors can lead to short-term water level anomalies, which are difficult to
simulate by tidal simulation methods.

6.5.2 Spatial Structure of Ocean Dynamic Water Level

1. The spatiotemporal dynamic characteristics of ocean water level


Due to the influence of factors such as tides, waves, wind fields, air pressure and
precipitation, the sea surface morphology has always been a dynamic process. Among
them, the gravitational action of the moon and the sun will produce periodic up and
down movements called gravitational tides. The periodic change in the solar radiation
intensity will also cause a periodic change in the meteorological conditions, thus
causing the periodic rise and fall of the sea surface, which is called the radiation
tide. Radiation tides are usually much smaller than gravitational tides and are only
significant during annual cycles [52]. In this book, the part of the seawater height
change caused by gravity tides and radiation tides is collectively referred to as the
tide level, and the difference between high and low tide levels in the same period is
called the tidal range.
Due to the different distances from the moon and the sun in different parts of the
earth, the period and magnitude of tidal changes in different places are also different.
In most areas, the tide level change cycle is approximately half a day, and there are
two ebb and flow movements in one day. There are also some areas where there
is only one ebb and flow movement per day. Tidal movement can be divided into
semidiurnal tides, diurnal tides, and mixed tides according to the law of tidal level
change. For a certain point of tide level change, the magnitude of the gravitational
force also changes periodically due to the moon and the sun. Therefore, the tidal
range at this point will also change periodically, and the maximum tidal range in a
month is called the spring tide. The minimum tidal range is called the neap tide.
In addition to the periodic effects of the moon and the sun, the sea level is also affected by aperiodic meteorological changes, which mainly refer to wind and air pressure. The rise and fall of the sea level caused by non-periodic meteorological changes is called the residual water level, also known as the increase or decrease in water. When the residual water level is caused by a storm, it is called a storm surge. To summarize, the change in tide level depends entirely on the relative positions of the moon, the sun and the earth, and the residual water level depends entirely on the change in meteorological conditions.
In marine geodetic surveys, quantitative research on sea surface topography needs
to be based on a specific datum, and the depth datum is often used to calibrate
the sea surface topography. The ocean dynamic water level model refers to the
spatiotemporal dynamic change process of seawater at a given depth datum.

2. Spatial structure characteristics of ocean water level


The ocean water level refers to the free sea surface with elevation at a certain point
in space on a specific datum. It represents the characteristics of the vertical water
volume change of seawater at a certain moment. In geodetic surveying, the earth is
often measured in 2 + 1 dimensions, that is, various curved surfaces (ellipsoid, sphere,
geoid, natural terrain surface, etc.) are first specified. Then, the elevation elements relative to these surfaces (geodetic height, orthometric height, height anomaly, etc.) are determined, and the horizontal coordinates and elevation coordinates are often determined at different times [53]. In ocean surveying, owing to tides, currents and waves, the sea surface changes dynamically, and the ocean water level therefore varies in space and time. Therefore, 2 + 1 + 1 dimensions must be used to measure the ocean water level [54]. The surfaces involved in oceanographic surveys generally include the instantaneous sea surface, the mean sea level at different time scales, the ocean geoid, the ellipsoid reference surface, the satellite orbital surface, the depth datum, the isobaric surface, the seafloor and the shallow (deep) seafloor strata. Among them, the satellite orbital plane, geoid, and ellipsoid are the same as in land surveying, and the natural terrain surface of the land extends to the natural surface of the seabed. The instantaneous sea level, mean sea level, and depth datum are
unique to oceanographic surveys (Fig. 6.39).

Fig. 6.39 Diagram of the basic spatial structure of ocean bathymetry



When sounding at any point in the bathymetric area S, since the instantaneous
sea surface and depth datum are all irregular geometric surfaces, the concept of
a scalar field is introduced to represent the distance between the corresponding
surfaces. D(x, y) represents the steady-state depth field (the charted water depth) of the surveying area, defined as the distance from the seabed to the depth datum. L(x, y) represents the depth datum model, defined as the distance from the depth datum to the mean sea level. T(x, y, t) represents the tidal level spatial field of the surveying area, defined as the distance from the tidal level to the depth datum. R(x, y, t) represents the residual water level scalar field, defined as the difference between the tidal level and the instantaneous sea surface. H(x, y, t) represents the instantaneous depth field of the surveying area, defined as the distance from the instantaneous sea surface to the natural seabed.
Then, the height of the instantaneous sea surface above the depth datum constitutes the ocean water level scalar field model, which is expressed as:

W (x, y, t) = T (x, y, t) + R(x, y, t) + L(x, y) (6.40)

6.5.3 Dynamic Water Level Correction Method

This section proposes a dynamic water level correction method, which is mainly
based on the construction of a regional high-precision dynamic water level model
for water level correction. This method mainly divides the sounding point water level
into three parts for correction: simulated tide level correction, residual water level
correction and remaining water level correction. The simulated tide level correction
extracts the tide level value of the sounding point according to the tide level simulation
result of the corresponding operation time in the sounding area. The residual water
level is extracted according to the synchronous tide gauge station along the coast and
then allocated to the sounding point using an interpolation method. The remaining water level is obtained by comparing the simulated tide level with the forecast tide level; a remaining water level interpolation model is then constructed and assigned to the sounding point.

T j (t) = h j (t) + R j (t) + C j (t) (6.41)

where T j (t) represents the water level correction value of water depth point j at time
t, h j (t) represents the simulated tide level value of water depth point j at time t,
R j (t) represents the residual water level correction of water depth point j at time t,
and C j (t) represents the remaining water level correction of water depth point j at
time t.

1. Simulated tide level correction


The simulated tide level correction uses the hydrodynamic model to simulate and
calculate the temporal and spatial variation process of the tide level in the entire
sounding area during the sounding operation time. The instantaneous tide level value hj(t) corresponding to each sounding point is then extracted to perform the simulated tide level correction. Therefore, the simulated tide level correction mainly depends on the tide level simulation accuracy.
In the high-precision numerical simulation of the tide level, the initial condi-
tions and open boundary conditions determine the accuracy of the simulation. In the
initial conditions, water depth and shoreline data are the most important influencing
factors. The water depth and shoreline data are relatively stable and can be used
repeatedly for many years after obtaining high-precision data. The open boundary
water level condition is the largest factor affecting the accuracy of the model. The
data for different places and different time periods are also different. To ensure the
applicability of the water level correction method proposed in this section, global
open data are used for the boundary water level conditions.
In the initial conditions, the shoreline data were provided by the GEODAS-NG software, i.e., the global-scale shoreline data released by the US National Geophysical Data Center. The bathymetric data are based on the official Electronic Navi-
gational Charts (ENC). The ENC data are acquired by the Chinese Naval Navigation
Support Department, with a spatial resolution of approximately 100 m, and the water
depth reference is the chart depth datum. The open boundary is generally set close to
the tide gauge station. In the open boundary condition setting, this section adopts the
Yellow Sea range of the OSU Tidal Inversion Software (OTIS) regional tidal solution, with a resolution of 1/30°. In the OTIS mode, Egbert et al. obtained crossover analysis results from T/P altimeter data and assimilated data from other acquisition methods, such as shipborne ADCP and tide gauge stations.
the regional inversion scheme have the same amplitude variance in the open ocean,
but the regional tidal inversion scheme fits the actual data better in complex terrain
and shallow water [55]. This model calculation is in the South Yellow Sea region, so
regional tidal inversion data with higher accuracy are used, and the latest version of
the database is used.
2. The correction of residual water level
The earliest research on the residual water level began in the 1980s; foreign scholars have called it ocean signals, observed chaotic data or residual water level [56–58], while Chinese scholars call it the residual water level or abnormal water level [59–61]. It refers to the remainder after removing the tide level and observation errors from the measured water level.

$$ R(t) = H(t) - \text{MSL} - T(t) - \Delta \qquad (6.42) $$

where R is the residual water level sequence, H is the measured water level sequence,
MSL is the mean sea level, T is the tide level, and ∆ is the observation error. The

MSL of a certain point is a fixed value, and a relatively stable result can be obtained
through a long-term tide level sequence. The observation error ∆ is mainly sourced
from human observation error or equipment systematic error. Generally, the human
observation error will not exceed 5 cm; the equipment system error is usually a
fixed value, and it is shown as a reference deviation in a long-term sequence, which
can be eliminated in the process of solving the MSL. Therefore, the main factor
affecting the solution accuracy of the residual water level is the solution accuracy
of the tide level T. Generally, both MSL and T can be solved by harmonic analysis.
The extraction accuracy of the residual water level is mainly related to the accuracy
of the tidal harmonic parameters, regardless of the influence of non-tidal factors and
observation errors; errors in the tidal harmonic analysis parameters propagate into the forecast tide level and hence into the extracted residual water level.
Tidal harmonic analysis is the algorithm used to obtain the harmonic constants of each tidal constituent. Using the obtained harmonic constants, the tide level can be forecast for any time.


$$ T(t) = \text{MSL} + \sum_{j=1}^{m} f_{j} H_{j} \cos\left[\sigma_{j} t + (V_{0}+u)_{j} - g_{j}\right] \qquad (6.43) $$

Without considering the water level observation error, substituting Eq. (6.43) into Eq. (6.42) gives

$$ R(t) = H(t) - \text{MSL} - T(t) \qquad (6.44) $$

where m is the number of tidal constituents, fj is the nodal factor of constituent j, Hj and gj are its harmonic constants (amplitude and phase lag), σj is its angular speed, (V0 + u)j is the initial phase angle (in GMT) of the forecast constituent, and MSL is the annual mean sea level or multi-year mean sea level. When enough observation data are used in the tidal harmonic analysis, the obtained tidal harmonic parameters are stable and reliable, and the residual water level obtained in this way is also relatively stable.
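A minimal Python sketch of Eqs. (6.43) and (6.44) follows, where the tidal prediction is taken as the summation term of Eq. (6.43) so that the mean sea level is removed only once; the constituent values in the usage comment are placeholders, not harmonic constants of any real station.

```python
import numpy as np

def tidal_oscillation(t_hours, constituents):
    """Summation term of Eq. (6.43). Each constituent is a dict with nodal factor f,
    amplitude H (m), angular speed sigma (deg/h), initial phase V0u (deg) and phase lag g (deg)."""
    t = np.asarray(t_hours, dtype=float)
    tide = np.zeros_like(t)
    for c in constituents:
        phase = np.deg2rad(c["sigma"] * t + c["V0u"] - c["g"])
        tide += c["f"] * c["H"] * np.cos(phase)
    return tide

def residual_water_level(measured, t_hours, msl, constituents):
    # Eq. (6.44): remove the mean sea level and the predicted tidal oscillation
    return np.asarray(measured, dtype=float) - msl - tidal_oscillation(t_hours, constituents)

# Illustrative single-constituent usage (placeholder values):
# m2 = {"f": 1.0, "H": 1.2, "sigma": 28.984, "V0u": 0.0, "g": 310.0}
# r = residual_water_level(h_obs, t_hours, msl=2.1, constituents=[m2])
```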
In water level correction, the residual water level is one of the important factors
affecting its accuracy. Since the formation of the residual water level is mainly related to meteorological factors, and changes in meteorological conditions are random on long time scales, the corresponding residual water level is also random on long time scales. However, on shorter time scales of several hours, meteorological conditions often show continuity, and correspondingly the residual water level also shows continuity on these small time scales. The action range of short-term meteorological factors is generally 10–1000 km, and the action time is generally 1–100 h [62]. Therefore, the variation in the residual water level in the time domain has weak statistical regularity. The correlation of the residual water level at the same point on different time scales is called the residual water level autocorrelation. However, meteorological conditions (mainly air pressure and the wind field) are strongly correlated within a certain spatial range, so under the same or similar meteorological forcing, the residual water level must have a

certain correlation in the spatial domain. The correlation is the statistical regularity
of the residual water level in the spatial domain. This residual water level correlation
at different points at the same time is called residual water level spatial correlation.
The spatial correlation of the residual water level is the theoretical basis for a series of
applied studies, such as the implementation of residual water level interpolation and
the establishment of spatial and temporal distribution models of the residual water
level.
The research area of the dynamic water level correction method in this section is
Haizhou Bay and its adjacent open sea. Three measurement stations are distributed in the area, within 50 km of each other. The area is small and the tidal wave propagates evenly, so the residual water level sequence at unknown points can be interpolated by simple distance weighting. The residual water level of each station is extracted as described above, and based on the residual water levels of the three stations, the distance interpolation method of Eq. (6.45) is used to determine the residual water level value at each sounding point.


$$ R_{j}(t) = \sum_{i=1}^{n} \lambda_{ij}\, R_{i}(t), \qquad \lambda_{ij} = \frac{l_{ji}^{-2}}{\sum_{i=1}^{n} l_{ji}^{-2}} \qquad (6.45) $$

where Rj(t) is the residual water level correction value of water depth point j at time t, λij is the weight coefficient between tide gauge station i and water depth point j, Ri(t) is the residual water level of tide gauge station i at time t, lji is the straight-line distance between water depth point j and tide gauge station i, and n is the number of tide gauge stations.
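A minimal Python sketch of the inverse-squared-distance interpolation in Eq. (6.45) is given below; array shapes, station order and units are illustrative assumptions.

```python
import numpy as np

def interpolate_residual(residuals, distances):
    """residuals: array (n_stations, n_times) of station residual water levels R_i(t);
    distances: array (n_stations,) of straight-line distances l_ji to the sounding point."""
    w = np.asarray(distances, dtype=float) ** -2.0
    lam = w / w.sum()                                  # weights lambda_ij in Eq. (6.45)
    return lam @ np.asarray(residuals, dtype=float)    # R_j(t) for every epoch
```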
3. The correction of the remaining water level
The remaining water level error refers to the error between the simulation result
after the residual water level correction and the measured result, including the simu-
lation deviation and the residual water level extraction error (observation error is
not considered). Among them, the simulation error is the main component. The
remaining water level is mainly caused by the deviation of the tide level simulation
and includes the residual water level that has not been eliminated. The extraction
accuracy of the residual water level mainly depends on the duration of the measured
water level. The water level observation time in this section is one month, so the
influence of the residual water level accuracy is not considered. The simulation bias
is caused by the parameter settings of the model during the astronomical tide simu-
lation process, including the initial condition error and the open boundary condition
error. The initial condition error is mainly caused by the accuracy of the used water
depth and coastline, and the open boundary condition error is mainly caused by the
accuracy and number of tidal constituents in the forcing water level. Therefore, the simulation bias cannot be represented by an exact functional model. Instead, the remaining water level is obtained by subtracting the simulated tide level and the residual water level from the measured water level.

$$ E_{i}(t) = H_{i}(t) - h_{i}(t) - R_{i}(t) \qquad (6.46) $$

where E i (t) is the remaining water level sequence of tide gauge station i; Hi (t) is
the measured water level sequence of tide gauge station i; h i (t) is the simulated tide
level sequence of tide gauge station i; and Ri (t) is the residual water level sequence
of tide gauge station i.
For the remaining water level, this section designs a distance interpolation method
that integrates the deviation correlation.

$$ C_{j}(t) = \sum_{i=1}^{n} \mu_{i}\,\lambda_{ij}\,E_{i}(t), \qquad \mu_{i} = \frac{\rho_{ik}^{-2}}{\sum_{i=1}^{n} \rho_{ik}^{-2}} \qquad (6.47) $$

$$ \rho_{ik} = \frac{\sum_{t=1}^{N}\left(E_{i}(t) - \bar{E}_{i}\right)\left(E_{k}(t) - \bar{E}_{k}\right)}{\sqrt{\sum_{t=1}^{N}\left(E_{i}(t) - \bar{E}_{i}\right)^{2}\,\sum_{t=1}^{N}\left(E_{k}(t) - \bar{E}_{k}\right)^{2}}} \qquad (6.48) $$

where Cj(t) is the remaining water level correction value of sounding point j at time t, μi is the water level weight coefficient of tide gauge station i, λij is the distance weight of tide gauge station i at water depth point j, Ei(t) is the remaining water level of tide gauge station i at time t, ρik is the correlation coefficient between the remaining water levels of tide gauge stations i and k, Ek(t) is the remaining water level of tide gauge station k at time t, and n is the number of tide gauge stations.
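A minimal Python sketch following the reconstruction of Eqs. (6.46)–(6.48) above: the remaining water level of each station is computed, Pearson coefficients relative to a reference station give the weights μi, and these are combined with the distance weights λij to distribute the correction to a sounding point. The exact combination of the two weights follows the reconstructed form of Eq. (6.47) and is an assumption; all names are illustrative.

```python
import numpy as np

def remaining_water_level(measured, simulated, residual):
    # Eq. (6.46): remaining water level of one station
    return np.asarray(measured, float) - np.asarray(simulated, float) - np.asarray(residual, float)

def correlation_weights(E, k=0):
    """E: array (n_stations, n_times); weights mu_i from correlation with reference station k."""
    rho = np.corrcoef(E)[k]              # Pearson coefficients rho_ik, Eq. (6.48)
    w = rho ** -2.0
    return w / w.sum()                   # mu_i in Eq. (6.47)

def correct_remaining(E, distances, k=0):
    """Distribute the remaining water level to one sounding point, per Eq. (6.47)."""
    d = np.asarray(distances, dtype=float)
    lam = d ** -2.0 / np.sum(d ** -2.0)  # lambda_ij, as in Eq. (6.45)
    mu = correlation_weights(E, k)
    return (mu * lam) @ np.asarray(E, dtype=float)   # C_j(t) for every epoch
```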

6.5.4 Dynamic Water Level Correction in the Southwestern Yellow Sea

1. Water level simulation and correction

The study area in this section is 119°42' E–120°00' E,35°78' N–34°41' N, which is in
the southwest of the Yellow Sea and is a famous Yellow Sea landform. It gradually
becomes shallower from south to north; the terrain on the east side is steep and is
gentle on the west side. In the process of calculating the regional grid division, the
angle quality of the grid is controlled between 30 and 120°. The complex shorelines
are locally densified. The average resolution of the shoreline is approximately 0.002°.
The average resolution of the open boundary grades from approximately 0.01° to 0.04°, and the grid is locally densified around the islands in the middle of the domain; the mesh contains 41,296 grid nodes and 80,351 triangular elements. The open boundary is set close to the three gauge stations of Rizhao, Cheniushan and Yanwei Port, and the water level time series at 259 points along the open boundary, spanning from January 1, 2007 00:00 UTC to February 15, 2007 23:00 UTC, are forecast using the OTIS program from the harmonic constants of nine constituents: M2, S2, N2, K2, K1, O1, P1, Q1 and M4. The time interval is 1 h, and the mean sea level is used as the datum for the forecast water

level. The model adopts a cold start, the temperature is set to 10°C (annual average
surface water temperature), and the salinity is set to 35‰. The model uses wetting and drying grid calculation. The bottom friction coefficient is calculated as $C_f = g/\left(M h^{1/6}\right)^{2}$, where h is the water depth, M is the Manning coefficient,
and g is the acceleration of gravity. The simulation time of the model is from 00:00
UTC on January 1, 2007, to 23:00 UTC on February 15, 2007, and the results are
output every hour.
In the process of configuring the initial conditions and boundary conditions,
the water depth datum adopts the chart datum. The forced water level of the open
boundary is derived from the Yellow Sea, and its water level reference is the mean sea
level. To avoid changing the accuracy of the forced water level, this section converts
the chart datum-based bathymetry to the mean sea level bathymetry (Figs. 6.40, 6.41
and 6.42).
To verify the results of the model, the simulation results of the tide level were
compared with the actual measured tide stations in the survey area.
The results show that the water level deviations of the three discrete verification
stations in the study area have similar time series changes, indicating that the study
area has inherent time characteristics. Therefore, the residual water level correction
and the remaining water level correction can be used to eliminate part of the water
level deviation.

Fig. 6.40 Bathymetry map before and after depth datum conversion

Fig. 6.41 Comparison of measured water level and simulated water level at three stations

Fig. 6.42 Comparison of simulation errors of three stations



Using the residual water level extraction method and the residual water level
spatial field model construction method described in this section, the residual water
levels of the three stations of Cheniushan, Lianyungang and Lanshangang were
extracted.
At the same time, to further quantify the correlation of the residual water level
of the three stations, we analyzed the correlation of the residual water level of the
three stations of Cheniushan, Lanshan Port and Lianyungang, and the results are as
follows:
Figure 6.43 and Table 6.3 show that the residual water levels of the three stations
of Cheniushan, Lianyungang, and Lanshangang have a strong correlation. According
to this feature, the residual water level value of any sounding point can be obtained
by using Eq. (6.45).
The remaining water levels of the three stations of Cheniushan, Lianyungang and
Lanshan are extracted by using Eq. (6.46), and the results are as follows (Fig. 6.44).
At the same time, to further quantify the correlation of the remaining water levels
of the three stations, we analyzed the correlation of the remaining water levels of the
three stations of Cheniushan, Lanshangang, and Lianyungang, and the results are as
follows (Table 6.4).
The remaining water level correction value for any sounding point can be obtained
by using Eq. (6.47). This correction algorithm can both eliminate most of the
remaining water levels and maintain the spatial characteristics of the remaining water
levels.
Through the extraction and analysis of the residual water level and the remaining
water level, it is shown that the study area conforms to the characteristics of the

Fig. 6.43 Time series distribution of residual water level at three stations

Table 6.3 Correlation coefficient of residual water level at the three stations

Station        Cheniushan   Lanshangang   Lianyungang
Cheniushan     1            –             –
Lanshangang    0.96         1             –
Lianyungang    0.89         0.88          1

Fig. 6.44 Time series distribution of remaining water levels at three stations

Table 6.4 Remaining water level correlation coefficients of the three stations

Station        Cheniushan   Lanshangang   Lianyungang
Cheniushan     1.00         –             –
Lanshangang    0.78         1.00          –
Lianyungang    0.76         0.76          1.00

dynamic water level correction method. This section uses the dynamic water level
correction method to correct the water level deviations of the three measured stations
and compares the results before and after the correction.
It can be seen from the table that the accuracy of the three stations, Cheniushan,
Lanshangang and Lianyungang, has been improved by 13.2, 7.1 and 10.8 cm after
the water level correction (Fig. 6.45 and Table 6.5).

2. Comparison of dynamic water level correction methods

To verify the effectiveness of the instantaneous water level correction method, a


comparison and verification scheme is designed in this section. The instantaneous
water level correction method is compared with the TCARI and DTZ methods to
verify the performance and characteristics of the dynamic water level correction
method (instantaneous tide correction, ITC) in large-area bathymetry.
Before the comparison, the parameter configuration of the two methods, DTZ
and TCARI, is introduced first. The data used by DTZ and TCARI are consistent
with the ITC method. In the DTZ method, the measured water levels of Cheniushan,
Lanshangang and Yanweigang are used as the water level basis for zoning. The
zoning principle is that the water level difference between adjacent zones is 6 cm and the tidal time difference is 18 min. At the same time, it is assumed that the water level and tidal range change uniformly and linearly between the stations. In TCARI,
the tide level, residual water level and reference deviation are extracted based on the
three stations of Cheniushan, Lanshangang and Yanweigang and then allocated to the
entire survey area. Among them, 37 tidal constituents are used in tide level forecasting.
The shoreline boundary data are consistent with the ITC data, the data extracted by
GEODAS-NG are selected, and the regional grid is constructed by SMS software.

Fig. 6.45 Comparison before and after water level correction at three stations

Table 6.5 Corrected and uncorrected root mean square deviations of tide simulations

Time                 Cheniushan/cm   Lanshangang/cm   Lianyungang/cm
After correction     11.7            17.7             17.5
Before correction    25.9            24.8             28.3

Then, the tide levels, residual water levels and datum deviations of the three stations
are interpolated to each grid point.
In this section, a comparative scheme is designed based on the simultaneous obser-
vation of water levels at three stations, Cheniushan, Lanshangang, and Lianyungang,
to verify the applicability of the three water level correction methods when the
number of tide gauge stations is reduced. First, we compare the differences of the

three methods in the effective space domain. Second, we compare the accuracy of
the three algorithms on known points.
In the effectiveness comparison in the spatial domain, all three measured stations
are used. At the same time, the correction results of the ITC method are used as the
ground truth to quantify the differences between DTZ, TCARI and ITC in the form
of correlation and root mean square error.
(1) Correlation function between the ITC method and the DTZ and TCARI methods:
$$
C_m = \frac{\sum_{t=1}^{N}\bigl(Z'_t - \overline{Z'_t}\bigr)\bigl(Z_t - \overline{Z_t}\bigr)}{\sqrt{\sum_{t=1}^{N}\bigl(Z'_t - \overline{Z'_t}\bigr)^2 \sum_{t=1}^{N}\bigl(Z_t - \overline{Z_t}\bigr)^2}}
\tag{6.49}
$$

where Z'_t represents the water level correction value of the TCARI or DTZ method
at verification point m at time t, Z_t represents the water level correction value of the
ITC method at verification point m at time t, the overlined quantities are the corresponding
mean values, m denotes the verification point, t denotes the epoch in the synchronized
water level series, and N represents the total number of water level observations, here
716 synchronized water level values (Fig. 6.46).
(2) Root mean square comparison of DTZ, TCARI, and ITC correction results:
$$
\mathrm{RMSD}_m = \sqrt{\frac{\sum_{t=1}^{N}\bigl(Z'_t - Z_t\bigr)^2}{N}}
\tag{6.50}
$$
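
A minimal numerical sketch of the two comparison metrics in Eqs. (6.49) and (6.50) is given below, assuming the synchronized correction series at a verification point are available as arrays; the series generated here are synthetic stand-ins rather than survey data.

```python
import numpy as np

def compare_corrections(z_prime, z):
    """Correlation (Eq. 6.49) and RMSD (Eq. 6.50) between a TCARI/DTZ
    correction series z_prime and the ITC series z at one verification point."""
    z_prime = np.asarray(z_prime, dtype=float)
    z = np.asarray(z, dtype=float)

    dzp = z_prime - z_prime.mean()
    dz = z - z.mean()
    c_m = np.sum(dzp * dz) / np.sqrt(np.sum(dzp**2) * np.sum(dz**2))  # Eq. (6.49)
    rmsd_m = np.sqrt(np.mean((z_prime - z) ** 2))                     # Eq. (6.50)
    return c_m, rmsd_m

# Synthetic stand-ins for N = 716 synchronized water level corrections (m)
rng = np.random.default_rng(0)
t = np.arange(716)
itc = 0.8 * np.sin(2 * np.pi * t / 149.0)          # placeholder ITC corrections
dtz = itc + rng.normal(0.0, 0.03, t.size)          # placeholder DTZ corrections
print(compare_corrections(dtz, itc))
```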

Fig. 6.46 Location distribution of five verification points



In the comparison scheme, the five verification points A, B, C, D and E are set in
this section. Among them, points A, B and C are in the middle of the triangle formed
by the three stations of Cheniushan, Lanshangang, and Lianyungang. D and E are
on the outside of the triangle (Table 6.6).
Next, three water level correction results of ITC, TCARI, and DTZ are processed
using the above two spatial domain comparison schemes at five verification points.
Table 6.7 shows that the correlation coefficients between the two algorithms and
ITC are greater than 0.985. As shown in Table 6.8, the difference between ITC and
the two commonly used algorithms is within 6 cm.
In the comparison of water level correction accuracy, this section designs a veri-
fication method to compare the accuracy of the three algorithms on known points.
Since Cheniushan Island is in the middle of the survey area and is less affected by river
flow, it is used as the verification point in this section. Based on the
measured water levels of Lianyungang and Lanshangang, the three water level correction
methods are used to calculate the water level correction value at the Cheniushan
point. This value is then compared with the water level measured at Cheniushan, and the

Table 6.6 Coordinates of the five verification points

Point   Longitude    Latitude
A       119.3918°    34.9410°
B       119.4750°    34.9552°
C       119.5605°    34.9552°
D       119.5791°    35.1755°
E       119.7095°    34.7843°

Table 6.7 Correlation coefficient of water level correction results of DTZ, TCARI, and ITC

Point   ITC   DTZ      TCARI
A       1     0.9929   0.9855
B       1     0.9966   0.9856
C       1     0.9963   0.9876
D       1     0.9862   0.9900
E       1     0.9761   0.9971

Table 6.8 Root mean square of water level correction results of DTZ, TCARI, and ITC

Point   ITC   DTZ/cm   TCARI/cm
A       0     3.40     5.70
B       0     2.90     4.18
C       0     3.20     3.02
D       0     4.82     2.70
E       0     4.32     0.60

accuracy of the three water level correction methods is expressed in the form of the
root mean square.
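
The known-point verification just described can be sketched as follows. The fixed two-station combination below merely stands in for whichever correction method (ITC, TCARI or DTZ) is being tested with Cheniushan withheld, and every water level series is a synthetic placeholder.

```python
import numpy as np

def verification_rms(predicted, measured):
    """RMS deviation between the water level predicted at the withheld station
    (Cheniushan) and the water level actually measured there."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return float(np.sqrt(np.mean((predicted - measured) ** 2)))

rng = np.random.default_rng(1)
t = np.arange(716)
lianyungang = 1.00 * np.sin(2 * np.pi * t / 149.0) + rng.normal(0, 0.02, t.size)
lanshangang = 1.10 * np.sin(2 * np.pi * t / 149.0 + 0.05) + rng.normal(0, 0.02, t.size)
cheniushan_obs = 1.05 * np.sin(2 * np.pi * t / 149.0 + 0.02) + rng.normal(0, 0.02, t.size)

# Stand-in prediction: an equal-weight blend of the two remaining stations;
# a real test would substitute the ITC, TCARI or DTZ correction value here.
cheniushan_pred = 0.5 * lianyungang + 0.5 * lanshangang
print(verification_rms(cheniushan_pred, cheniushan_obs))
```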
The three water level correction methods, ITC, TCARI, and DTZ, were verified, and
their root mean square deviation values were 11.74 cm, 20.25 cm and 34.58 cm,
respectively. In the effectiveness comparison in the spatial domain, the maximum
deviation between DTZ and ITC is 4.82 cm, and the maximum deviation between
TCARI and ITC is 5.7 cm. In the comparison of water level correction accuracy,
when the Cheniushan measured station is removed from the correction, the
accuracies of ITC, TCARI and DTZ are 11.74 cm, 20.25 cm and 34.58 cm,
respectively; the deviation between TCARI and ITC becomes 8.51 cm, and the
deviation between DTZ and ITC becomes 22.84 cm. This shows that the
accuracy and stability of ITC are significantly better than those of TCARI and DTZ
when fewer tide gauge stations are available.
It can be seen from Tables 6.7 and 6.8 that in tide correction, the ITC algorithm
is basically consistent with DTZ and TCARI. In the case of insufficient tide gauge
stations [63], the ITC algorithm outperforms the DTZ and TCARI algorithms on
known points. The main reasons are analyzed as follows:
(1) The selected bathymetric samples are of the same tidal type; they are all
regular semidiurnal tides [63]. The water level difference and tidal range over the
whole bathymetric sample area follow the same temporal and spatial variation
law as the astronomical tide, so interpolation by distance ratio of the water level
and tidal range can be applied to the observed total water level change. Therefore,
the DTZ method can obtain highly accurate results.
(2) The second comparison only utilizes data from two tide gauge stations. The
accuracy of both TCARI and DTZ is affected by the number and distribution
of tide gauge stations, with DTZ being the most affected. The ITC algorithm
relies on the accuracy of tide level simulation, while the numerical accuracy
of the tide level is not affected by the number of tide gauge stations. In the
ITC algorithm, reducing the number of tide gauge stations will only affect the
interpolation accuracy of the residual water level and the remaining water level.

6.6 Summary

Coastal zone surveying provides basic infrastructure and strong technical support for
the marine economy and the ecological environment. This chapter starts from the cross-
platform, multi-sensor system integration of dynamic marine surveying and mapping
technology and the development of related systems, and introduces the main
characteristics of the technology and its typical applications, such as shipborne coastal
water–shore integrated measurement, airborne LiDAR bathymetry equipment, space-
borne InSAR measurement, and coastal tide level correction. With the demand for

economic and ecological environment construction in the coastal zone, combining the
development of integrated land–water detection equipment with modern information
technologies such as big data and artificial intelligence is an important development
trend for revealing the temporal and spatial distribution and change laws of the
geospatial environmental elements of the coastal
zone. The relevant research results provide rich basic data and a reliable scientific
basis for marine resource development, engineering construction, disaster prevention,
ecological environmental protection, and navigation safety.

References

1. Xia D (2009) Geomorphological Environment and Evolution of Coastal Zone. Beijing: China
Ocean Press.
2. Wu Z, Yang F, Li S, et al (2017) High-resolution submarine topography--visual computation
and scientific applications. Beijing: Science Press.
3. Liu Y, Guo K, He X, et al (2017) Research progress of airborne laser bathymetry technology.
Geomatics & Information Science of Wuhan University 42(9):1185–1194.
4. Wang D, Wan B, Qiu P, et al (2018) Evaluating the performance of Sentinel-2, Landsat 8 and
Pléiades-1 in mapping mangrove extent and species. Remote Sens 10(9):1468.
5. Costa B M, Battista T A, Pittman S J (2009) Comparative evaluation of airborne LiDAR
and ship-based multibeam sonar bathymetry and intensity for mapping coral reef ecosystems.
Remote Sens Environ 113(5):1082-1100.
6. Zhao J, Liu J (2008) Multi-beam Sounding and the Process of Image Data. Wuhan: Wuhan
University Press.
7. China Navy Hydrographic Office (2022) Specifications for hydrographic survey (GB 12372-
2022). Beijing: Standard Press of China [2022–10–8].
8. Huang C, Xiuping L, Ouyang Y, et al (2014) Analysis of error source and quality assessment
about multibeam sounding product. Hydrograph Surv Chart 2:1–6.
9. Guenther G (1989) Airborne laser hydrography to chart shallow coastal waters. Sea Technology
30(3):1–4.
10. Zhao J, Zhao X, Zhang H, et al (2017) Improved model for depth bias correction in airborne
LiDAR bathymetry systems. Remote Sens 9(7):710.
11. Guo K, Li Q, Wang C, et al (2022) Development of a single-wavelength airborne bathymetric
LiDAR: System design and data processing. ISPRS J Photogramm Remote Sens 185:62–84.
12. Lillycrop J, Banic J (1992) Advancements in the U.S. army corps of engineers hydrographic
survey capabilities: The shoals system. Mar Geod 15(2–3):177–185.
13. Tuell G, Barbor K, Wozencraft J (2010) Overview of the coastal zone mapping and imaging
LiDAR (CZMIL): A new multisensor airborne mapping system for the US army corps of
engineers//Proceedings of algorithms and technologies for multispectral, hyperspectral, and
ultraspectral imagery XVI, Orlando FL.
14. Steinvall O, Koppari K, Karlsson U (1994) Airborne laser depth sounding: system aspects and
performance//Proceedings of SPIE, 2258:392–412.
15. Han S, Rizos C, Abbot R (1998) Flight testing and data analysis of airborne GPS LADS survey//
Proceedings of the 11th International Technical Meeting of the Satellite Division of the Institute
of Navigation (ION GPS 1998), Nashville, TN.
16. RIEGL (2012) VQ-880-GH data sheet. http://www.riegl.com/nc/products/airborne-scanning/
produktdetail/product/scanner/63/. [2012–2–2].
17. Guenther G, Thomas R (1984) Effects of propagation-induced pulse stretching in airborne laser
hydrography//Proceedings of Ocean Opt VII, Monterey CA.

18. Doneus M, Doneus N, Briese C, et al (2013) Airborne laser bathymetry-detecting and recording
submerged archaeological sites from the air. J Archaeol Sci 40(4):2136–2151.
19. Guo K, Xu W, Liu Y, et al (2017) Gaussian half-wavelength progressive decomposition method
for waveform processing of airborne laser bathymetry. Remote Sens 10(1):35.
20. Ding K, Li Q, Zhu J, et al (2018) An improved quadrilateral fitting algorithm for the water
column contribution in airborne bathymetric LiDAR waveforms. Sensors-Basel 18(2):552.
21. Wang C, Li Q, Liu Y, et al (2015) A comparison of waveform processing algorithms for
single-wavelength LiDAR bathymetry. ISPRS J Photogramm Remote Sens 101:22–35.
22. Guo K, Li Q, Mao Q, et al (2021) Errors of airborne bathymetry LiDAR detection caused by
ocean waves and dimension-based laser incidence correction. Remote Sens 13(9):1750.
23. Hayata T, Hotta T, Iwakiri M (2015) 3D point cloud cluster analysis based on principal compo-
nent analysis of normal-vectors//Proceedings of 4th global conference on consumer electronics
(GCCE), Osaka.
24. Wang C, Wang X, Xiu W, et al (2020) Characteristics of the seismogenic faults in the 2018
Lombok, Indonesia, earthquake sequence as revealed by inversion of InSAR measurements.
Seismol Res Lett 91(2A):733–744.
25. Seymour M, Cumming I (1994) Maximum likelihood estimation for SAR interferometry//
Proceedings of IEEE international geoscience and remote sensing symposium (IGARSS),
Pasadena.
26. Costantini M (1998) A novel phase unwrapping method based on network programming. IEEE
Trans Geosci Remote Sens 36(3):813–821.
27. Hooper A, Bekaert D, Spaans K, et al (2012) Recent advances in SAR interferometry time
series analysis for measuring crustal deformation. Tectonophysics 514:1–13.
28. Berardino P, Fornaro G, Lanari R, et al (2002) A new algorithm for surface deformation moni-
toring based on small baseline differential SAR interferograms. IEEE Trans Geosci Remote
Sens 40(11):2375–2383.
29. Manunta M, De Luca C, Zinno I, et al (2019) The parallel SBAS approach for Sentinel-
1 interferometric wide swath deformation time-series generation: Algorithm description and
products quality assessment. IEEE Trans Geosci Remote Sens 57(9):6259–6281.
30. Crosetto M, Monserrat O, Cuevas-González M, et al (2016) Persistent scatterer interferometry:
A review. ISPRS J Photogramm Remote Sens 115:78–89.
31. Ferretti A, Prati C, Rocca F (2001) Permanent scatterers in SAR interferometry. IEEE Trans
Geosci Remote Sens 39(1):8–20.
32. Hooper A (2008) A multi-temporal InSAR method incorporating both persistent scatterer and
small baseline approaches. Geophys Res Lett 35(16).
33. De Zan F, López-Dekker P (2011) SAR image stacking for the exploitation of long-term
coherent targets. IEEE Geosci Remote Sens Lett 8(3):502–506.
34. Ansari H, De Zan F, Parizzi A (2020) Study of systematic bias in measuring surface deformation
with SAR interferometry. IEEE Trans Geosci Remote Sens 59(2):1285–1301.
35. Xiong S, Wang C, Qin X, et al (2021) Time-series analysis on persistent scatter-interferometric
synthetic aperture radar (PS-InSAR) derived displacements of the Hong Kong-Zhuhai-Macao
bridge (HZMB) from Sentinel-1A observations. Remote Sens 13(4):546.
36. Sadeghi Z, Wright T J, Hooper A J, et al (2021) Benchmarking and inter-comparison of
Sentinel-1 InSAR velocities and time series. Remote Sens Environ 256:112306.
37. Fadhillah M F, Achmad A R, Lee C W (2021) Improved combined scatterers interferometry
with optimized point scatterers (ICOPS) for interferometric synthetic aperture radar (InSAR)
time-series analysis. IEEE Trans Geosci Remote Sens 60:1–14.
38. Xue F, Lv X, Dou F, et al (2020) A review of time-series interferometric SAR techniques: A
tutorial for surface deformation analysis. IEEE Geosci Remote Sens Mag 8(1):22–42.
39. Guarnieri A M, Tebaldini S (2008) On the exploitation of target statistics for SAR interferometry
applications. IEEE Trans Geosci Remote Sens 46(11):3436–3443.
40. Rocca F (2007) Modeling interferogram stacks. IEEE Trans Geosci Remote Sens 45(10):3289–
3299.

41. Ferretti A, Fumagalli A, Novali F, et al (2011) A new algorithm for processing interferometric
data-stacks: Squeesar. IEEE Trans Geosci Remote Sens 49(9):3460–3470.
42. Jiang M, Guarnieri A M (2020) Distributed scatterer interferometry with the refinement of
spatiotemporal coherence. IEEE Trans Geosci Remote Sens 58(6):3977–3987.
43. Samiei-Esfahany S, Martins J E, Van Leijen F, et al (2016) Phase estimation for distributed
scatterers in InSAR stacks using integer least squares estimation. IEEE Trans Geosci Remote
Sens 54(10):5671–5687.
44. Jiang M, Ding X, Hanssen R F, et al (2014) Fast statistically homogeneous pixel selection
for covariance matrix estimation for multitemporal InSAR. IEEE Trans Geosci Remote Sens
53(3):1213–1224.
45. Wang Y, Zhu X X (2015) Robust estimators for multipass SAR interferometry. IEEE Trans
Geosci Remote Sens 54(2):968–980.
46. Ansari H, De Zan F, Bamler R (2017) Sequential estimator: Toward efficient InSAR time series
analysis. IEEE Trans Geosci Remote Sens 55(10):5637–5652.
47. Ansari H, De Zan F, Bamler R (2018) Efficient phase estimation for interferogram stacks. IEEE
Trans Geosci Remote Sens 56(7):4109–4125.
48. Hooper A, Segall P, Zebker H (2007) Persistent scatterer interferometric synthetic aperture radar
for crustal deformation analysis, with application to Volcan Alcedo, Galapagos. J Geophys Res
Solid Earth 112(B7):1–21.
49. Hooper A (2008) A multi-temporal InSAR method incorporating both persistent scatterer
and small baseline approaches. Geophys Res Lett 35(16):96–106.
50. Liu H, Pan G, Ying Y, et al (2014) A water-level correction method based on ocean tide
dynamic model. J Mar Sci 32(2):35–39.
51. Zhao Y, Wang Y, Wang R, et al (2019) Water level correction in chart measurement by using
tidal model and residual water level monitoring method. Coastal Engineering 38(4):304–309.
52. Fang G, Zheng W, Chen Z, et al (1986) Analysis and Forecasting of Tides and Currents. Beijing:
China Ocean Press.
53. Vanicek P (1987) Four-dimensional Geodetic Positioning. New York: Springer.
54. Liu C (2001) Space structure and data processing in marine sounding. Acta Geod et Cartogr
Sin 30(2):186–186.
55. Martin P J, Smith S R, Posey P G, et al (2009) Use of the Oregon state university tidal inversion
software (OTIS) to generate improved tidal prediction in the East Asian seas. In: Yellow Sea,
artificial satellites, assimilation. AVANO. Available via DIALOG. https://agris.fao.org/agris-
search/search.do?recordID=AV20120142878. Accessed 2009.
56. Chelton D B, Enfield D B (1986) Ocean signals in tide gauge records. J Geophys Res Solid
Earth 91(B9):9081–9098.
57. Frison T W, Earle M D, Abanel H D, et al (1999) Interstation prediction of ocean water levels
using methods of nonlinear dynamics. J Geophys Res Oceans 104(C6): 13653–13666.
58. Hess K (2003) Water level simulation in bays by spatial interpolation of tidal constituents,
residual water levels, and datums. Cont Shelf Res 23(5):395–414.
59. Zhai G, Bao J, Hou S (2002) Research on configuration method of water level correction based
on abnormal water level characteristics. China National Defense Science and Technology
Report.
60. Hou S, Huang C, Ping L, et al (2005) Reckoning research based on residual water level of tide.
Hydrographic Surveying and Charting 25(6):29–33.
61. Wang Z, Sang J, Wang J (2002) The application of tide level deduction to the depth sounding.
Hydrographic Surveying and Charting 22(2):3–8.
62. Liu J (2013) Analysis of main characteristics of residual water level and its application in
marine surveying and mapping. Bull Surv Mapp (9):105–107.
63. Hwang J, Van S P, Choi B, et al (2014) The physical processes in the Yellow Sea. Ocean Coast
Manag 102:449–457.
Chapter 7
Outlook

Humans have been consistently transforming the earth and nature, resulting in many
constructions. The increasing population and advanced technology have driven these
constructions to extend from the earth’s surface to underground, underwater, and even
space. With the growing expansion of construction scale and complexity of structures
and environments, unprecedented challenges and difficulties are emerging. Engi-
neering surveying is a fundamental technology that offers support and guarantees
throughout a project’s life cycle, including planning, construction, and maintenance.
To tackle the increasing challenges of various applications and meet the demanding
requirements for efficiency, precision, quality, and reliability, technological innova-
tion is urgently needed. Dynamic and precise engineering surveying has emerged
as a new direction in the surveying and mapping discipline to conquer these high
demands. Compared to traditional engineering surveying, its technological innova-
tion is primarily reflected in surveying objects, content, mode, and equipment. These
aspects are facilitated by advanced high-tech solutions in information disciplines
(i.e., electronic information and computer information) and engineering disciplines
(i.e., civil engineering and transportation engineering).
1. Surveyed object

Modern surveying methods must accommodate diverse demands from various appli-
cations, a wide array of users, and increasingly complex working environments. The
objects addressed by modern surveying are no longer limited to those involved in
traditional engineering surveying; instead, they now encompass buildings and munic-
ipal infrastructures, as well as highways, high-speed railways, across-sea bridges,
water conservancy facilities, and scientific facilities. The surveying environment has
expanded from the earth’s surface to underground, undersea, and space. Extensive
coverage, multiple scales, and a complex environment characterize modern, dynamic,
and precise engineering surveying.


Engineering surveying inherently serves a supportive role in the planning,
construction, and maintenance of projects. Traditionally, it has been oriented towards
civil engineering, equipment installation, and engineering operations. At present,
economic globalization and regional integration necessitate the development of
various infrastructures, including international airports, high-speed rail networks,
south-to-north water diversion projects, nuclear power stations, and extra-high
voltage networks. As the number of citizens served by these infrastructures grows,
the scale of these projects continues to expand. Constructing such infrastructures
demands larger fields, higher precision, and greater efficiency, which in turn presents
an increasing number of challenges for surveying work. For instance, the construc-
tion of the Shenzhen-Zhongshan link confronts the challenge of integrating bridge-
island-tunnel components and the difficulty of conducting fieldwork amidst frequent
typhoons in the Pearl River Estuary. Installing the massive steel roof of Beijing
Daxing Airport necessitates high-precision surveying that surpasses conventional
standards. Meanwhile, as the exploration of the sea and space becomes increasingly
popular in China, engineering surveying extends to underground, undersea, and even
space environments. Examples include deep-sea natural gas hydrate (flammable ice)
excavation, asteroid mineral exploration, and Mars exploration. These large-scale
scientific missions and projects present numerous challenges and requirements for
engineering surveying.
The diversity of engineering projects inevitably leads to various complex
surveying environments. These environments encompass both external natural condi-
tions and internal structural factors. For instance, engineering surveying often
encounters challenging environments with poor satellite signals when constructing
underground infrastructures such as drainage pipelines or underwater tunnels. In
more extreme cases, the structural conditions may not permit the use of contact
equipment or manual operations. An example of this is the planned Sichuan-Xizang
high-speed railway, which is situated in a frigid plateau area. The route traverses
several seismic zones, with bridges and tunnels making up over 80% of the route. The diffi-
culties in engineering surveying that arise from this harsh natural environment can
only be addressed by integrating advanced technologies from multiple disciplines.
2. Surveying content
The boundary between engineering surveying and other disciplines is becoming
increasingly blurred, as its scope extends beyond traditional surveying to include
detection and monitoring. Surveying content now goes beyond merely capturing
geometric elements, trending towards a focus on multiple elements, indicators,
and features. As numerous large-scale projects have been completed and initiated,
comprehensive and ongoing monitoring is essential for ensuring their safe operation
and maintenance. For instance, underground pipelines, reservoirs, and dams must be
monitored to prevent failures that could result in the loss of human lives and property.
Abnormal detection and health monitoring have thus become integral components
of engineering surveying.
Comprehensive and continuous surveying results in a significantly larger work-
load compared to the sampling measurements used in traditional surveying. In China,

routine examination and maintenance are mandatory for a vast number of infrastruc-
tures, including tens of thousands of kilometers of high-speed railways, thousands
of kilometers of subway tracks and tunnels, and millions of kilometers of various
pipelines. Furthermore, while engineering surveying has traditionally focused on
geometric elements like points, lines, and surfaces (which require limited measure-
ments), the need for additional physical parameters and features, such as road surface
cracks, tunnel lining temperature, and track steel plate stiffness, has led to a rapid
increase in the number of measurements needed.

3. Surveying mode
To meet the emerging requirements in engineering construction, the surveying mode
changes from periodic sparse measurements of critical points on a target to contin-
uous, comprehensive measurements of the entire target. This shift has also tran-
sitioned from static and manual methods to automatic and intelligent approaches.
The modern surveying mode is characterized by remote control and telemetry,
non-contact measurements, digitization, automation, and intelligence.
In traditional engineering surveying, control points are set up around a target and
measured along with sampling points on the target. Subsequently, target modeling is
accomplished by interpolating the sampling points, with coordinates calculated based
on their geometric relationship with the control points. For instance, digital terrain
models and three-dimensional object models are based on discrete measurements,
which lack detail and fail to capture dynamic changes. However, these dynamic
changes can impact measurement precision, making high accuracy essential. For
example, bridges experience continuous disturbance under dynamic loads, and dam
deformation can be influenced during flood periods. Due to the large workload and
increased measurement frequency, continuous measurements are difficult to carry out
manually. In such situations, non-contact telemetry and automatic robotic surveys
can be employed to establish new, continuous, and efficient methods.
Traditional measuring and surveying methods, which mainly rely on manual
and discrete measurements, are often insufficient for meeting the high standards
and requirements of modern applications. For example, road cracks and pavement
deflection necessitate examination at a spatial resolution of millimeters. Similarly,
high-speed rail track features such as gauges, triangle pits, smoothness, attached
track plates, and fasteners must be examined with great accuracy, typically along the
entire length of the rail. Traditional engineering surveying struggles to strike a balance
between large coverage and high accuracy. To achieve comprehensive inspection of
large-scale objects, modern engineering surveying incorporates dynamic and intel-
ligent features by utilizing vehicle-mounted, airborne, shipboard, robot, and other
mobile platforms in combination with automatic measurement devices.
4. Surveying equipment

While existing instruments have played a significant role in engineering surveying,
they are often single-function and difficult to upgrade with new emerging technolo-
gies. This does not align with the demanding project requirements and the rapid pace

of technological advancements. New equipment has been developed specifically for
dynamic and precise engineering surveying. The evolution of surveying equipment
has led to many achievements in other disciplines, such as photoelectric inspec-
tion, sensing, positioning and navigation, wireless communication, and computer
intelligence. Examples of modern surveying equipment include three-dimensional
laser scanners, intelligent total station scanners, and remote micro-deformation radar
measurement systems. The performance, cost-effectiveness, and maneuverability of
this equipment have significantly improved compared to traditional instruments.
Despite the availability of various sensors, instruments, and equipment on the
market, the lack of uniform specifications results in poor data shareability, limited
versatility, and low-level intelligence. This makes it challenging to address diverse
and complex application scenarios. Specialized equipment is often customized based
on application needs, but this approach tends to be costly, less versatile, and requires
a high technical threshold, resulting in prolonged development, standardization, and
productization periods. For example, road deflection dynamic surveying equipment is
expensive, with only two companies capable of producing it. To address these issues,
professional equipment is being designed and produced with a focus on modular-
ization, integration, and intelligence. Modularization standardizes, generalizes, and
serializes functional modules, which can be integrated to create basic, derivative, and
combinative models of equipment. By designing different equipment with varying
module combinations, complex, large-scale, and specialized engineering surveying
tasks can be more effectively managed. It is expected that equipment modularization,
integration, and intelligence will improve efficiency and applicability while reducing
development costs and shortening development periods.
The services provided by engineering surveying are increasingly becoming
market-driven and socialized, leading to a more refined and simplified division
of labor, such as the separation between hardware and software development and
between fieldwork and data processing. This shift allows non-professionals to engage
in surveying work as well. On the other hand, through 5G communication and the
internet, it is now easier to organically integrate frontend and backend workers,
operators, and technical experts, as well as hardware and software, in order to
realize anytime, anywhere, and seamless services. With advancements in artifi-
cial intelligence, big data, and sensor technology, dynamic and precise engineering
surveying is expected to become more distributed, networked, and intelligent based
on modularization, automation, and integration.
5. Development trend

As a crucial branch of the surveying and mapping discipline, engineering surveying
offers the largest employment opportunities and industrial scale. It maintains a close
relationship with economic and social development. Dynamic and precise engi-
neering surveying represents the frontier and hot topic within the field, reflecting
the advanced technology involved in surveying and mapping. From both academic
and engineering perspectives, future growth and development can be expected in the
area of dynamic and precise engineering surveying.

From the academic perspective, dynamic and precise engineering surveying
fosters the intersection of surveying and mapping technology with information tech-
nology and engineering technology. On the one hand, it enhances measurement
intelligence and improves measurement efficiency through the deep integration of
sensors, communication, artificial intelligence, and other technologies, giving rise
to a new field within the surveying and mapping discipline. On the other hand,
it serves the entire engineering construction, operation, and maintenance process
by providing comprehensive and accurate information and technical support. This
creates new applications within the surveying and mapping discipline, paving the way
for the development of new products, services, and business models, and profoundly
impacting national economic and social development.
From the perspective of engineering applications, modern engineering surveying
has transcended the narrow domain closely tied to engineering construction and
evolved towards a broader domain termed “general engineering surveying”. Dynamic
and precise engineering surveying extends and expands the applications based on
traditional engineering surveying. For example, measurements now reach from the
surface to a structure’s interior, and the measuring content expands from geometric
changes to the performance status of the surveyed objects. For most surveyed objects,
change often starts from the inside, gradually developing from the interior to the exte-
rior and deteriorating until apparent changes occur. Take pavement collapse as an
example. It may originate from damage to an underground pipeline, which grad-
ually leads to its rupture at the surface. Water leakage can cause soil erosion and
eventually result in pavement collapse. Conventional surveying can only measure
the road surface, making it difficult to identify hidden problems, let alone prevent
them. Therefore, new equipment and methods are needed to investigate the interior
of objects and their physical parameters, apart from surface and geometric param-
eters. Multi-source data acquisition, especially data with location attributes inside
the target, is an important research topic for future dynamic and precise engineering
surveying. Meanwhile, fusing geometric data with non-geometric data to analyze the
implicit changes of surveyed targets and establish the inner correlation between the
measured parameters and object states presents a challenge and an essential direction
for modern engineering surveying.
The development and trend of dynamic and precise engineering surveying can be
summarized through the following four innovations.

1. Innovation in developing emerging technologies

The development of surveying and mapping has relied on the advancement of new
technologies, such as GNSS positioning and LiDAR. Dynamic and precise engi-
neering surveying will be closely tied to the innovation of new technologies, including
large-scale LiDAR surveying, high-precision inertial surveying, high-resolution, and
precision visual surveying. In the era of the information revolution, technologies such
as 5G, big data, and artificial intelligence will support the intelligence of surveying
equipment, forging advantages and a new framework of surveying methods. This, in
turn, promotes the development and upgrade of the surveying and mapping discipline.

2. Innovation in proposing new surveying methods

Specialized surveying requirements vary across different application scenarios, and
no uniform method can address all of them. Traditional engineering surveying
methods struggle to handle application scenarios involving complex environments,
high-precision, and efficient fieldwork. Surveying methods need to keep pace with
the new demands of emerging engineering applications, such as the underwater
connection of large-scale pipes, internal deformation monitoring of rockfill dams,
and rapid measurements of high-speed railway CPIII control networks. New methods
are continuously proposed to overcome the challenges in engineering applications,
ensuring the successful completion of construction projects.
3. Innovation in inventing new equipment

Developing and designing professional equipment for dynamic and precise engi-
neering surveying is challenging. The continuous emergence of compact, low-cost,
and high-performance sensors provides more possibilities for equipment innovation,
leading to the development of professional and customizable equipment that features
excellent mobility, high precision, multi-functionality, and intelligence. The constant
and urgent need for equipment invention will bring considerable opportunities to the
industry.

4. Innovation in extending applications

Ubiquitous surveying and mapping are currently trending, and dynamic and precise
engineering surveying is also moving towards ubiquity. It exceeds the limitations
of traditional surveying and mapping, pushing the service into a broader field. New
applications are emerging in nearly all human activities. From a macro perspective, it
can be used to construct and maintain infrastructure and explore terrain, sea, the earth,
and space. From a medium perspective, it can be applied in autonomous driving,
intelligent manufacturing, public safety, and more. From a micro perspective, it can
be used for product quality control, intelligent medical robots, diagnosis of equipment
operation, etc.
Correction to: Dynamic and Precise
Engineering Surveying

Correction to:
Q. Li, Dynamic and Precise Engineering Surveying,
https://doi.org/10.1007/978-981-99-5942-6

The original version of the book was updated with the belated corrections in Chap-
ters 1, 2, 3, 4, 5, 6, 7, foreword, preface II, contents. The correction chapters and the
book have been updated.

The updated version of the book can be found at


https://doi.org/10.1007/978-981-99-5942-6

