Lingfei Wu · Peng Cui · Jian Pei · Liang Zhao
Editors

Graph Neural Networks:
Foundations, Frontiers, and Applications
Editors

Lingfei Wu
JD Silicon Valley Research Center
Mountain View, CA, USA

Peng Cui
Tsinghua University
Beijing, China

Jian Pei
Simon Fraser University
Burnaby, Canada

Liang Zhao
Emory University
Atlanta, USA

ISBN 978-981-16-6053-5    ISBN 978-981-16-6054-2 (eBook)
https://doi.org/10.1007/978-981-16-6054-2

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore
Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Foreword

“The first comprehensive book covering the full spectrum of a young, fast-growing
research field, graph neural networks (GNNs), written by authoritative authors!”
Jiawei Han (Michael Aiken Chair Professor at University of Illinois at Urbana-
Champaign, ACM Fellow and IEEE Fellow)
“This book presents a comprehensive and timely survey on graph representation
learning. Edited and contributed by the best group of experts in this area, this book
is a must-read for students, researchers and practitioners who want to learn anything
about Graph Neural Networks.”
Heung-Yeung “Harry” Shum (Former Executive Vice President for Technology
and Research at Microsoft Research, ACM Fellow, IEEE Fellow, FREng)
“As the new frontier of deep learning, Graph Neural Networks offer great potential
to combine probabilistic learning and symbolic reasoning, and bridge knowledge-
driven and data-driven paradigms, nurturing the development of third-generation
AI. This book provides a comprehensive and insightful introduction to GNN, rang-
ing from foundations to frontiers, from algorithms to applications. It is a valuable
resource for any scientist, engineer and student who wants to get into this exciting
field.”
Bo Zhang (Member of Chinese Academy of Science, Professor at Tsinghua Uni-
versity)
“Graph Neural Networks are one of the hottest areas of machine learning and this
book is a wonderful in-depth resource covering a broad range of topics and applica-
tions of graph representation learning.”
Jure Leskovec (Associate Professor at Stanford University, and investigator at
Chan Zuckerberg Biohub).
“Graph Neural Networks are an emerging machine learning model that is already
taking the scientific and industrial world by storm. The time is perfect to get in on the
action – and this book is a great resource for newcomers and seasoned practitioners
alike! Its chapters are very carefully written by many of the thought leaders at the
forefront of the area.”
Petar Veličković (Senior Research Scientist, DeepMind)
Preface

The field of graph neural networks (GNNs) has seen rapid and incredible strides in
recent years. Graph neural networks, also known as deep learning on graphs,
graph representation learning, or geometric deep learning, have become one of the
fastest-growing research topics in machine learning, especially deep learning. This
wave of research at the intersection of graph theory and deep learning has also influ-
enced other fields of science, including recommendation systems, computer vision,
natural language processing, inductive logic programming, program synthesis, soft-
ware mining, automated planning, cybersecurity, and intelligent transportation.
Although graph neural networks have attracted remarkable attention, they still face
many challenges when applied to other domains, from the theoretical understanding of
methods to their scalability and interpretability in real systems, and from the
soundness of a methodology to its empirical performance in an application. Moreover,
as the field grows rapidly, it has become extremely challenging to gain a global
perspective of the developments in GNNs. We therefore feel the urgency to bridge this
gap with a comprehensive book on this fast-growing yet challenging topic, one that can
benefit a broad audience including advanced undergraduate and graduate students,
postdoctoral researchers, lecturers, and industrial practitioners.
This book is intended to cover a broad range of topics in graph neural networks,
from the foundations to the frontiers, and from the methodologies to the applica-
tions. Our book is dedicated to introducing the fundamental concepts and algorithms
of GNNs, new research frontiers of GNNs, and broad and emerging applications
with GNNs.

Book Website and Resources

The website and further resources of this book can be found at:
https://graph-neural-networks.github.io/. The website provides online preprints
and lecture slides of all the chapters. It also provides pointers to useful material and
resources that are publicly available and relevant to graph neural networks.


To the Instructors

The book can be used for a one-semester graduate course. Though it is mainly written
for students with a background in computer science, readers with a basic understanding
of probability, statistics, graph theory, linear algebra, and machine learning
techniques such as deep learning will find it easily accessible. Chapters covering
material that students already know can be skipped or assigned as review homework. For
example, if students have taken a deep learning course, they can skip Chapter 1.
Instructors can also choose to combine Chapters 1, 2, and 3 into a background
introduction at the very beginning of the course.
When the course focuses more on the foundations and theories of graph neural networks,
the instructor can choose to focus on Chapters 4-8 while using Chapters 19-27 to
showcase the applications, motivations, and limitations. Please refer to the Editors’
Notes at the end of each chapter on how Chapters 4-8 and Chapters 19-27 are correlated.
When the course focuses more on the research frontiers, Chapters 9-18 can serve as the
pivot for organizing the course. For example, an instructor can make it an advanced
graduate course in which students are asked to search for and present the most recent
research papers on each research frontier. Students can also be asked to build their
course projects around the applications described in Chapters 19-27 as well as the
materials provided on our website.

To the Readers

This book was designed to cover a wide range of topics in the field of graph neural
networks, including background, theoretical foundations, methodologies, research
frontiers, and applications. Therefore, it can be treated as a comprehensive handbook
for a wide variety of readers such as students, researchers, and professionals. You
should have some knowledge of the concepts and terminology associated with statistics,
machine learning, and graph theory; the necessary background is provided and referenced
in the first eight chapters. Knowledge of deep learning and some programming experience
will also help you access most chapters of this book with ease. In particular, you
should be able to read pseudocode and understand graph structures.
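As a rough gauge of that expected background, the short Python/NumPy sketch below (our
illustrative example, not code from any chapter of this book; the toy graph, feature
values, weight matrix, and mean-aggregation rule are all assumptions made purely for
illustration) performs a single neighborhood-aggregation step of the kind that GNN
pseudocode in later chapters formalizes:

    # Illustrative sketch only: one mean-aggregation message-passing step on a toy graph.
    # The graph, features, and weights below are arbitrary assumptions, not material
    # taken from this book.
    import numpy as np

    # Adjacency list of a small undirected graph with 4 nodes.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

    # One 2-dimensional feature vector per node.
    X = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0],
                  [0.5, 0.5]])

    # In a trained model these weights would be learned; here they are fixed constants.
    W = np.array([[0.6, -0.2],
                  [0.1, 0.9]])

    def message_passing_step(X, adj, W):
        """Average each node's own and neighbors' features, then apply a linear map and ReLU."""
        H = np.zeros_like(X)
        for v, neighbors in adj.items():
            aggregated = X[[v] + neighbors].mean(axis=0)  # aggregate self + neighbors
            H[v] = np.maximum(aggregated @ W, 0.0)        # transform + nonlinearity
        return H

    print(message_passing_step(X, adj, W))

If you can follow a snippet like this, you have the programming background the book
assumes.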
The book is well modularized, and each chapter can be read in a standalone manner
according to individual interests and needs. Readers who want a solid understanding of
the various techniques and theories of graph neural networks can start from Chapters
4-9. Those who further want to perform in-depth research and advance related fields
should read the chapters of interest among Chapters 9-18, which provide comprehensive
knowledge of the most recent research issues, open problems, and research frontiers.
Those who want to apply graph neural networks to specific domains, or who aim at
finding interesting applications to validate specific graph neural network techniques,
should refer to Chapters 19-27.
Acknowledgements

Graph machine learning has attracted many gifted researchers who have made seminal
contributions over the last few years. We are very fortunate to have discussed the
challenges and opportunities with many of them, and to have often worked with them on a
rich variety of research topics in this exciting field. We are deeply indebted to these
collaborators and colleagues from JD.COM, IBM Research, Tsinghua University, Simon
Fraser University, Emory University, and elsewhere, who encouraged us to create a book
that comprehensively covers various topics of graph neural networks in order to educate
interested beginners and foster the advancement of the field for both academic
researchers and industrial practitioners.
This book would not have been possible without the contributions of many people. We
would like to give many thanks to those who checked the consistency of the mathematical
notation across the entire book and helped with reference editing: Ling Chen, Xiaojie
Guo, and Shiyu Wang from Emory University, and Yue He, Ziwei Zhang, and Haoxin Liu from
Tsinghua University. We would like to give our special thanks to Dr. Xiaojie Guo, who
generously provided a great deal of valuable feedback on many chapters.
We also want to thank those who allowed us to reproduce images, figures, or data
from their publications.
Finally, we would like to thank our families for their love, patience, and support
during the very unusual time in which we wrote and edited this book.

Editor Biography

Dr. Lingfei Wu is a Principal Scientist at JD.COM Silicon Valley Research Center,
leading a team of 30+ machine learning/natural language processing scientists and
software engineers to build intelligent e-commerce personalization systems. He earned
his Ph.D. degree in computer science from the College of William and Mary in 2016.
Previously, he was a research staff member at IBM Thomas J. Watson Research Center and
led a team of 10+ research scientists developing novel Graph Neural Network methods and
systems, which led to the #1 AI Challenge Project in IBM Research and multiple IBM
awards, including three Outstanding Technical Achievement Awards. He has published more
than 90 top-ranked conference and journal papers and is a co-inventor of more than 40
filed US patents. Because of the high commercial value of his patents, he has received
eight invention achievement awards and was appointed an IBM Master Inventor, class of
2020. He was the recipient of the Best Paper Award and Best Student Paper Award at
several venues such as IEEE ICC’19, the AAAI workshop on DLGMA’20, and the KDD workshop
on DLG’19. His research has been featured in numerous media outlets, including
NatureNews, YahooNews, Venturebeat, TechTalks, SyncedReview, Leiphone, QbitAI, MIT
News, IBM Research News, and SIAM News. He has co-organized 10+ conferences (KDD, AAAI,
IEEE BigData) and is the founding co-chair of the Workshops of Deep Learning on Graphs
(with AAAI’21, AAAI’20, KDD’21, KDD’20, KDD’19, and IEEE BigData’19). He currently
serves as an Associate Editor for IEEE Transactions on Neural Networks and Learning
Systems, ACM Transactions on Knowledge Discovery from Data, and the International
Journal of Intelligent Systems, and regularly serves as an SPC/PC member of major
AI/ML/NLP conferences, including KDD, IJCAI, AAAI, NIPS, ICML, ICLR, and ACL.


Dr. Peng Cui is a tenured Associate Professor in the Department of Computer Science at
Tsinghua University. He obtained his PhD degree from Tsinghua University in 2010. His
research interests include data mining, machine learning, and multimedia analysis, with
expertise in network representation learning, causal inference and stable learning,
social dynamics modeling, and user behavior modeling. He is keen to promote the
convergence and integration of causal inference and machine learning to address the
fundamental issues of today’s AI technology, including explainability, stability, and
fairness. He is recognized as a Distinguished Scientist of ACM, a Distinguished Member
of CCF, and a Senior Member of IEEE. He has published more than 100 papers in
prestigious machine learning and data mining conferences and journals, and is one of
the most cited authors in network embedding. A number of his proposed network embedding
algorithms have generated substantial impact in academia and industry. His recent
research won the IEEE Multimedia Best Department Paper Award, the IEEE ICDM 2015 Best
Student Paper Award, the IEEE ICME 2014 Best Paper Award, the ACM MM12 Grand Challenge
Multimodal Award, and the MMM13 Best Paper Award, and his work was selected for the
Best of KDD special issues in 2014 and 2016. He was PC co-chair of CIKM 2019 and MMM
2020, has served as SPC or area chair of ICML, KDD, WWW, IJCAI, AAAI, etc., and serves
as an Associate Editor of IEEE TKDE (2017-), IEEE TBD (2019-), ACM TIST (2018-), and
ACM TOMM (2016-). He received the ACM China Rising Star Award in 2015 and the CCF-IEEE
CS Young Scientist Award in 2018.

Dr. Jian Pei is a Professor in the School of
Computing Science at Simon Fraser University. He
is a well-known leading researcher in the general
areas of data science, big data, data mining, and
database systems. His expertise is in developing
effective and efficient data analysis techniques for
novel data-intensive applications and in transferring
his research results to products and business practice.
He is recognized as a Fellow of the Royal Society of
Canada (Canada’s national academy), the Canadian
Academy of Engineering, the Association for Com-
puting Machinery (ACM), and the Institute of Elec-
trical and Electronics Engineers (IEEE). He is one of the most cited authors in data
mining, database systems, and information retrieval. Since 2000, he has published
one textbook, two monographs and over 300 research papers in refereed journals
and conferences, which have been cited extensively by others. His research has
generated remarkable impact substantially beyond academia. For example, his al-
gorithms have been adopted by industry in production and popular open-source
software suites. Jian Pei also demonstrated outstanding professional leadership in
many academic organizations and activities. He was the editor-in-chief of the IEEE
Transactions on Knowledge and Data Engineering (TKDE) in 2013-16, the chair of the
Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) of the
Association for Computing Machinery (ACM) in 2017-2021, and a general co-chair
or program committee co-chair of many premier conferences. He maintains a wide
spectrum of industry relations with both global and local industry partners. He is
an active consultant and coach for industry on enterprise data strategies, healthcare
informatics, network security intelligence, computational finance, and smart retail.
He received many prestigious awards, including the 2017 ACM SIGKDD Innova-
tion Award, the 2015 ACM SIGKDD Service Award, the 2014 IEEE ICDM Re-
search Contributions Award, the British Columbia Innovation Council 2005 Young
Innovator Award, an NSERC 2008 Discovery Accelerator Supplements Award (100
awards cross the whole country), an IBM Faculty Award (2006), a KDD Best Ap-
plication Paper Award (2008), an ICDE Influential Paper Award (2018), a PAKDD
Best Paper Award (2014), a PAKDD Most Influential Paper Award (2009), and an
IEEE Outstanding Paper Award (2007).

Dr. Liang Zhao is an assistant professor in the Department of Computer Science at Emory
University. Before that, he was an assistant professor in the Department of Information
Science and Technology and the Department of Computer Science at George Mason
University. He obtained his PhD degree in 2016 from the Computer Science Department at
Virginia Tech in the United States. His research interests include data mining,
artificial intelligence, and machine learning, with special interests in spatiotemporal
and network data mining, deep learning on graphs, nonconvex optimization, model
parallelism, event prediction, and interpretable machine learning. He received the AWS
Machine Learning Research Award in 2020 from Amazon for his research on distributed
graph neural networks. He won an NSF CAREER Award in 2020, awarded by the National
Science Foundation for his research on deep learning for spatial networks, and the
Jeffress Trust Award in 2019 for his research on deep generative models for
biomolecules, awarded by the Jeffress Memorial Trust Foundation and Bank of America. He
won the Best Paper Award at the 19th IEEE International Conference on Data Mining (ICDM
2019) for his lab’s paper on deep graph transformation, and was shortlisted for the
Best Paper Award at the 27th Web Conference (WWW 2021) for work on deep generative
models. He was selected as a “Top 20 Rising Star in Data Mining” by Microsoft Search in
2016 for his research on spatiotemporal data mining, and won the Outstanding Doctoral
Student award in the Department of Computer Science at Virginia Tech in 2017. He was
named a CIFellow Mentor 2021 by the Computing Community Consortium for his research on
deep learning for spatial data. He has published numerous research papers in top-tier
conferences and journals such as KDD, TKDE, ICDM, ICLR, Proceedings of the IEEE, ACM
Computing Surveys, TKDD, IJCAI, AAAI, and WWW. He has served as an organizer, including
publication chair, poster chair, and session chair, for many top-tier conferences such
as SIGSPATIAL, KDD, ICDM, and CIKM.
List of Contributors

Miltiadis Allamanis
Microsoft Research, Cambridge, UK
Yu Chen
Facebook AI, Menlo Park, CA, USA
Yunfei Chu
Alibaba Group, Hangzhou, China
Peng Cui
Tsinghua University, Beijing, China
Tyler Derr
Vanderbilt University, Nashville, TN, USA
Keyu Duan
Texas A&M University, College Station, TX, USA
Qizhang Feng
Texas A&M University, College Station, TX, USA
Stephan Günnemann
Technical University of Munich, München, Germany
Xiaojie Guo
JD.COM Silicon Valley Research Center, Mountain View, CA, USA
Yu Hou
Weill Cornell Medicine, New York City, New York, USA
Xia Hu
Texas A&M University, College Station, TX, USA
Junzhou Huang
University of Texas at Arlington, Arlington, TX, USA
Shouling Ji
Zhejiang University, Hangzhou, China


Wei Jin
Michigan State University, East Lansing, MI, USA
Anowarul Kabir
George Mason University, Fairfax, VA, USA
Seyed Mehran Kazemi
Borealis AI, Montreal, Canada
Jure Leskovec
Stanford University, Stanford, CA, USA
Juncheng Li
Zhejiang University, Hangzhou, China
Jiacheng Li
Zhejiang University, Hangzhou, China
Pan Li
Purdue University, Lafayette, IN, USA
Yanhua Li
Worcester Polytechnic Institute, Worcester, MA, USA
Renjie Liao
University of Toronto, Toronto, Canada
Xiang Ling
Zhejiang University, Hangzhou, China
Bang Liu
University of Montreal, Montreal, Canada
Ninghao Liu
Texas A&M University, College Station, TX, USA
Zirui Liu
Texas A&M University, College Station, TX, USA
Hehuan Ma
University of Texas at Arlington, Arlington, TX, USA
Collin McMillan
University of Notre Dame, Notre Dame, IN, USA
Christopher Morris
Polytechnique Montréal, Montréal, Canada
Zongshen Mu
Zhejiang University, Hangzhou, China
Menghai Pan
Worcester Polytechnic Institute, Worcester, MA, USA


Jian Pei
Simon Fraser University, British Columbia, Canada
Yu Rong
Tencent AI Lab, Shenzhen, China
Amarda Shehu
George Mason University, Fairfax, VA, USA
Kai Shen
Zhejiang University, Hangzhou, China
Chuan Shi
Beijing University of Posts and Telecommunications, Beijing, China
Le Song
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab
Emirates
Chang Su
Weill Cornell Medicine, New York City, New York, USA
Jian Tang
Mila-Quebec AI Institute, HEC Montreal, Canada
Siliang Tang
Zhejiang University, Hangzhou, China
Fei Wang
Weill Cornell Medicine, New York City, New York, USA
Shen Wang
University of Illinois at Chicago, Chicago, IL, USA
Shiyu Wang
Emory University, Atlanta, GA, USA
Xiao Wang
Beijing University of Posts and Telecommunications, Beijing, China
Yu Wang
Vanderbilt University, Nashville, TN, USA
Chunming Wu
Zhejiang University, Hangzhou, China
Lingfei Wu
JD.COM Silicon Valley Research Center, Mountain View, CA, USA
Hongxia Yang
Alibaba Group, Hangzhou, China
Jiangchao Yao
Alibaba Group, Hangzhou, China


Philip S. Yu
University of Illinois at Chicago, Chicago, IL, USA
Muhan Zhang
Peking University, Beijing, China
Wenqiao Zhang
Zhejiang University, Hangzhou, China
Liang Zhao
Emory University, Atlanta, GA, USA
Chang Zhou
Alibaba Group, Hangzhou, China
Kaixiong Zhou
Texas A&M University, College Station, TX, USA
Xun Zhou
University of Iowa, Iowa City, IA, USA
Contents

Terminologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxi
1 Basic concepts of Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxi
2 Machine Learning on Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxii
3 Graph Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxii

Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxv

Part I Introduction

1 Representation Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Liang Zhao, Lingfei Wu, Peng Cui and Jian Pei
1.1 Representation Learning: An Introduction . . . . . . . . . . . . . . . . . . . . . 3
1.2 Representation Learning in Different Areas . . . . . . . . . . . . . . . . . . . 5
1.2.1 Representation Learning for Image Processing . . . . . . . . . 5
1.2.2 Representation Learning for Speech Recognition . . . . . . . 8
1.2.3 Representation Learning for Natural Language Processing 10
1.2.4 Representation Learning for Networks . . . . . . . . . . . . . . . . 13
1.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Graph Representation Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Peng Cui, Lingfei Wu, Jian Pei, Liang Zhao and Xiao Wang
2.1 Graph Representation Learning: An Introduction . . . . . . . . . . . . . . . 17
2.2 Traditional Graph Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3 Modern Graph Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3.1 Structure-Property Preserving Graph Representation
Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3.2 Graph Representation Learning with Side Information . . . 23
2.3.3 Advanced Information Preserving Graph
Representation Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4 Graph Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26


3 Graph Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27


Lingfei Wu, Peng Cui, Jian Pei, Liang Zhao and Le Song
3.1 Graph Neural Networks: An Introduction . . . . . . . . . . . . . . . . . . . . . 28
3.2 Graph Neural Networks: Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.1 Graph Neural Networks: Foundations . . . . . . . . . . . . . . . . . 29
3.2.2 Graph Neural Networks: Frontiers . . . . . . . . . . . . . . . . . . . 31
3.2.3 Graph Neural Networks: Applications . . . . . . . . . . . . . . . . 33
3.2.4 Graph Neural Networks: Organization . . . . . . . . . . . . . . . . 35
3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Part II Foundations of Graph Neural Networks

4 Graph Neural Networks for Node Classification . . . . . . . . . . . . . . . . . . 41


Jian Tang and Renjie Liao
4.1 Background and Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.2 Supervised Graph Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2.1 General Framework of Graph Neural Networks . . . . . . . . 43
4.2.2 Graph Convolutional Networks . . . . . . . . . . . . . . . . . . . . . . 44
4.2.3 Graph Attention Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2.4 Neural Message Passing Networks . . . . . . . . . . . . . . . . . . . 48
4.2.5 Continuous Graph Neural Networks . . . . . . . . . . . . . . . . . . 48
4.2.6 Multi-Scale Spectral Graph Convolutional Networks . . . . 51
4.3 Unsupervised Graph Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . 54
4.3.1 Variational Graph Auto-Encoders . . . . . . . . . . . . . . . . . . . . 54
4.3.2 Deep Graph Infomax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.4 Over-smoothing Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5 The Expressive Power of Graph Neural Networks . . . . . . . . . . . . . . . . 63
Pan Li and Jure Leskovec
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2 Graph Representation Learning and Problem Formulation . . . . . . . 67
5.3 The Power of Message Passing Graph Neural Networks . . . . . . . . . 70
5.3.1 Preliminaries: Neural Networks for Sets . . . . . . . . . . . . . . . 70
5.3.2 Message Passing Graph Neural Networks . . . . . . . . . . . . . 71
5.3.3 The Expressive Power of MP-GNN . . . . . . . . . . . . . . . . . . 72
5.3.4 MP-GNN with the Power of the 1-WL Test . . . . . . . . . . . . 75
5.4 Graph Neural Networks Architectures that are more Powerful
than 1-WL Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.4.1 Limitations of MP-GNN . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.4.2 Injecting Random Attributes . . . . . . . . . . . . . . . . . . . . . . . . 79
5.4.3 Injecting Deterministic Distance Attributes . . . . . . . . . . . . 86
5.4.4 Higher-order Graph Neural Networks . . . . . . . . . . . . . . . . . 92
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

6 Graph Neural Networks: Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99


Hehuan Ma, Yu Rong, and Junzhou Huang
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.2 Preliminary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.3 Sampling Paradigms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.3.1 Node-wise Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.3.2 Layer-wise Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.3.3 Graph-wise Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.4 Applications of Large-scale Graph Neural Networks on
Recommendation Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.4.1 Item-item Recommendation . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.4.2 User-item Recommendation . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.5 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
7 Interpretability in Graph Neural Networks . . . . . . . . . . . . . . . . . . . . . . 121
Ninghao Liu and Qizhang Feng and Xia Hu
7.1 Background: Interpretability in Deep Models . . . . . . . . . . . . . . . . . . 121
7.1.1 Definition of Interpretability and Interpretation . . . . . . . . . 122
7.1.2 The Value of Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.1.3 Traditional Interpretation Methods . . . . . . . . . . . . . . . . . . . 124
7.1.4 Opportunities and Challenges . . . . . . . . . . . . . . . . . . . . . . . 127
7.2 Explanation Methods for Graph Neural Networks . . . . . . . . . . . . . . 128
7.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.2.2 Approximation-Based Explanation . . . . . . . . . . . . . . . . . . . 130
7.2.3 Relevance-Propagation Based Explanation . . . . . . . . . . . . 134
7.2.4 Perturbation-Based Approaches . . . . . . . . . . . . . . . . . . . . . . 135
7.2.5 Generative Explanation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
7.3 Interpretable Modeling on Graph Neural Networks . . . . . . . . . . . . . 138
7.3.1 GNN-Based Attention Models . . . . . . . . . . . . . . . . . . . . . . . 138
7.3.2 Disentangled Representation Learning on Graphs . . . . . . . 141
7.4 Evaluation of Graph Neural Networks Explanations . . . . . . . . . . . . 143
7.4.1 Benchmark Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
7.4.2 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.5 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8 Graph Neural Networks: Adversarial Robustness . . . . . . . . . . . . . . . . 149
Stephan Günnemann
8.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
8.2 Limitations of Graph Neural Networks: Adversarial Examples . . . 152
8.2.1 Categorization of Adversarial Attacks . . . . . . . . . . . . . . . . 152
8.2.2 The Effect of Perturbations and Some Insights . . . . . . . . . 156
8.2.3 Discussion and Future Directions . . . . . . . . . . . . . . . . . . . . 159
8.3 Provable Robustness: Certificates for Graph Neural Networks . . . . 160
8.3.1 Model-Specific Certificates . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.3.2 Model-Agnostic Certificates . . . . . . . . . . . . . . . . . . . . . . . . 163
8.3.3 Advanced Certification and Discussion . . . . . . . . . . . . . . . 165

8.4 Improving Robustness of Graph Neural Networks . . . . . . . . . . . . . . 165


8.4.1 Improving the Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
8.4.2 Improving the Training Procedure . . . . . . . . . . . . . . . . . . . . 167
8.4.3 Improving the Graph Neural Networks’ Architecture . . . . 170
8.4.4 Discussion and Future Directions . . . . . . . . . . . . . . . . . . . . 171
8.5 Proper Evaluation in the View of Robustness . . . . . . . . . . . . . . . . . . 172
8.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

Part III Frontiers of Graph Neural Networks

9 Graph Neural Networks: Graph Classification . . . . . . . . . . . . . . . . . . . 179


Christopher Morris
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
9.2 Graph neural networks for graph classification: Classic works
and modern architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
9.2.1 Spatial approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
9.2.2 Spectral approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
9.3 Pooling layers: Learning graph-level outputs from node-level
outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
9.3.1 Attention-based pooling layers . . . . . . . . . . . . . . . . . . . . . . 187
9.3.2 Cluster-based pooling layers . . . . . . . . . . . . . . . . . . . . . . . . 187
9.3.3 Other pooling layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
9.4 Limitations of graph neural networks and higher-order layers for
graph classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
9.4.1 Overcoming limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
9.5 Applications of graph neural networks for graph classification . . . . 191
9.6 Benchmark Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
9.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
10 Graph Neural Networks: Link Prediction . . . . . . . . . . . . . . . . . . . . . . . 195
Muhan Zhang
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
10.2 Traditional Link Prediction Methods . . . . . . . . . . . . . . . . . . . . . . . . . 197
10.2.1 Heuristic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
10.2.2 Latent-Feature Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
10.2.3 Content-Based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
10.3 GNN Methods for Link Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
10.3.1 Node-Based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
10.3.2 Subgraph-Based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
10.3.3 Comparing Node-Based Methods and Subgraph-Based
Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
10.4 Theory for Link Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
10.4.1 γ-Decaying Heuristic Theory . . . . . . . . . . . . . . . . . . . . . . . . 211
10.4.2 Labeling Trick . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
10.5 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
10.5.1 Accelerating Subgraph-Based Methods . . . . . . . . . . . . . . . 220

10.5.2 Designing More Powerful Labeling Tricks . . . . . . . . . . . . . 221


10.5.3 Understanding When to Use One-Hot Features . . . . . . . . . 222
11 Graph Neural Networks: Graph Generation . . . . . . . . . . . . . . . . . . . . . 225
Renjie Liao
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
11.2 Classic Graph Generative Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
11.2.1 Erdős–Rényi Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
11.2.2 Stochastic Block Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
11.3 Deep Graph Generative Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
11.3.1 Representing Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
11.3.2 Variational Auto-Encoder Methods . . . . . . . . . . . . . . . . . . . 230
11.3.3 Deep Autoregressive Methods . . . . . . . . . . . . . . . . . . . . . . . 236
11.3.4 Generative Adversarial Methods . . . . . . . . . . . . . . . . . . . . . 244
11.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
12 Graph Neural Networks: Graph Transformation . . . . . . . . . . . . . . . . . 251
Xiaojie Guo, Shiyu Wang, Liang Zhao
12.1 Problem Formulation of Graph Transformation . . . . . . . . . . . . . . . . 252
12.2 Node-level Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
12.2.1 Definition of Node-level Transformation . . . . . . . . . . . . . . 253
12.2.2 Interaction Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
12.2.3 Spatio-Temporal Convolution Recurrent Neural Networks 254
12.3 Edge-level Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
12.3.1 Definition of Edge-level Transformation . . . . . . . . . . . . . . 256
12.3.2 Graph Transformation Generative Adversarial Networks . 257
12.3.3 Multi-scale Graph Transformation Networks . . . . . . . . . . . 259
12.3.4 Graph Transformation Policy Networks . . . . . . . . . . . . . . . 260
12.4 Node-Edge Co-Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
12.4.1 Definition of Node-Edge Co-Transformation . . . . . . . . . . . 261
12.4.2 Editing-based Node-Edge Co-Transformation . . . . . . . . . . 266
12.5 Other Graph-based Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . 271
12.5.1 Sequence-to-Graph Transformation . . . . . . . . . . . . . . . . . . 271
12.5.2 Graph-to-Sequence Transformation . . . . . . . . . . . . . . . . . . 272
12.5.3 Context-to-Graph Transformation . . . . . . . . . . . . . . . . . . . . 273
12.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
13 Graph Neural Networks: Graph Matching . . . . . . . . . . . . . . . . . . . . . . 277
Xiang Ling, Lingfei Wu, Chunming Wu and Shouling Ji
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
13.2 Graph Matching Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
13.2.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
13.2.2 Deep Learning based Models . . . . . . . . . . . . . . . . . . . . . . . . 282
13.2.3 Graph Neural Network based Models . . . . . . . . . . . . . . . . . 284
13.3 Graph Similarity Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
13.3.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

13.3.2 Graph-Graph Regression Tasks . . . . . . . . . . . . . . . . . . . . . . 290


13.3.3 Graph-Graph Classification Tasks . . . . . . . . . . . . . . . . . . . . 293
13.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295

14 Graph Neural Networks: Graph Structure Learning . . . . . . . . . . . . . . 297


Yu Chen and Lingfei Wu
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
14.2 Traditional Graph Structure Learning . . . . . . . . . . . . . . . . . . . . . . . . . 299
14.2.1 Unsupervised Graph Structure Learning . . . . . . . . . . . . . . . 299
14.2.2 Supervised Graph Structure Learning . . . . . . . . . . . . . . . . . 301
14.3 Graph Structure Learning for Graph Neural Networks . . . . . . . . . . . 303
14.3.1 Joint Graph Structure and Representation Learning . . . . . 304
14.3.2 Connections to Other Problems . . . . . . . . . . . . . . . . . . . . . . 317
14.4 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
14.4.1 Robust Graph Structure Learning . . . . . . . . . . . . . . . . . . . . 319
14.4.2 Scalable Graph Structure Learning . . . . . . . . . . . . . . . . . . . 320
14.4.3 Graph Structure Learning for Heterogeneous Graphs . . . . 320
14.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
15 Dynamic Graph Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
Seyed Mehran Kazemi
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
15.2 Background and Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
15.2.1 Graph Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
15.2.2 Sequence Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
15.2.3 Encoder-Decoder Framework and Model Training . . . . . . 330
15.3 Categories of Dynamic Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
15.3.1 Discrete vs. Continuous . . . . . . . . . . . . . . . . . . . . . . . . . . 331
15.3.2 Types of Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
15.3.3 Prediction Problems, Interpolation, and Extrapolation . . . 334
15.4 Modeling Dynamic Graphs with Graph Neural Networks . . . . . . . . 335
15.4.1 Conversion to Static Graphs . . . . . . . . . . . . . . . . . . . . . . . . . 335
15.4.2 Graph Neural Networks for DTDGs . . . . . . . . . . . . . . . . . . 337
15.4.3 Graph Neural Networks for CTDGs . . . . . . . . . . . . . . . . . . 340
15.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
15.5.1 Skeleton-based Human Activity Recognition . . . . . . . . . . . 343
15.5.2 Traffic Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
15.5.3 Temporal Knowledge Graph Completion . . . . . . . . . . . . . . 346
15.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
16 Heterogeneous Graph Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . 351
Chuan Shi
16.1 Introduction to HGNNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
16.1.1 Basic Concepts of Heterogeneous Graphs . . . . . . . . . . . . . 353
16.1.2 Challenges of HG Embedding . . . . . . . . . . . . . . . . . . . . . . . 354
16.1.3 Brief Overview of Current Development . . . . . . . . . . . . . . 355

16.2 Shallow Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356


16.2.1 Decomposition-based Methods . . . . . . . . . . . . . . . . . . . . . . 357
16.2.2 Random Walk-based Methods . . . . . . . . . . . . . . . . . . . . . . . 358
16.3 Deep Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
16.3.1 Message Passing-based Methods (HGNNs) . . . . . . . . . . . . 360
16.3.2 Encoder-decoder-based Methods . . . . . . . . . . . . . . . . . . . . . 363
16.3.3 Adversarial-based Methods . . . . . . . . . . . . . . . . . . . . . . . . . 364
16.4 Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
16.5 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
16.5.1 Structures and Properties Preservation . . . . . . . . . . . . . . . . 367
16.5.2 Deeper Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
16.5.3 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
16.5.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
17 Graph Neural Networks: AutoML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
Kaixiong Zhou, Zirui Liu, Keyu Duan and Xia Hu
17.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
17.1.1 Notations of AutoGNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
17.1.2 Problem Definition of AutoGNN . . . . . . . . . . . . . . . . . . . . . 375
17.1.3 Challenges in AutoGNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
17.2 Search Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
17.2.1 Architecture Search Space . . . . . . . . . . . . . . . . . . . . . . . . . . 377
17.2.2 Training Hyperparameter Search Space . . . . . . . . . . . . . . . 380
17.2.3 Efficient Search Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
17.3 Search Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
17.3.1 Random Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
17.3.2 Evolutionary Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
17.3.3 Reinforcement Learning Based Search . . . . . . . . . . . . . . . . 383
17.3.4 Differentiable Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
17.3.5 Efficient Performance Estimation . . . . . . . . . . . . . . . . . . . . 386
17.4 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
18 Graph Neural Networks: Self-supervised Learning . . . . . . . . . . . . . . . 391
Yu Wang, Wei Jin, and Tyler Derr
18.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
18.2 Self-supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
18.3 Applying SSL to Graph Neural Networks: Categorizing Training
Strategies, Loss Functions and Pretext Tasks . . . . . . . . . . . . . . . . . . . 395
18.3.1 Training Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
18.3.2 Loss Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
18.3.3 Pretext Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
18.4 Node-level SSL Pretext Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
18.4.1 Structure-based Pretext Tasks . . . . . . . . . . . . . . . . . . . . . . . 403
18.4.2 Feature-based Pretext Tasks . . . . . . . . . . . . . . . . . . . . . . . . . 404
18.4.3 Hybrid Pretext Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
18.5 Graph-level SSL Pretext Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408

18.5.1 Structure-based Pretext Tasks . . . . . . . . . . . . . . . . . . . . . . . 408


18.5.2 Feature-based Pretext Tasks . . . . . . . . . . . . . . . . . . . . . . . . . 413
18.5.3 Hybrid Pretext Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
18.6 Node-graph-level SSL Pretext Tasks . . . . . . . . . . . . . . . . . . . . . . . . . 417
18.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
18.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419

Part IV Broad and Emerging Applications with Graph Neural Networks

19 Graph Neural Networks in Modern Recommender Systems . . . . . . . . 423


Yunfei Chu, Jiangchao Yao, Chang Zhou and Hongxia Yang
19.1 Graph Neural Networks for Recommender System in Practice . . . . 423
19.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
19.1.2 Classic Approaches to Predict User-Item Preference . . . . 428
19.1.3 Item Recommendation in user-item Recommender
Systems: a Bipartite Graph Perspective . . . . . . . . . . . . . . . 429
19.2 Case Study 1: Dynamic Graph Neural Networks Learning . . . . . . . 431
19.2.1 Dynamic Sequential Graph . . . . . . . . . . . . . . . . . . . . . . . . . 431
19.2.2 DSGL: Dynamic Sequential Graph Learning . . . . . . . . . . . 432
19.2.3 Model Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
19.2.4 Experiments and Discussions . . . . . . . . . . . . . . . . . . . . . . . . 436
19.3 Case Study 2: Device-Cloud Collaborative Learning for Graph
Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
19.3.1 The proposed framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
19.3.2 Experiments and Discussions . . . . . . . . . . . . . . . . . . . . . . . . 442
19.4 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
20 Graph Neural Networks in Computer Vision . . . . . . . . . . . . . . . . . . . . 447
Siliang Tang, Wenqiao Zhang, Zongshen Mu, Kai Shen, Juncheng Li,
Jiacheng Li and Lingfei Wu
20.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
20.2 Representing Vision as Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
20.2.1 Visual Node representation . . . . . . . . . . . . . . . . . . . . . . . . . 448
20.2.2 Visual Edge representation . . . . . . . . . . . . . . . . . . . . . . . . . . 450
20.3 Case Study 1: Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
20.3.1 Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
20.3.2 Image Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
20.4 Case Study 2: Video . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
20.4.1 Video Action Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . 454
20.4.2 Temporal Action Localization . . . . . . . . . . . . . . . . . . . . . . . 456
20.5 Other Related Work: Cross-media . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
20.5.1 Visual Caption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
20.5.2 Visual Question Answering . . . . . . . . . . . . . . . . . . . . . . . . . 458
20.5.3 Cross-Media Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
20.6 Frontiers for Graph Neural Networks on Computer Vision . . . . . . . 460
20.6.1 Advanced Graph Neural Networks for Computer Vision . 460

20.6.2 Broader Area of Graph Neural Networks on Computer


Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
20.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462

21 Graph Neural Networks in Natural Language Processing . . . . . . . . . . 463


Bang Liu, Lingfei Wu
21.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
21.2 Modeling Text as Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
21.2.1 Graph Representations in Natural Language Processing . . 466
21.2.2 Tackling Natural Language Processing Tasks from a
Graph Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
21.3 Case Study 1: Graph-based Text Clustering and Matching . . . . . . . 470
21.3.1 Graph-based Clustering for Hot Events Discovery and
Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
21.3.2 Long Document Matching with Graph Decomposition
and Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
21.4 Case Study 2: Graph-based Multi-Hop Reading Comprehension . . 475
21.5 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
21.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
22 Graph Neural Networks in Program Analysis . . . . . . . . . . . . . . . . . . . . 483
Miltiadis Allamanis
22.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
22.2 Machine Learning in Program Analysis . . . . . . . . . . . . . . . . . . . . . . . 484
22.3 A Graph Representation of Programs . . . . . . . . . . . . . . . . . . . . . . . . . 486
22.4 Graph Neural Networks for Program Graphs . . . . . . . . . . . . . . . . . . 489
22.5 Case Study 1: Detecting Variable Misuse Bugs . . . . . . . . . . . . . . . . . 491
22.6 Case Study 2: Predicting Types in Dynamically Typed Languages . 493
22.7 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
23 Graph Neural Networks in Software Mining . . . . . . . . . . . . . . . . . . . . . 499
Collin McMillan
23.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
23.2 Modeling Software as a Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
23.2.1 Macro versus Micro Representations . . . . . . . . . . . . . . . . . 501
23.2.2 Combining the Macro- and Micro-level . . . . . . . . . . . . . . . 503
23.3 Relevant Software Mining Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
23.4 Example Software Mining Task: Source Code Summarization . . . . 504
23.4.1 Primer GNN-based Code Summarization . . . . . . . . . . . . . . 505
23.4.2 Directions for Improvement . . . . . . . . . . . . . . . . . . . . . . . . . 510
23.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
24 GNN-based Biomedical Knowledge Graph Mining in Drug
Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
Chang Su, Yu Hou, Fei Wang
24.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517

24.2 Existing Biomedical Knowledge Graphs . . . . . . . . . . . . . . . . . . . . . . 518


24.3 Inference on Knowledge Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
24.3.1 Conventional KG inference techniques . . . . . . . . . . . . . . . . 523
24.3.2 GNN-based KG inference techniques . . . . . . . . . . . . . . . . . 524
24.4 KG-based hypothesis generation in computational drug
development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528
24.4.1 A machine learning framework for KG-based drug
repurposing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
24.4.2 Application of KG-based drug repurposing in COVID-19 530
24.5 Future directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
24.5.1 KG quality control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532
24.5.2 Scalable inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
24.5.3 Coupling KGs with other biomedical data . . . . . . . . . . . . . 533
25 Graph Neural Networks in Predicting Protein Function and
Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
Anowarul Kabir and Amarda Shehu
25.1 From Protein Interactions to Function: An Introduction . . . . . . . . . . 541
25.1.1 Enter Stage Left: Protein-Protein Interaction Networks . . 542
25.1.2 Problem Formulation(s), Assumptions, and Noise: A
Historical Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543
25.1.3 Shallow Machine Learning Models over the Years . . . . . . 543
25.1.4 Enter Stage Right: Graph Neural Networks . . . . . . . . . . . . 544
25.2 Highlighted Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
25.2.1 Case Study 1: Prediction of Protein-Protein and
Protein-Drug Interactions: The Link Prediction Problem . 547
25.2.2 Case Study 2: Prediction of Protein Function and
Functionally-important Residues . . . . . . . . . . . . . . . . . . . . . 549
25.2.3 Case Study 3: From Representation Learning to
Multirelational Link Prediction in Biological Networks
with Graph Autoencoders . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
25.3 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
26 Graph Neural Networks in Anomaly Detection . . . . . . . . . . . . . . . . . . . 557
Shen Wang, Philip S. Yu
26.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
26.2 Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
26.2.1 Data-specific issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
26.2.2 Task-specific Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
26.2.3 Model-specific Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
26.3 Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564
26.3.1 Graph Construction and Transformation . . . . . . . . . . . . . . . 564
26.3.2 Graph Representation Learning . . . . . . . . . . . . . . . . . . . . . . 565
26.3.3 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567
26.4 Taxonomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
26.5 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568

26.5.1 Case Study 1: Graph Embeddings for Malicious


Accounts Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
26.5.2 Case Study 2: Hierarchical Attention Mechanism based
Cash-out User Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570
26.5.3 Case Study 3: Attentional Heterogeneous Graph Neural
Networks for Malicious Program Detection . . . . . . . . . . . . 572
26.5.4 Case Study 4: Graph Matching Framework to Learn
the Program Representation and Similarity Metric
via Graph Neural Networks for Unknown Malicious
Program Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573
26.5.5 Case Study 5: Anomaly Detection in Dynamic Graph
Using Attention-based Temporal GCN . . . . . . . . . . . . . . . . 575
26.5.6 Case Study 6: GCN-based Anti-Spam for Spam Review
Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576
26.6 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
27 Graph Neural Networks in Urban Intelligence . . . . . . . . . . . . . . . . . . . 579
Yanhua Li, Xun Zhou, and Menghai Pan
27.1 Graph Neural Networks for Urban Intelligence . . . . . . . . . . . . . . . . . 580
27.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580
27.1.2 Application scenarios in urban intelligence . . . . . . . . . . . . 581
27.1.3 Representing urban systems as graphs . . . . . . . . . . . . . . . . 584
27.1.4 Case Study 1: Graph Neural Networks in urban
configuration and transportation . . . . . . . . . . . . . . . . . . . . . 586
27.1.5 Case Study 2: Graph Neural Networks in urban
anomaly and event detection . . . . . . . . . . . . . . . . . . . . . . . . 589
27.1.6 Case Study 3: Graph Neural Networks in urban human
behavior inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590
27.1.7 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595
Terminologies

This chapter describes a list of definitions of terminologies related to graph neural


networks used throughout this book.

1 Basic concepts of Graphs

• Graph: A graph is composed of a node set and an edge set, where nodes rep-
resent entities and edges represent the relationship between entities. The nodes
and edges form the topology structure of the graph. Besides the graph structure,
nodes, edges, and/or the whole graph can be associated with rich information
represented as node/edge/graph features (also known as attributes or contents).
• Subgraph: A subgraph is a graph whose set of nodes and set of edges are all
subsets of the original graph.
• Centrality: A centrality is a measurement of the importance of nodes in the
graph. The basic assumption of centrality is that a node is thought to be im-
portant if many other important nodes also connect to it. Common centrality
measurements include the degree centrality, the eigenvector centrality, the be-
tweenness centrality, and the closeness centrality.
• Neighborhood: The neighborhood of a node generally refers to other nodes that
are close to it. For example, the k-order neighborhood of a node, also called the
k-step neighborhood, denotes a set of other nodes in which the shortest path
distance between these nodes and the central node is no larger than k.
• Community Structure: A community refers to a group of nodes that are
densely connected internally and less densely connected externally.
• Graph Sampling: Graph sampling is a technique to pick a subset of nodes and/
or edges from the original graph. Graph sampling can be applied to train ma-
chine learning models on large-scale graphs while preventing severe scalability
issues.


• Heterogeneous Graphs: Graphs are called heterogeneous if the nodes and/or
edges of the graph are of different types. A typical example of heterogeneous
graphs is knowledge graphs, where the edges are of different types.
• Hypergraphs: Hypergraphs are generalizations of graphs in which an edge can
join any number of nodes.
• Random Graph: Random graph models generally aim to model the probability dis-
tributions over graphs that the observed graphs are generated from. The most
basic and well-studied random graph model, known as the Erdos–Renyi model,
assumes that the node set is fixed and each edge is identically and independently
generated.
• Dynamic Graph: Dynamic graph refers to when at least one component of the
graph data changes over time, e.g., adding or deleting nodes, adding or deleting
edges, changing edge weights, or changing node attributes, etc. If graphs are
not dynamic, we refer to them as static graphs.

2 Machine Learning on Graphs

• Spectral Graph Theory: Spectral graph theory analyzes matrices associated


with the graph such as its adjacency matrix or Laplacian matrix using tools of
linear algebra such as studying the eigenvalues and eigenvectors of the matrix.
• Graph Signal Processing: Graph Signal Processing (GSP) aims to develop
tools for processing signals defined on graphs. A graph signal refers to a finite
collection of data samples with one sample at each node in the graph.
• Node-level Tasks: Node-level tasks refer to machine learning tasks associated
with individual nodes in the graph. Typical examples of node-level tasks include
node classification and node regression.
• Edge-level Tasks: Edge-level tasks refer to machine learning tasks associated
with a pair of nodes in the graph. A typical example of an edge-level task is
link prediction.
• Graph-level Tasks: Graph-level tasks refer to machine learning tasks associ-
ated with the whole graph. Typical examples of graph-level tasks include graph
classification and graph property prediction.
• Transductive and Inductive Learning: Transductive learning refers to the setting
where the targeted instances such as nodes or edges are observed at training time
(though the labels of the targeted instances remain unknown), while inductive
learning aims to learn a model that generalizes to unobserved instances.

3 Graph Neural Networks

• Network embedding: The goal of network embedding is to represent each node


in the graph as a low-dimensional vector so that useful information such as the

graph structures and some properties of the graph is preserved in the embedding
vectors. Network embedding is also referred to as graph embedding and node
representation learning.
• Graph Neural Network: Graph neural network refers to any neural network
working on the graph data.
• Graph Convolutional Network: Graph convolutional network usually refers to
a specific graph neural network proposed by Kipf and Welling (2017a). It is
occasionally used as a synonym for graph neural network, i.e.,
referring to any neural network working on the graph data, in some literature.
• Message-Passing: Message-passing is a framework of graph neural networks in
which the key step is to pass messages between different nodes based on graph
structures in each neural network layer. The most widely adopted formulation,
usually denoted as message-passing neural networks, is to only pass messages
between nodes that are directly connected Gilmer et al (2017). The message
passing functions are also called graph filters and graph convolutions in some
literature; a minimal sketch of this scheme is given after this list.
• Readout: Readout refers to functions that summarize the information of indi-
vidual nodes to form higher-level information such as forming a subgraph/super-
graph or obtaining the representations of the entire graph. Readout is also called
pooling and graph coarsening in some literature.
• Graph Adversarial Attack: Graph adversarial attacks aim to generate worst-
case perturbations by manipulating the graph structure and/or node features so
that the performance of some models is degraded. Graph adversarial attacks
can be categorized based on the attacker’s goals, capabilities, and accessible
knowledge.
• Robustness certificates: Methods providing formal guarantees that the predic-
tion of a GNN is not affected even when perturbations are performed based on
a certain perturbation model.
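To make the message-passing scheme described above concrete, the following minimal sketch (an illustrative example added here, not code taken from any chapter of this book) performs one round of mean-aggregation message passing with NumPy; the adjacency matrix A, feature matrix X, and weight matrix W are toy placeholders.

import numpy as np

def message_passing_layer(A, X, W):
    # A: (n, n) binary adjacency matrix; X: (n, d) node features; W: (d, d_out) weights.
    A_hat = A + np.eye(A.shape[0])          # add self-loops so each node keeps its own message
    deg = A_hat.sum(axis=1, keepdims=True)  # degrees used for mean aggregation
    messages = A_hat @ X / deg              # average the messages received from neighbors
    return np.maximum(messages @ W, 0.0)    # linear transformation followed by a ReLU

# toy example: a 3-node path graph with 2-dimensional node features
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.random.randn(3, 2)
W = np.random.randn(2, 4)
H = message_passing_layer(A, X, W)          # (3, 4) updated node representations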
Notations

This Chapter provides a concise reference that describes the notations used through-
out this book.

Numbers, Arrays, and Matrices

A scalar x
A vector x
A matrix X
An identity matrix I
The set of real numbers R
The set of complex numbers C
The set of integers Z
The set of real n-length vectors Rn
The set of real m × n matrices Rm×n
The real interval including a and b [a, b]
The real interval including a but excluding b [a, b)
The element of the vector x with index i xi
The element of matrix X indexed by row i and column j Xi, j

Graph Basics

A graph G
Edge set E
Vertex set V
Adjacency matrix of a graph A
Laplacian matrix L
Diagonal degree matrix D
Isomorphism between graphs G and H G ≅ H
H is a subgraph of graph G H ⊆G
H is a proper subgraph of graph G H ⊂G
Union of graphs H and G G ∪H


Intersection of graphs H and G G ∩H


Disjoint Union of graphs H and G G +H
Cartesian Product of graphs H and G G × H
The join of graphs H and G G ∨H

Basic Operations

Transpose of matrix X X⊤
Dot product of matrices X and Y X ·Y or XY
Element-wise (Hadamard) product of matrices X and Y X ⊙Y
Determinant of X det(X)
p-norm (also called ℓp norm) of x ∥x∥p
Union ∪
Intersection ∩
Subset ⊆
Proper subset ⊂
Inner product of vectors x and y ⟨x, y⟩

Functions

The function f with domain A and range B f :A→B


Derivative of y with respect to x dy/dx
Partial derivative of y with respect to x ∂y/∂x
Gradient of y with respect to x ∇x y
Matrix derivatives of y with respect to matrix X ∇X y
The Hessian matrix of function f at input vector x ∇2 f (x)
Definite integral over the entire domain of x ∫ f (x)dx
Definite integral with respect to x over the set S ∫S f (x)dx
A function of x parametrized by θ f (x; θ )
Convolution between functions f and g f ∗g

Probabilistic Theory

A probability distribution of a p(a)


A conditional probabilistic distribution of b given a p(b|a)
The random variables a and b are independent a⊥b
Variables a and b are conditionally independent given c a⊥b | c
Random variable a has a distribution p a∼ p
The expectation of f (a) with respect to the variable a under distribution p Ea∼p [ f (a)]
Gaussian distribution over x with mean µ and covariance Σ N (x; µ, Σ )
Part I
Introduction
Chapter 1
Representation Learning

Liang Zhao, Lingfei Wu, Peng Cui and Jian Pei

Abstract In this chapter, we first describe what representation learning is and why
we need representation learning. Among the various ways of learning representa-
tions, this chapter focuses on deep learning methods: those that are formed by the
composition of multiple non-linear transformations, with the goal of resulting in
more abstract and ultimately more useful representations. We summarize the repre-
sentation learning techniques in different domains, focusing on the unique chal-
lenges and models for different data types including images, natural languages,
speech signals and networks. Last, we summarize this chapter.

1.1 Representation Learning: An Introduction

The effectiveness of machine learning techniques relies heavily not only on the de-
sign of the algorithms themselves, but also on a good representation (feature set) of
data. Ineffective data representations that lack important information, or that con-
tain incorrect or heavily redundant information, can lead to poor performance of
the algorithm on different tasks. The goal of representation learning is
to extract sufficient but minimal information from data. Traditionally, this can be
achieved via human efforts based on the prior knowledge and domain expertise on
the data and tasks, which is also known as feature engineering. In deploying ma-

Liang Zhao
Department of Computer Science, Emory University, e-mail: [email protected]
Lingfei Wu
JD.COM Silicon Valley Research Center, e-mail: [email protected]
Peng Cui
Department of Computer Science, Tsinghua University, e-mail: [email protected]
Jian Pei
Department of Computer Science, Simon Fraser University, e-mail: [email protected]


chine learning and many other artificial intelligence algorithms, historically a large
portion of the human effort goes into the design of preprocessing pipelines and data
transformations. More specifically, feature engineering is a way to take advantage
of human ingenuity and prior knowledge in the hope to extract and organize the dis-
criminative information from the data for machine learning tasks. For example, po-
litical scientists may be asked to define a keyword list as the features of social-media
text classifiers for detecting texts about societal events. For speech recognition,
one may choose to extract features from raw sound waves via operations
such as Fourier transformations. Although feature engineering has been widely
adopted over the years, its drawbacks are also salient, including: 1) Intensive labor
from domain experts is usually needed. This is because feature engineering may
require tight and extensive collaboration between model developers and domain ex-
perts. 2) Incomplete and biased feature extraction. Specifically, the capacity and
discriminative power of the extracted features are limited by the knowledge of the
domain experts. Moreover, in many domains where human beings have limited
knowledge, what features to extract is itself an open question to domain experts,
such as early cancer prediction. In order to avoid these drawbacks, making learn-
ing algorithms less dependent on feature engineering has been a highly desired goal
in machine learning and artificial intelligence domains, so that novel applications
could be constructed faster and hopefully addressed more effectively.
The techniques of representation learning have witnessed the development from the tra-
ditional representation learning techniques to more advanced ones. The traditional
methods belong to “shallow” models and aim to learn transformations of data that
make it easier to extract useful information when building classifiers or other pre-
dictors, such as Principal Component Analysis (PCA) (Wold et al, 1987), Gaussian
Markov random field (GMRF) (Rue and Held, 2005), and Locality Preserving Pro-
jections (LPP) (He and Niyogi, 2004). Deep learning-based representation learning
is formed by the composition of multiple non-linear transformations, with the goal
of yielding more abstract and ultimately more useful representations. In the light of
introducing more recent advancements and sticking to the major topic of this book,
here we mainly focus on deep learning-based representation learning, which can
be categorized into several types: (1) Supervised learning, where a large number of
labeled data are needed for the training of the deep learning models. Given the well-
trained networks, the output before the last fully-connected layers is always utilized
as the final representation of the input data; (2) Unsupervised learning (including
self-supervised learning), which facilitates the analysis of input data without corre-
sponding labels and aims to learn the underlying inherent structure or distribution
of data. Pretext tasks are utilized to explore the supervision information from large
amounts of unlabelled data. Based on this constructed supervision information, the
deep neural networks are trained to extract the meaningful representations for the
future downstream tasks; (3) Transfer learning, which involves methods that utilize
any knowledge resource (i.e., data, model, labels, etc.) to increase model learning
and generalization for the target task. Transfer learning encompasses different sce-
narios including multi-task learning (MTL), model adaptation, knowledge transfer,
covariate shift, etc. There are also other important representation learning meth-

ods such as reinforcement learning, few-shot learning, and disentangled representa-


tion learning.
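As a small illustration of the "shallow" end of this spectrum, the sketch below (an assumed setup using NumPy and scikit-learn rather than code from this book; the random data is purely for demonstration) learns a low-dimensional linear representation with PCA, which can then be fed to any downstream predictor.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(100, 50)             # 100 samples described by 50 raw features
pca = PCA(n_components=10)               # learn a 10-dimensional linear representation
Z = pca.fit_transform(X)                 # (100, 10) representations of the input data
print(Z.shape, pca.explained_variance_ratio_.sum())  # fraction of variance preserved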
It is important to define what a good representation is. As defined by Ben-
gio (2008), representation learning is about learning the (underlying) features of
the data that make it easier to extract useful information when building classifiers or
other predictors. Thus, the evaluation of a learned representation is closely related to
its performance on the downstream tasks. For example, in the data generation task
based on a generative model, a good representation is often the one that captures
the posterior distribution of the underlying explanatory factors for the observed in-
put. While for a prediction task, a good representation is the one that captures the
minimal but sufficient information of input data to correctly predict the target label.
Besides the evaluation from the perspective of the downstream tasks, there are also
some general properties that the good representations may hold, such as the smooth-
ness, the linearity, capturing multiple explanatory and causal factors, holding shared
factors across different tasks and simple factor dependencies.

1.2 Representation Learning in Different Areas

In this section, we summarize the development of representation learning on four


different representative areas: (1) image processing; (2) speech recognition; (3) Nat-
ural language processing; and (4) network analysis. For the representation learning
in each research area, we consider some of the fundamental questions that have been
driving research in this area. Specifically, what makes one representation better than
another, and how should we compute its representation? Why is the representation
learning important in that area? Also, what are appropriate objectives for learning
good representations? We also introduce the relevant typical methods and their de-
velopment from the perspective of three main categories: supervised representation
learning, unsupervised learning and transfer learning, respectively.

1.2.1 Representation Learning for Image Processing

Image representation learning is a fundamental problem in understanding the se-


mantics of various visual data, such as photographs, medical images, document
scans, and video streams. Normally, the goal of image representation learning for
image processing is to bridge the semantic gap between the pixel data and semantics
of the images. The successful achievements of image representation learning have
empowered many real-world problems, including but not limited to image search,
facial recognition, medical image analysis, photo manipulation and target detection.
In recent years, we have witnessed a fast advancement of image representation
learning from handcrafted feature engineering to that from scratch through deep
neural network models. Traditionally, the patterns of images are extracted with the

help of hand-crafted features by human beings based on prior knowledge. For exam-
ple, Huang et al (2000) extracted the characters' structural features from the strokes,
then used them to recognize the handwritten characters. Rui (2005) adopted a mor-
phology method to improve the local features of the characters, then used PCA to ex-
tract features of the characters. However, all of these methods need to extract features
from images manually and thus the prediction performances strongly rely on the
prior knowledge. In the field of computer vision, manual feature extraction is very
cumbersome and impractical because of the high dimensionality of feature vec-
tors. Thus, representation learning of images which can automatically extract mean-
ingful, hidden, and complex patterns from high-dimensional visual data is necessary.
Deep learning-based representation learning for images is learned in an end-to-end
fashion, which can perform much better than hand-crafted features in the target ap-
plications, as long as the training data is of sufficient quality and quantity.
Supervised Representation Learning for image processing. In the domain of im-
age processing, supervised learning algorithms, such as Convolutional Neural Networks
(CNNs) and Deep Belief Networks (DBNs), are commonly applied in solving various
tasks. One of the earliest deep-supervised-learning-based works was proposed in
2006 (Hinton et al, 2006), which focused on the MNIST digit image classifica-
tion problem, outperforming the state-of-the-art SVMs. Following this, deep convo-
lutional neural networks (ConvNets) showed amazing performance, which greatly
depends on their properties of shift invariance, weight sharing, and local pattern
capturing. Different types of network architectures were developed to increase the
capacity of network models, and larger and larger datasets have been collected.
Various networks including AlexNet (Krizhevsky et al, 2012), VGG (Simonyan and
Zisserman, 2014b), GoogLeNet (Szegedy et al, 2015), ResNet (He et al, 2016a),
and DenseNet (Huang et al, 2017a) and large scale datasets, such as ImageNet and
OpenImage, have been proposed to train very deep convolutional neural networks.
With the sophisticated architectures and large-scale datasets, convolutional neural
networks keep outperforming the previous state of the art in various
computer vision tasks.
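To illustrate how the output before the last fully-connected layer of a supervised network can serve as an image representation, the sketch below (a hypothetical example assuming PyTorch and a recent torchvision with the weights API; example.jpg is a placeholder path) uses a pretrained ResNet-18 as a fixed feature extractor.

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1])  # drop the final FC layer
feature_extractor.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("example.jpg").convert("RGB")   # placeholder input image
with torch.no_grad():
    x = preprocess(img).unsqueeze(0)             # (1, 3, 224, 224) input tensor
    feat = feature_extractor(x).flatten(1)       # (1, 512) representation before the FC layer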
Unsupervised Representation Learning for image processing. Collection and an-
notation of large-scale datasets are time-consuming and expensive in both image
datasets and video datasets. For example, ImageNet contains about 1.3 million la-
beled images covering 1,000 classes while each image is labeled by human workers
with one class label. To alleviate the extensive human annotation labors, many unsu-
pervised methods were proposed to learn visual features from large-scale unlabeled
images or videos without using any human annotations. A popular solution is to
propose various pretext tasks for models to solve, while the models can be trained
by learning objective functions of the pretext tasks and the features are learned
through this process. Various pretext tasks have been proposed for unsupervised
learning, including colorizing gray-scale images (Zhang et al, 2016d) and image in-
painting (Pathak et al, 2016). During the unsupervised training phase, a predefined
pretext task is designed for the models to solve, and the pseudo labels for the pretext
task are automatically generated based on some attributes of data. Then the models
are trained according to the objective functions of the pretext tasks. When trained

with pretext tasks, the shallower blocks of the deep neural network models focus on
the low-level general features such as corners, edges, and textures, while the deeper
blocks focus on the high-level task-specific features such as objects, scenes, and
object parts. Therefore, the models trained with pretext tasks can learn kernels to
capture low-level features and high-level features that are helpful for other down-
stream tasks. After the unsupervised training is finished, the learned visual features
in these pre-trained models can be further transferred to downstream tasks (especially
when only relatively small data is available) to improve performance and overcome
over-fitting.
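The sketch below conveys the flavor of such a pretext task: an inpainting objective whose pseudo-label is the hidden patch itself. It assumes PyTorch, and the tiny encoder-decoder and random image batch are placeholders rather than a realistic model or dataset.

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
                        nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1))
model = nn.Sequential(encoder, decoder)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.rand(8, 3, 64, 64)          # stand-in for an unlabeled image batch
masked = images.clone()
masked[:, :, 24:40, 24:40] = 0.0           # pseudo-label: hide a central patch
recon = model(masked)
loss = nn.functional.mse_loss(recon[:, :, 24:40, 24:40],
                              images[:, :, 24:40, 24:40])  # reconstruct only the hidden patch
loss.backward()
opt.step()                                 # the encoder gradually learns transferable features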
Transfer Learning for image processing. In real-world applications, due to the
high cost of manual labeling, sufficient training data that belongs to the same fea-
ture space or distribution as the testing data may not always be accessible. Transfer
learning mimics the human vision system by making use of sufficient amounts of
prior knowledge in other related domains (i.e., source domains) when executing
new tasks in the given domain (i.e., target domain). In transfer learning, both the
training set and the test set can contribute to the target and source domains. In most
cases, there is only one target domain for a transfer learning task, while either single
or multiple source domains can exist. The techniques of transfer learning in im-
age processing can be categorized into feature representation knowledge transfer
and classifier-based knowledge transfer. Specifically, feature representation trans-
fer methods map the target domain to the source domains by exploiting a set of
extracted features, where the data divergence between the target domain and the
source domains can be significantly reduced so that the performance of the task
in the target domain is improved. In contrast, classifier-based knowledge-transfer
methods usually share the common trait that the learned source domain models are
utilized as prior knowledge, which are used to learn the target model together with
the training samples. Instead of minimizing the cross-domain dissimilarity by up-
dating instances’ representations, classifier-based knowledge-transfer methods aim
to learn a new model that minimizes the generalization error in the target domain
via the provided training set from both domains and the learned model.
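A common concrete instance of classifier-based knowledge transfer is to reuse a source-pretrained backbone and learn only a new classification head on the target domain. The sketch below is a hedged illustration assuming PyTorch/torchvision; the number of target classes and the random mini-batch are hypothetical.

import torch
import torch.nn as nn
import torchvision.models as models

num_target_classes = 5                                   # hypothetical target-domain label set
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                          # freeze the source-domain representation
model.fc = nn.Linear(model.fc.in_features, num_target_classes)  # new head for the target task

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.rand(4, 3, 224, 224)                           # stand-in for a target-domain mini-batch
y = torch.randint(0, num_target_classes, (4,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()                                         # only the new head is updated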
Other Representation Learning for Image Processing. Other types of representa-
tion learning are also commonly applied in image processing, such
as reinforcement learning, and semi-supervised learning. For example, reinforce-
ment learning is commonly explored in the task of image captioning Liu et al
(2018a); Ren et al (2017) and image editing Kosugi and Yamasaki (2020), where
the learning process is formalized as a sequence of actions based on a policy net-
work.

1.2.2 Representation Learning for Speech Recognition

Nowadays, speech interfaces or systems have become widely developed and inte-
grated into various real-life applications and devices. Services like Siri 1 , Cortana 2 ,
and Google Voice Search 3 have become a part of our daily life and are used by mil-
lions of users. The exploration in speech recognition and analysis has always been
motivated by a desire to enable machines to participate in verbal human-machine
interactions. The research goals of enabling machines to understand human speech,
identify speakers, and detect human emotion have attracted researchers’ attention
for more than sixty years across several distinct research areas, including but not
limited to Automatic Speech Recognition (ASR), Speaker Recognition (SR), and
Speaker Emotion Recognition (SER).
Analyzing and processing speech has been a key application of machine learning
(ML) algorithms. Research on speech recognition has traditionally considered the
task of designing hand-crafted acoustic features as a separate distinct problem from
the task of designing efficient models to accomplish prediction and classification
decisions. There are two main drawbacks of this approach: First, the feature engi-
neering is cumbersome and requires human knowledge as introduced above; and
second, the designed features might not be the best for the specific speech recog-
nition tasks at hand. This has motivated the adoption of recent trends in the speech
community towards the utilization of representation learning techniques, which can
learn an intermediate representation of the input signal automatically that better fits
into the task at hand and hence leads to improved performance. Among all these suc-
cesses, deep learning-based speech representations play an important role. One of
the major reasons for the utilization of representation learning techniques in speech
technology is that speech data is fundamentally different from two-dimensional im-
age data. Images can be analyzed as a whole or in patches, but speech has to be
formatted sequentially to capture temporal dependency and patterns.
Supervised representation learning for speech recognition. In the domain of
speech recognition and analyzing, supervised representation learning methods are
widely employed, where feature representations are learned on datasets by leverag-
ing label information. For example, restricted Boltzmann machines (RBMs) (Jaitly
and Hinton, 2011; Dahl et al, 2010) and deep belief networks (DBNs) (Cairong
et al, 2016; Ali et al, 2018) are commonly utilized in learning features from speech
for different tasks, including ASR, speaker recognition, and SER. For example,
in 2012, Microsoft released a new version of their MAVIS (Microsoft Audio
Video Indexing Service) speech system based on context-dependent deep neural net-
works (Seide et al, 2011). These authors managed to reduce the word error rate on
four major benchmarks by about 30% (e.g., from 27.4% to 18.5% on RT03S) com-
1 Siri is an artificial intelligence assistant software that is built into Apple’s iOS system.
2 Microsoft Cortana is an intelligent personal assistant developed by Microsoft, known as ”the
world’s first cross-platform intelligent personal assistant”.
3 Google Voice Search is a product of Google that allows users to search by speaking to a mobile
phone or computer; the spoken query is recognized on the server, and information is then retrieved
based on the recognition results.

pared to the traditional models based on Gaussian mixtures. Convolutional neural


networks are other popular supervised models that are widely utilized for feature
learning from speech signals in tasks such as speech and speaker recognition (Palaz
et al, 2015a,b) and SER Latif et al (2019); Tzirakis et al (2018). Moreover, it has
been found that LSTMs (or GRUs) can help CNNs in learning more useful features
from speech by capturing both local and long-term dependencies (Dahl et al, 2010).
Unsupervised Representation Learning for speech recognition. Unsupervised
representation learning from large unlabelled datasets is an active research area in speech
recognition. In the context of speech analysis, it can exploit the practically
available unlimited amount of unlabelled corpora to learn good intermediate feature
representations, which can then be used to improve the performance of a variety of
downstream supervised speech recognition tasks or speech signal synthesis
tasks. In the tasks of ASR and SR, most of the works are based on Variational
Auto-encoders (VAEs), where a generative model and an inference model are jointly
learned, which allows them to capture latent representations from observed speech
data (Chorowski et al, 2019; Hsu et al, 2019, 2017). For example, Hsu et al (2017)
proposed a hierarchical VAE to capture interpretable and disentangled representa-
tions from speech without any supervision. Other auto-encoding architectures like
Denoising Autoencoders (DAEs) are also found very promising in learning speech rep-
resentations in an unsupervised way, especially for noisy speech recognition (Feng
et al, 2014; Zhao et al, 2015). Beyond the aforementioned, recently, adversarial
learning (AL) is emerging as a powerful tool in learning unsupervised represen-
tation for speech, such as generative adversarial nets (GANs). It involves at least
a generator and a discriminator, where the former tries to generate data as realistic as
possible to fool the latter, which in turn tries its best to tell real data from generated data. Hence
both the generator and discriminator can be trained and improved iteratively in
an adversarial way, which results in more discriminative and robust features. Among
these, GANs (Chang and Scherer, 2017; Donahue et al, 2018), adversarial autoen-
coders (AAEs) Sahu et al (2017) are becoming increasingly popular in modeling speech
not only in ASR but also SR and SER.
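As a rough illustration of the VAE-style unsupervised learners mentioned above, the sketch below (assuming PyTorch; the 80-dimensional "acoustic frames" and all layer sizes are illustrative choices rather than the configuration of any cited system) trains a toy VAE whose latent mean can later serve as a learned speech representation.

import torch
import torch.nn as nn

class FrameVAE(nn.Module):
    def __init__(self, in_dim=80, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)        # mean of the approximate posterior q(z|x)
        self.logvar = nn.Linear(128, latent_dim)    # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar

vae = FrameVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
frames = torch.randn(32, 80)                        # stand-in for unlabeled acoustic feature frames
recon, mu, logvar = vae(frames)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = nn.functional.mse_loss(recon, frames) + kl   # reconstruction term plus KL regularizer
loss.backward()
opt.step()                                          # mu can be used as the learned representation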
Transfer Learning for speech recognition. Transfer learning (TL) encompasses
different approaches, including MTL, model adaptation, knowledge transfer, covariate
shift, etc. In the domain of speech recognition, representation learning gained
much interest in these approaches of TL including but not limited to domain adap-
tation, multi-task learning, and self-taught learning. In terms of domain adaptation,
speech is a typical example of heterogeneous data and thus, a mismatch always ex-
ists between the probability distributions of source and target domain data. To build
more robust systems for speech-related applications in real-life, domain adaptation
techniques are usually applied in the training pipeline of deep neural networks to
learn representations which are able to explicitly minimize the difference between
the distribution of data in the source and target domains (Sun et al, 2017; Swietojan-
ski et al, 2016). In terms of MTL, the learned representations can successfully increase
the performance of speech recognition without requiring contextual speech data,
since speech contains multi-dimensional information (message, speaker, gender, or
emotion) that can be used as auxiliary tasks. For example, in the task of ASR, by us-

ing MTL with different auxiliary tasks including gender, speaker adaptation, speech
enhancement, it has been shown that the learned shared representations for differ-
ent tasks can act as complementary information about the acoustic environment and
give a lower word error rate (WER) (Parthasarathy and Busso, 2017; Xia and Liu,
2015).
Other Representation Learning for speech recognition. Other than the above-
mentioned three categories of representation learning for speech signals, there are
also some other representation learning techniques commonly explored, such as
semi-supervised learning and reinforcement learning. For example, in the task of
ASR, semi-supervised learning is mainly used to circumvent the lack
of sufficient training data. This can be achieved either by creating feature front-
ends (Thomas et al, 2013), or by using multilingual acoustic representations (Cui
et al, 2015), or by extracting an intermediate representation from large unpaired
datasets (Karita et al, 2018). RL is also gaining interest in the area of speech recog-
nition, and there have been multiple approaches to model different speech problems,
including dialog modeling and optimization (Levin et al, 2000), speech recogni-
tion (Shen et al, 2019), and emotion recognition (Sangeetha and Jayasankar, 2019).

1.2.3 Representation Learning for Natural Language Processing

Besides speech recognition, there are many other Natural Language Processing
(NLP) applications of representation learning, such as text representation learn-
ing. For example, Google's image search exploits huge quantities of data to map im-
ages and queries into the same space (Weston et al, 2010) based on NLP techniques.
In general, there are two types of applications of representation learning in NLP.
In one type, the semantic representation, such as the word embedding, is trained
in a pre-training task (or directly designed by human experts) and is transferred to
the model for the target task. It is trained by using a language modeling objective
and is taken as input for other downstream NLP models. In the other type, the
semantic representation lies within the hidden states of the deep learning model and
directly aims for better performance of the target tasks in an end-to-end fashion. For
example, many NLP tasks need to semantically compose sentence or document rep-
resentations; tasks such as sentiment classification, natural language inference,
and relation extraction all require sentence representations.
Conventional NLP tasks heavily rely on feature engineering, which requires care-
ful design and considerable expertise. Recently, representation learning, especially
deep learning-based representation learning is emerging as the most important tech-
nique for NLP. First, NLP is typically concerned with multiple levels of language en-
tries, including but not limited to characters, words, phrases, sentences, paragraphs,
and documents. Representation learning is able to represent the semantics of these
multi-level language entries in a unified semantic space, and model complex se-
mantic dependence among these language entries. Second, there are various NLP
tasks that can be conducted on the same input. For example, given a sentence, we

can perform multiple tasks such as word segmentation, named entity recognition,
relation extraction, co-reference linking, and machine translation. In this case, it
will be more efficient and robust to build a unified representation space of inputs
for multiple tasks. Last, natural language texts may be collected from multiple do-
mains, including but not limited to news articles, scientific articles, literary works,
advertisement and online user-generated content such as product reviews and so-
cial media. Moreover, texts can also be collected from different languages, such as
English, Chinese, Spanish, Japanese, etc. Compared to conventional NLP systems
which have to design specific feature extraction algorithms for each domain accord-
ing to its characteristics, representation learning enables us to build representations
automatically from large-scale domain data and even add bridges among these lan-
guages from different domains. Given these advantages of representation learning
for NLP in the feature engineering reduction and performance improvement, many
researchers have developed efficient algorithms on representation learning, espe-
cially deep learning-based approaches, for NLP.
Supervised Representation Learning for NLP. Deep neural networks in the su-
pervised learning setting for NLP evolved from distributed representation learning,
to CNN models, and finally to RNN models in recent years. At an early stage,
distributed representations were first developed in the context of statistical language
modeling by Bengio (2008) in so-called neural net language models. The model
is about learning a distributed representation for each word (i.e., word embedding).
Following this, the need arose for an effective feature function that extracts higher-
level features from constituent words or n-grams. CNNs turned out to be the nat-
ural choice given their properties of excellent performance in computer vision and
speech processing tasks. CNNs have the ability to extract salient n-gram features
from the input sentence to create an informative latent semantic representation of
the sentence for downstream tasks. This domain was pioneered by Collobert et al
(2011) and Kalchbrenner et al (2014), which led to a huge proliferation of CNN-
based networks in the succeeding literature. The neural net language model was also
improved by adding recurrence to the hidden layers (Mikolov et al, 2011a) (i.e.,
RNN), allowing it to beat the state-of-the-art (smoothed n-gram models) not only in
terms of perplexity (exponential of the average negative log-likelihood of predicting
the right next word) but also in terms of WER in speech recognition. RNNs use
the idea of processing sequential information. The term “recurrent” applies as they
perform the same computation over each token of the sequence and each step is de-
pendent on the previous computations and results. Generally, a fixed-size vector is
produced to represent a sequence by feeding tokens one by one to a recurrent unit. In
a way, RNNs have “memory” over previous computations and use this information
in current processing. This template is naturally suited for many NLP tasks such
as language modeling (Mikolov et al, 2010, 2011b), machine translation (Liu et al,
2014; Sutskever et al, 2014), and image captioning (Karpathy and Fei-Fei, 2015).
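The sketch below shows, in a highly simplified form, how an embedding layer plus a recurrent unit turns a token sequence into a fixed-size sentence representation that feeds a downstream task. It assumes PyTorch; the vocabulary size, dimensions, random token ids, and the binary classification head are all illustrative.

import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10000, 100, 128   # illustrative sizes
embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
classifier = nn.Linear(hidden_dim, 2)                 # e.g., a binary sentiment label

tokens = torch.randint(0, vocab_size, (4, 12))        # 4 sentences of 12 token ids each
emb = embedding(tokens)                               # (4, 12, 100) word embeddings
_, (h_n, _) = lstm(emb)                               # final hidden state: (1, 4, 128)
sentence_repr = h_n.squeeze(0)                        # (4, 128) fixed-size sentence representations
logits = classifier(sentence_repr)                    # the representation feeds the downstream task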
Unsupervised Representation Learning for NLP. Unsupervised learning (includ-
ing self-supervised learning) has achieved great success in NLP, for the plain text itself
contains abundant knowledge and patterns about languages. For example, in most
deep learning based NLP models, words in sentences are first mapped to their corre-

sponding embeddings via techniques such as word2vec Mikolov et al (2013b),
GloVe Pennington et al (2014), and BERT Devlin et al (2019), before being sent to
the networks. However, there are no human-annotated “labels” for learning those
word embeddings. To acquire the training objective necessary for neural networks,
it is necessary to generate “labels” intrinsically from the existing data. Language
modeling is a typical unsupervised learning task, which can construct the probabil-
ity distribution over sequences of words and does not require human annotations.
Based on the distributional hypothesis, using the language modeling objective can
lead to hidden representations that encode the semantics of words. Another typi-
cal unsupervised learning model in NLP is auto-encoder (AE), which consists of
a reduction (encoding) phase and a reconstruction (decoding) phase. For example,
recursive auto-encoders (which generalize recurrent networks with VAE) have been
used to beat the state-of-the-art at the time of publication in full-sentence
paraphrase detection (Socher et al, 2011) by almost doubling the F1 score for para-
phrase detection.
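As a concrete example of learning word representations without human labels, the sketch below (assuming gensim 4.x, where the embedding dimensionality is set via vector_size; the three sentences stand in for a large unlabeled corpus) trains a skip-gram word2vec model.

from gensim.models import Word2Vec

sentences = [
    ["graph", "neural", "networks", "learn", "node", "representations"],
    ["word", "embeddings", "capture", "semantics", "of", "words"],
    ["language", "modeling", "needs", "no", "human", "labels"],
]

# skip-gram (sg=1) word2vec: the training signal comes from the text itself
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

vec = model.wv["graph"]                          # 50-dimensional embedding of "graph"
print(model.wv.most_similar("graph", topn=3))    # nearest words in the learned space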
Transfer Learning for NLP. Over the recent years, the field of NLP has wit-
nessed fast growth of transfer learning methods via sequential transfer learning
models and architectures, which significantly improved upon the state-of-the-arts
on a wide range of NLP tasks. In terms of domain adaption, the sequential transfer
learning consists of two stages: a pretraining phase in which general representa-
tions are learned on a source task or domain followed by an adaptation phase during
which the learned knowledge is applied to a target task or domain. The domain adap-
tion in NLP is categorized into model-centric, data-centric, and hybrid approaches.
Model-centric methods target the approaches to augmenting the feature space, as
well as altering the loss function, the architecture, or the model parameters (Blitzer
et al, 2006). Data-centric methods focus on the data aspect and involve pseudo-
labeling (or bootstrapping) where only a small number of classes are shared between
the source and target datasets (Abney, 2007). Lastly, hybrid-based methods are built
by both data- and model-centric models. Similarly, great advances have also been
made into the multi-task learning in NLP, where different NLP tasks can result in
better representation of texts. For example, based on a convolutional architecture,
Collobert et al (2011) developed the SENNA system that shares representations
across the tasks of language modeling, part-of-speech tagging, chunking, named en-
tity recognition, semantic role labeling, and syntactic parsing. SENNA approaches
or sometimes even surpasses the state-of-the-art on these tasks, while being simpler and
much faster than traditional predictors. Moreover, learning word embeddings can be
combined with learning image representations in a way that allows associating texts
and images.
Other Representation Learning for NLP. In NLP tasks, when a problem gets
more complicated, it requires more knowledge from domain experts to annotate
training instances for fine-grained tasks and thus increases the cost of data labeling.
Therefore, it is sometimes required that models or systems be developed efficiently
with (very) few labeled data. When each class has only one or a few labeled in-
stances, the problem becomes a one/few-shot learning problem. The few-shot learn-
ing problem is derived from computer vision and has also been studied in NLP

recently. For example, researchers have explored few-shot relation extraction (Han
et al, 2018) where each relation has a few labeled instances, and low-resource ma-
chine translation (Zoph et al, 2016) where the size of the parallel corpus is limited.

1.2.4 Representation Learning for Networks

Beyond popular data like images, texts, and sounds, network data is another im-
portant data type that is becoming ubiquitous across a wide range of real-world ap-
plications ranging from cyber-networks (e.g., social networks, citation networks,
telecommunication networks, etc.) to physical networks (e.g., transportation net-
works, biological networks, etc.). Network data can be formulated as graphs math-
ematically, where vertices and their relationships jointly characterize the network
information. Networks and graphs are a very powerful and flexible data formulation,
such that sometimes we could even consider other data types like images and texts
as special cases of it. For example, images can be considered as grids of nodes with
RGB attributes which are special types of graphs, while texts can also be organized
into sequential-, tree-, or graph-structured information. So in general, representa-
tion learning for networks is widely considered as a promising yet more challenging
task that requires the advancement and generalization of many techniques we devel-
oped for images, texts, and so forth. In addition to the intrinsic high complexity of
network data, the efficiency of representation learning on networks is also an impor-
tant issue considering the large scale of many real-world networks, ranging from
hundreds to millions or even billions of vertices. Analyzing information networks
plays a crucial role in a variety of emerging applications across many disciplines.
For example, in social networks, classifying users into meaningful social groups is
useful for many important tasks, such as user search, targeted advertising and recom-
mendations; in communication networks, detecting community structures can help
better understand the rumor spreading process; in biological networks, inferring in-
teractions between proteins can facilitate new treatments for diseases. Nevertheless,
efficient and effective analysis of these networks heavily relies on good representa-
tions of the networks.
Traditional feature engineering on network data usually focuses on obtaining a
number of predefined, straightforward features at the graph level (e.g., the diameter,
average path length, and clustering coefficient), the node level (e.g., node degree and
centrality), or the subgraph level (e.g., frequent subgraphs and graph motifs). These
limited, hand-crafted, well-defined features, though they describe several fun-
damental aspects of the graphs, discard the patterns that they cannot cover.
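For concreteness, the sketch below (assuming the NetworkX library; the built-in Karate Club graph is used only as a convenient toy network) computes a few such predefined graph-level and node-level features.

import networkx as nx

G = nx.karate_club_graph()                       # a small, well-known social network

graph_feats = {                                  # graph-level hand-crafted features
    "diameter": nx.diameter(G),
    "avg_path_length": nx.average_shortest_path_length(G),
    "avg_clustering": nx.average_clustering(G),
}

node = 0                                         # node-level hand-crafted features for one node
node_feats = {
    "degree": G.degree[node],
    "degree_centrality": nx.degree_centrality(G)[node],
    "betweenness": nx.betweenness_centrality(G)[node],
}
print(graph_feats, node_feats)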
Moreover, real-world network phenomena are usually highly complicated and require
sophisticated, unknown combinations of those predefined features, or cannot be
characterized by any of the existing features. In addition, traditional graph feature
engineering usually involves expensive computations with super-linear or exponen-
tial complexity, which often makes many network analytic tasks computationally
expensive and intractable over large-scale networks. For example, in dealing with
Exploring the Variety of Random
Documents with Different Content
Sir Fletcher Portwood.
You’ve been sleeping, sir; your manners are appalling.
Claude.
[Stupidly.] Where’s aunt?
Sir Fletcher Portwood.
[Leading him towards the door.] In the next room. Come, sir! You
are deficient in tact, delicacy——

[John re-enters. Sir Fletcher passes him and goes out.


Claude.
[As he passes John.] The dining-room?
John.
[To Claude.] I shan’t keep you more than a minute or two.
Claude.
[In the doorway, turning to John.] Allingham, of course you and I
can never again be the same to each other as we have been in the
past; but may I take the liberty of foraging for a piece of cake?
John.
[Laying a hand on his shoulder.] Certainly.

[Claude goes out; John closes the door and turns to


Olive.
Olive.
[Facing him.] Well?
John.
[Advancing to her.] Well?
Olive.
Oh, could anything be clearer? It’s easy enough now to see
through the twaddle these people have been talking! Mrs. Fraser
runs away from her husband, who believes her guilty; her relatives
go in pursuit; they look for her and find her—where?
John.
Her relations chance to be here when Mrs. Fraser sends for me
——
Olive.
[Mockingly.] Yes!
John.
[Referring to the letter.] Desiring to see me “for a few moments,
upon a matter of business.” That is all that can be made of it.
Olive.
A matter of business!
John.
This letter is not quite ingenuous, you infer.
Olive.
You’ve caught the tone of the lawyers exactly.
John.
[Hotly.] “A matter of business” is a lie, you mean?
Olive.
Her arrival to-night is a remarkable coincidence.
John.
A perfectly natural one.
Olive.
Why are you so eager, then, to avoid granting her the interview
she asks for?
John.
Eager——!
Olive.
You send word to her that it’s impossible.
John.
Don’t you make it impossible?
Olive.
No, I do not; I do not. I want you to meet her to-night; you’ve
heard me say I wish it.
John.
You mean that?
Olive.
If ever I meant anything in my life.
John.
[Referring to the letter.] “I shall plant myself at some quiet spot
near your cottage——”
Olive.
Ah, no! never mind the quiet spot near the cottage. Why can’t you
have your business interview here?
John.
Here?
Olive.
[In a low voice, her head drooping.] Where we are now, while I—
[glancing towards the library]—while I take my place in there?

[There is a pause; he stands looking at her for a


moment silently.
John.
And this is how you propose to carry out your undertaking to
make amends to Mrs. Fraser?
[He turns away from her.
Olive.
Everything is altered since—since——
John.
Since we were reconciled! reconciled!
Olive.
Since I promised to aid Mrs. Fraser. The arrival of these people—
that letter—has undone everything. [Throwing herself upon the
settee despairingly.] Oh, they knew well enough where their bird
would fly to! [Burying her face in the pillows.] Oh, John, you’ll kill
me!
John.
Ha! and so you would like to try Mrs. Fraser twice in one day! And
there would be no mistake this time, no doubt whatever! Innocent
or Guilty—guilty for choice!
Olive.
No, no, innocent. But I want to be satisfied. Only satisfy me?
John.
Satisfy you! My heavens!
Olive.
Satisfy me! satisfy me!
John.
And what a model judge of this lady you would make, of any
woman you are jealous of! How scrupulously fair! how impartial!
how——
Olive.
I would be just, John; I would be!
John.
[Savagely taking a cigarette from the box on the table and sticking
it between his teeth.] Women of your temperament detect a leer in
the smile of a wax doll.
Olive.
I give you my word that I will make every allowance for you both,
if you will let me hear you together. You are old friends—“chums”
was her expression for it in the witness-box to-day—and you are
Jack and Theo to each other, naturally; I am prepared for all that
kind of thing. She can kiss you good-bye when she parts from you—
[beating her brow]—I can comprehend even that. Only—only let me
be satisfied by her general tone and bearing, by that unmistakable
ring in the voice, that she has never been the arrant little profligate I
once thought her.

[John now sitting staring at the carpet and chewing the


end of his cigarette.
John.
Supposing I—consented, and you were—satisfied——?
Olive.
[Rising and speaking earnestly and rapidly.] We are in June; I
would have her to stay with me. My friends, her own friends, should
see that we were close companions. She should go everywhere with
me; my arm should always be through hers. I would get a crowd
together; she should receive my guests with me. Oh, by Goodwood
week her reputation should be as sound as any woman’s in England!
Come! think of the dreadful days and nights she’s given me, whether
she’s good or bad! Come! wouldn’t that be generous?
John.
[In a low voice.] Look here! you would swear to me you’d never
use against her anything that might arise during our meeting—I
mean anything that your cursed jealousy could twist into harm?
Olive.
Solemnly. If she proclaimed herself openly in this room to be your
—[with a stamp of the foot he rises]—she should go scot-free, for
me. If she behaved as an innocent woman, she might walk over me
in the future, trample on me; I’d be a slave to her. Only satisfy me!
[He goes to the writing-table, and rapidly scribbles a
note. She watches him with eager eyes. When he has
finished writing, he takes an envelope, rises, comes
to Olive, and holds the note up before her.
John.
“Come to the cottage.—J. A.”

[She inclines her head. He touches the bell-press. Then


he encloses the note in the envelope, which he
fastens, and hands to Olive.
Olive.
Why?
John.
Take it. [She takes it wonderingly.] I have met your demands so
far. Now, if you wish to do a womanly thing, you’ll throw that on the
fire. [Quaife enters; Olive stands staring before her. Speaking in
measured tones, keeping his eyes on Olive.] Quaife, the note which
Mrs. Allingham will give you is for the messenger.
Quaife.
Yes, sir.
John.
If a lady arrives, ask her to sit down in the card-room; let me
know when she comes. I am alone, should the lady make any
inquiries.
Quaife.
Very good, sir.
John.
Olive, Quaife is waiting for the note. [There is a pause; then Olive
turns suddenly and hands Quaife the note. He goes out. There is
another pause.] And after this—after this!—you and I! Upon what
terms do you imagine you and I will be after this?
Olive.
Oh, if she comes out of it well, I will be so good to her——
John.
[Contemptuously.] Ah——!
Olive.
[Clutching his arm.] I will make you forgive me for it; I will make
you! [He releases himself from her, almost roughly, and moves away,
turning his back upon her.] Of course, you will not mention to Mrs.
Fraser that you and I are in any way—in any way——?
John.
Reconciled! [Sitting on the settee, laughing wildly.] Ha, ha, ha——!
[Turning to her.] Why not?
Olive.
Naturally, she wouldn’t open her lips to you at all if you did.
John.
[Waving her away.] Faugh!
Olive.
[Her hand to her brow.] You are—very—polite—[She walks slowly
and painfully towards the steps, pausing in her walk, and referring to
her watch.] John, when the talk between you and Mrs. Fraser has—
gone far enough, I will strike ten on the bell of the little clock in
here. You understand?
John.
When you are satisfied!
Olive.
[Beginning to ascend the steps, with the aid of the balustrade.]
When I am satisfied.
John.
Olive——! [She stops.] It’s not too late now for us to think better
of playing this infernally mean trick upon her.
Olive.
[Steadily, in a low hard voice.] Why, nothing can arise, during this
interview, injurious, in the mind of any fair person, to Mrs. Fraser’s
reputation?
John.
[Starting to his feet.] Nothing! nothing!
Olive.
Then I am clearly serving Mrs. Fraser’s interests by what I am
doing.

[She disappears into the library. After a brief pause, John
hastily goes to the dining-room door, and opens it
slightly.
John.
Mrs. Cloys! Mrs. Cloys!
Mrs. Cloys.
[From the dining-room.] Yes.
John.
Let me speak to you? [Mrs. Cloys enters; he closes the door
sharply, speaking hurriedly and excitedly.] I—I have altered my mind
about meeting Mrs. Fraser——
Mrs. Cloys.
Altered your mind——?
John.
I have sent a note to her by her messenger asking her to see me
here.
Mrs. Cloys.
Mr. Allingham, I protest against this as quite unnecessary.
John.
Pardon me. [Producing Theophila’s letter, and speaking disjointedly,
uneasily.] On—on consideration, it seems to me that—that—for
everybody’s sake, I have to satisfy my wife that Mrs. Fraser’s
presence is due solely to the most innocent causes.
Mrs. Cloys.
Mrs. Allingham has, I take it, arrived at certain conclusions as to
the motive of my visit?
John.
She has.
Mrs. Cloys.
And now, Theophila following upon our heels——?
John.
It is a most unfortunate accident——
Mrs. Cloys.
[Eyeing him penetratingly.] Mr. Allingham, you have no doubt
whatever of the absolute genuineness of my niece’s excuse for
calling upon you?
John.
Oh, Mrs. Cloys——!
Mrs. Cloys.
[Sitting.] Yes, I admit that I came here to-night to ask you to
pledge your word to us that Theo should run no further risk from her
—her acquaintanceship with you; to entreat you, if she should be so
base, so abandoned——
John.
You mean you thought it possible, probable, that this lady had run
away from her husband and friends with the deliberate intention of
joining me—me! [Mrs. Cloys covers her eyes with her handkerchief.]
Great Heaven, I suppose there is no living soul who will believe in an
honest friendship between a young man and a young woman!
Mrs. Cloys.
There are certain rules for the conduct of friendship, Mr. Allingham
——
John.
[Excitedly.] Rules! The world is getting choked with rules for the
conduct of everything and everybody! What’s the matter with the
world that a woman has to lose her character and paint her face
before she is entitled to tell a man her troubles, and hear his in
return, across a dying fire, by lamplight, when the streets are still
and a few words of sympathy and encouragement stir one like a
sudden peal of bells——?

[He stands by the fire, bowing his head upon the
mantelpiece.
Mrs. Cloys.
[Looking at him, and speaking in a low voice.] Ah! a dying fire, the
lamplight, the still streets——! The world is what it is, Mr. Allingham.
John.
Yes, and it’s a damnable world!
Quaife enters.
Quaife.
The lady has arrived, sir.
Mrs. Cloys rises.
John.
[To Quaife.] When I ring, show her in here.
Quaife withdraws.
Mrs. Cloys.
[Agitatedly.] Mr. Allingham, you will not let Theo slip through my
fingers; you won’t let her escape me——? [Looking at him.] Oh, I
will trust you so far.
John.
You may. I only ask you to allow me to have my interview with
Mrs. Fraser undisturbed.
Mrs. Cloys.
Ah, if you knew how I hate the idea of this meeting between you
two! [Turning sharply.] I’ve a feeling that something evil is going to
result from it——!
John.
I can only repeat, you’re wrong in what you think of me—[turning
away]—wrong, every one of you.
Mrs. Cloys.
[Coming to him, her manner gradually changing to harshness,
almost to violence.] Well, understand me, Mr. Allingham! I’m inclined
to—to half-believe in you; you’ve an honest face and air—not that
those things count for much; but understand me: if you bring, in any
shape or form, further harm to her——!
John.
[Indignantly.] What further harm can I bring to her? You find me
here with my wife——!
Mrs. Cloys.
Sir, you had a wife round the corner when you were engaged in
destroying my niece’s reputation in Lennox Gardens! [Recovering her
composure.] But enough of that. [Calmly, amiably.] We do
understand one another, do we not?
John.
[Shortly.] Oh, perfectly.
Mrs. Cloys.
That’s right. [Arranging her bonnet-strings, which have become
slightly disordered.] Excuse me for breaking out in this fashion. [She
goes to the door, he following her. At the door she turns to him with
grave dignity.] I’m afraid I’ve impressed you as being rather a
tigress.

[She goes out. He closes the door after her and stands
staring at the ground for a moment; then he gently
turns the key in the lock and carefully draws the
portière across the door. He is about to put his finger
upon the bell-press when he pauses.
John.
[In a low voice.] Olive. Olive. I have not yet rung the bell. Do you
stop me? [A pause.] Won’t you stop me?

[He waits; there is no answer; with an angry gesture he
rings the bell. After a brief pause Quaife enters;
Theophila follows. She is dressed as in the previous
Act, but is now thickly veiled. Quaife gives a puzzled
look round the room and withdraws.
Theophila.
[Advancing and speaking in a weak, plaintive voice.] Oh, Jack——!
[They shake hands, but in a constrained, rather formal way.] Of
course, we could have had our talk very well in the lane; but it’s kind
and considerate of you to ask me in.
John.
Oh, not in the least. [Confusedly.] I—er—I—Do sit down.

[She looks at him, expecting him to find her a chair. In
the end, after a little uncertainty, she seats herself on
the right of the table. In the meantime he ascertains
that the door by which Theophila has entered is
closed.
Theophila.
[Lifting her veil.] I’m afraid you’re a little angry with me for
hunting you up.
John.
Angry? Why should I be angry?
Theophila.
Well, I suppose it is another—what d’ye call it?—injudicious act on
my part. But it seemed to me, if I thought about it at all, that we
came so badly out of it to-day, that nothing matters much now. At
any rate, my character’s gone.
John.
[Advancing a step or two, but avoiding her eye.]
No, no——
Theophila.
Oh, isn’t it? And yours has gone too, Jack; only a man gets on
comfortably without one. [Facing him, her elbows on the table.]
Well, what do you think of my news?
John.
[Looking at her, startled.] By Jove, how dreadfully white you are!
Theophila.
[With a nod and a smile.] The looks have gone with the character
—[putting her hands over her face]—both departed finally.
John.
[Coming a little nearer to her.] Er—when you’ve had a little rest
you will see everything in a brighter light——
Theophila.
I should have kept my appearance a good many years, being fair
and small. [Removing her hands—looking up at him.] You used to
tell me I should last pretty till I’m forty-five. Do you remember? [His
jaw drops a little, and he stares at her without replying.] Do you
remember?
John.
[Moving away.] Oh—er—yes——
Theophila.
Is there anything wrong with you, Jack?
John.
Wrong—with me? No.

[She shifts to the other side of the table, to be nearer to
him. He eyes her askance.
Theophila.
Why don’t you tell me what you think of my news?
John.
Your news?
Theophila.
[Impatiently.] You’ve read my letter, Jack. I’m a—what am I?—a
single woman again; a sort of widow.
John.
You are acting too hastily; you’re simply carried away by a rush of
indignation. Perhaps matters can be arranged, patched up. You
mustn’t be allowed to——
Theophila.
Arranged! patched up! You don’t realise what you’re proposing!
You wouldn’t make such a suggestion if you had been a fly on the
wall this afternoon while Mr. Fraser and I were—having a little talk.
[Struggling to keep back her tears.] Alec—my husband—he was very
much in love with me at one time! I never doubted that he would
stand by me through thick and thin. He has done so pretty well, up
till to-day, up till the trial, and then, suddenly, he—he——

[She produces her handkerchief, rises, then moves away
abruptly, and stands, with her back to John, crying.
John.
[Turning to the fire.] Mr. Fraser was taken aback, flabbergasted, I
expect, by the tone adopted by the judge to-day; there’s that poor
excuse for him. But a little reflection will soon——
Theophila.
[Drying her eyes.] Oh, don’t prose, Jack! [Turning.] On the whole,
I think it’s better that he and I have at last managed to find out
where we are.
John.
[Turning to her.] Where you are?
Theophila.
You know, there’s always a moment in the lives of a man and
woman who are tied to each other when the man has a chance of
making the woman really, really, his own property. It’s only a
moment; if he lets the chance slip, it’s gone—it never comes back. I
fancy my husband had his chance to-day. If he had just put his hand
on my shoulder this afternoon and said, “You fool, you don’t deserve
it, for your stupidity, but I’ll try to save you——”; if he had said
something, anything, of that kind to me, I think I could have gone
down on my knees to him and——[Coming to John excitedly.] But he
stared at the carpet, and held on to his head, and moaned out that
he must have time, time! Time! Oh, he was my one bit of rock!
[Throwing herself into a chair on the right.] If he’d only mercifully
stuck to me for a few months—three months—two—for a month
——!
John.
[Going to her slowly and deliberately, and standing by her.] Mrs.
Fraser. [She looks up at him surprised.] Of course, whatever future is
in store for you, nothing—no luck, no happy times—can ever pay
you back for the distress of mind you’ve gone through.
Theophila.
Nothing, Jack—Mr. Allingham. [Her hand to her brow.] Oh, nobody
knows! Oh, Jack, some nights—some nights—I’ve said my prayers.
John.
I’ve found myself doing that too—in hansoms, or walking along
the street.
Theophila.
Praying for me?
John.
[Nervously.] Y-yes.
Theophila.
Oh, don’t make me cry again! Oh, my head! oh, don’t let me cry
any more——!
John.
Hush, hush, hush! What I want to say is this. You knew young
Goodhew?
Theophila.
Charley Goodhew—the boy that cheated at baccarat?
John.
He didn’t; he was innocent.
Theophila.
I’m sure he was, poor fellow.
John.
Well, he told me, one day in Brussels, that he managed to take all
the sting out of his punishment by continually reminding himself that
it was undeserved, that there wasn’t a shadow of justification for it.
I suppose it would be the same with a woman who—who gets into a
scrape; an innocent woman?
Theophila.
It’s good, under such circumstances, if you can feel a bit of a
martyr, you mean?
John.
That’s it. So, in the future, you must never tire of reminding
yourself of the utter harmlessness of those hours we used to spend
together in Lennox Gardens.
Theophila.
They were harmless enough, God knows.
John.
[Earnestly, eagerly.] God knows.
Theophila.
And they were awfully jolly, too.
John.
[Blankly, his voice dropping.] Jolly——?
Theophila.
You know—cosy, comforting.
John.
Yes, yes—comforting. It was the one thing that kept me together
during those shocking Pont Street days of mine.
Theophila.
Our friendship?
John.
Our friendship. When I was in the deepest misery, the thought
would come to me: “Well, I shall see my little friend to-day or to-
morrow.” And then I’d go through our meeting as I supposed it
would be—as it always was——
Theophila.
“’Ullo, Jack! good morning—or good evening. Oh, my dear boy,
you’re in trouble again, I’m afraid!”
John.
“Dreadfully. I shall go mad, I believe—or drink.”
Theophila.
“Mad—drink; not you. Sit down and tell me all about it.”
John.
And so on.
Theophila.
And so on. I had my miseries too.
John.
Yes, you had your miseries too.
Theophila.
And then you invariably came out with that one piece of oracular
advice of yours.
John.
Ah, yes. “Don’t fret; it’ll be all the same a hundred years hence.”
Theophila.
Which you couldn’t act upon, yourself. How vexed it used to make
me—and the ponderous way you said it!
John.
Well, it was a good, helpful friendship to me.
Theophila.
And to me.
John.
[Standing a little behind her; speaking calmly, but watching her
eagerly.] Because, all the while, there was never one single thought
of anything but friendship on either side.
Theophila.
Why, of course not, Jack.
John.
You’d have detected it in me, if there had been?
Theophila.
Trust a woman for that.
John.
And if you had for a moment fancied that I was losing sight of
mere friendship——?
Theophila.
You!
John.
What would you have done?
Theophila.
Oh, one day, the usual headache; not at home the next—the
proper thing. But, Jack dear, I never felt the slightest fear of you—
and that’s what makes an end like this so cruel, so intolerably cruel.
John.
Never felt the slightest fear of me——?
Theophila.
No, never; oh, of course, a woman can tell. Somehow, I knew—I
knew you couldn’t be a blackguard.
John.
[About to seize her hand, but restraining himself.] God bless you!
God bless you! [He walks away and pokes the fire vigorously, hitting
the coal triumphantly.] Ah, ha, ha! [Turning to Theophila.] I beg your
pardon; you’re in the most uncomfortable chair in the room.
[She rises and crosses the room.
John.
[Arranging the pillows on the settee.] You must be so weary, too.
I’m confoundedly stupid and forgetful to-night.
Theophila.
[Sitting on the settee.] Fancy! a fire in June!
John.
[Walking about elatedly, dividing his glances between Theophila
and the library.] I love to see a fire.
Theophila.
[Suddenly.] Of course. [Dropping her voice.] I remember. [He
stops, staring at her.] Do you recollect? [Steadily gazing into the
fire.] That night when we were sitting over the fire in that little room
in Lennox Gardens——
John.
[Hastily.] Oh, yes, yes——
Theophila.
“I shall always burn a fire, Theo,” you said, “to bring back these
nights, these soothing, precious talks in the quiet hours. Wherever I
may be, I shall only have to light my fire to hear you and to see you
—to see you sitting facing me——”
John.
Ah, that evening—yes, I was terribly—terribly down that evening
[Wiping his brow.] By-the-bye, we—we mustn’t neglect the—the—
the matter of business—the little matter of business——
Theophila.
[Rousing herself.] Matter of——?
John.
The matter of business you mention in your letter——
Theophila.
[Rising.] Oh, yes. [Sitting on the left of the centre table.] Jack, I—
I do hope you won’t hate me for asking you. You see, if I went to
any one else, I should run a chance of having all my arrangements
upset. I—I want to borrow a little money——
John.
Ah, yes, certainly—anything—I shall be most happy——
Theophila.
This is exactly how I am placed. Mr. Fraser wanted to hurry me off
abroad—ah! that’s done with. Instead of that, you see, I’ve taken my
travels and my future into my own hands. I’ve telegraphed to Emily
Graveney, who was at Madame MacDonnell’s with us girls in the Rue
D’Audiffret-Pasquier. Emily is teaching in Paris now—I hardly know
how she scrapes along; she’ll be mad with delight to have my
companionship. But till the lawyers settle my position precisely as