Journal of Artificial Intelligence Research 78 (2023) 287-356. Submitted 03/2023; published 10/2023.

A Comprehensive Survey on Deep Graph Representation Learning Methods

Ijeoma Amuche Chikwendu (IJEOMAAMUCHE@STD.UESTC.EDU.CN), Xiaoling Zhang (32673501@QQ.COM), Isaac Osei Agyemang (IOAGYEMANG@STD.UESTC.EDU.CN), Isaac Adjei-Mensah (IADJEIMENSAH@STD.UESTC.EDU.CN), School of Information and Communication Engineering, University of Electronic Science and Technology of China, 611731 Chengdu, China. Ukwuoma Chiagoziem Chima (UKWUOMA@STD.UESTC.EDU.CN), Chukwuebuka Joseph Ejiyi (GREATIENVI@STD.UESTC.EDU.CN), School of Information and Software Engineering, University of Electronic Science and Technology of China, 610084 Chengdu, China.

©2023 The Authors. Published by AI Access Foundation under Creative Commons Attribution License CC BY 4.0.

Abstract

There has been a great deal of activity in graph representation learning in recent years. Graph representation learning aims to produce representation vectors that precisely capture the structure and characteristics of large graphs. This is crucial because the quality of these vectors determines how well they perform in downstream tasks such as anomaly detection, connection prediction, and node classification. Recently, deep-learning breakthroughs have increasingly been applied to graph-structured data problems. Graph-based learning settings admit a taxonomy of approaches, and this study reviews all of these learning settings. The learning problem is explored both theoretically and empirically. The study also briefly introduces and summarizes Graph Neural Architecture Search (G-NAS), outlines several drawbacks of Graph Neural Networks, suggests strategies to mitigate these challenges, and discusses several potential avenues for future research that are yet to be explored.

1. Introduction

Envision a realm in which interconnectivity flourishes and elaborate patterns emerge from complicated relationships spanning extensive networks. In the contemporary era of digital advancement, where knowledge acquisition relies heavily on data, graphs are pivotal yet underappreciated protagonists: they encapsulate the fundamental nature of the interrelationships that shape our world. How can we unravel these complex networks, decipher their concealed dynamics, and harness their potential for profound insight? Within the dynamic and ever-changing realm of data science, one paradigm stands out as a particularly valuable asset: deep graph representation learning. Graphs can elucidate the complexities inherent in interconnected data across many domains, including social networks and biological networks. Deep graph representation learning can improve our understanding of complex relationships, uncover hidden patterns, and advance machine learning. This study carefully explores this intriguing area.

The discipline of graph representation learning has become a significant area of study within the broader science of machine learning. Its primary objective is to develop efficient methods for processing and evaluating data represented in the form of graphs. Graphs are highly effective structures for representing and comprehending complex associations between different entities.
As a result of these inherent capabilities, graphs are particularly suitable for a wide range of practical applications, including but not limited to social networks, recommendation systems, and bioinformatics. The initial advancements in graph representation learning were observed in the field of graph kernels. The origins of graph kernel approaches can be traced back to the influential Weisfeiler-Lehman (WL) isomorphism test (Weisfeiler & Leman, 1968), a fundamental notion that emerged much earlier. This methodology, a pillar of the field, established the foundation for graph kernels: kernel functions carefully crafted to measure the similarity of graphs and their components. In current scholarly work the notion of graph kernels remains prominent (Nikolentzos et al., 2021), a testament to the ongoing significance of this framework. The essential principle underlying graph kernels is the decomposition of complex graphs into distinct substructures; these substructures are then used to generate vector embeddings designed around their features. The origins of graph representation learning can also be traced to matrix factorization techniques. This initial stage of exploration was heavily influenced by traditional methods of dimensionality reduction, reflecting the influential work of Belkin & Niyogi (2001) and their significant contributions to the area. Several matrix factorization-based models have been developed to handle large-scale graphs with millions of interconnected nodes (Allab et al., 2016; Gong et al., 2014). Matrix factorization methods play a crucial role in this endeavor because of their intrinsic capability to reduce intricate proximity matrices into products of simpler matrices; the objective is to learn embeddings that capture and represent the inherent proximity patterns effectively. Between 2014 and 2016 two significant shallow models emerged, DeepWalk (Perozzi et al., 2014) and Node2Vec (Grover & Leskovec, 2016). These methods used shallow neural networks to generate node embeddings, and a notable characteristic was their innovative use of the skip-gram framework, originally grounded in natural language processing. The governing principle was to enrich node embeddings by maximizing the likelihood of observing neighbouring nodes. Applying Stochastic Gradient Descent (SGD) across the neural network layers successfully mitigates computational subtleties, elegantly harnessing and fine-tuning this strategic basis. This development was a significant turning point, driving the advancement of several models that were ready for further improvement; a multitude of breakthroughs have since arisen, including improved sampling procedures and iterative training processes, jointly shaping the direction of progress in this dynamic subject. Research into graph representation learning has recently gained traction because graphs conveniently represent most real-world data.
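To make the random-walk-plus-skip-gram recipe concrete, the following minimal sketch samples unbiased walks from a toy graph and fits node embeddings with negative-sampling SGD. The graph, walk length, embedding size, and learning rate are illustrative assumptions, not the published DeepWalk or Node2Vec settings.

```python
import random
import numpy as np

random.seed(0)
np.random.seed(0)

# Toy undirected graph as an adjacency list: two triangles joined by one edge.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
nodes = sorted(adj)

def random_walk(start, length=8):
    # Unbiased (DeepWalk-style) walk; Node2Vec would bias this next-step choice.
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

walks = [random_walk(v) for v in nodes for _ in range(20)]

dim, lr, window, neg = 16, 0.025, 2, 3
emb = 0.1 * np.random.randn(len(nodes), dim)   # node vectors (the embeddings we keep)
ctx = 0.1 * np.random.randn(len(nodes), dim)   # context vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Skip-gram with negative sampling: pull co-occurring (node, context) pairs
# together and push randomly drawn negative pairs apart.
for walk in walks:
    for i, u in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if i == j:
                continue
            pairs = [(walk[j], 1.0)] + [(random.choice(nodes), 0.0) for _ in range(neg)]
            for w, label in pairs:
                g = sigmoid(emb[u] @ ctx[w]) - label
                u_old = emb[u].copy()
                emb[u] -= lr * g * ctx[w]
                ctx[w] -= lr * g * u_old

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Nodes sharing a triangle should come out more similar than nodes across triangles.
print("sim(0, 1) within cluster :", round(cosine(emb[0], emb[1]), 3))
print("sim(0, 5) across clusters:", round(cosine(emb[0], emb[5]), 3))
```

Nodes that co-occur on walks end up with similar vectors, which is exactly the neighbourhood proximity these shallow models are designed to preserve.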
Domain-specific multimedia data includes, but is not limited to, social systems (Tan et al., 2019), linguistic (word co-occurrence) networks (Agrawal et al., 2021), biological structures (G. Zhou & Xia, 2018), and more. Graph models efficiently store and retrieve relational knowledge about interacting entities (Besta et al., 2019). Graph data analysis can help with community discovery, behaviour analysis, node classification, link prediction, and clustering (Daud et al., 2020; Goyal & Ferrara, 2018; Inuwa-Dutse et al., 2021; J. Li et al., 2019; Zitouni et al., 2019). Graph embedding methods transform unprocessed graph data into low-dimensional vectors while retaining essential graph characteristics; this process is called graph representation learning. Previously, researchers applied conventional machine learning methods to features derived from the original data format, for example pixel statistics retrieved from images and word-occurrence statistics from text. Amid the current surge of innovation, the objective of our research is to explore the fundamental aspects of graph representation learning. The central inquiry of this study is: how can we effectively navigate the intricate nature of graph data, enable machines to understand connections, and advance deep graph representation learning to unexplored frontiers?

Deep Learning (DL) systems have become popular over the past decades because they can solve learning problems, learn representations from raw data, and make predictions based on the learned representations. The field of artificial intelligence has recently undergone a significant transformation due to the emergence of deep learning techniques, which have demonstrated exceptional achievements in domains such as image recognition, natural language processing, and speech recognition. In a similar vein, the fusion of deep learning techniques with graph-based data has given rise to deep graph representation learning approaches. These methodologies aim to use the computational power of deep neural networks to acquire meaningful representations from graph data, thereby facilitating improved decision-making and prediction. The present state of research in deep graph representation learning is dynamic and swiftly progressing: scholars and practitioners are consistently investigating new designs, optimization methods, and algorithmic advances to tackle the distinct obstacles presented by graph-structured data. Graph Neural Networks (GNNs), Graph Convolutional Networks (GCNs), and Graph Attention Networks (GATs) have shown encouraging results in several applications, with significant advances in node classification, link prediction, and graph construction. Because of graphs' irregular character, which may involve a variable number of unordered nodes and a changing number of neighbours, several key operations, such as convolutions, are simple to perform in the image domain but difficult on graphs. In addition, modern machine learning techniques presume, among other factors, that instances are independent; references, relationships, and interactions are intrinsic to graph data, so this premise no longer applies.
Despite advances in computer vision, natural language processing, biological imaging, and bioinformatics, DL still lacks relational and scientific reasoning, intellectual abstraction, and other cognitive capacities. Graph Neural Networks (GNNs) structure computations and representations in Deep Neural Networks (DNNs) as graphs to address these issues; GNNs are deep learning algorithms for the graph domain. Graphs are difficult to visualize, so using deep learning algorithms to evaluate graph data has garnered attention in recent years.

Graphs Having Irregular Structures: Unlike images, audio, and text, which have a grid structure, graphs have irregular topologies, making some basic mathematical operations harder (Shuman et al., 2013). Graph data makes convolution and pooling difficult.

Heterogeneity and Variety: Graphs with numerous shapes and properties can be complex. Heterogeneous, homogeneous, weighted, and signed graphs are all possible. Graph-related tasks include node classification, link prediction, graph classification, and graph synthesis. Different model structures are needed to address different types, qualities, and tasks.

Interdisciplinarity: Biology, chemistry, and the social sciences often use graphs. Domain knowledge is essential to solving problems but can also constrain model design. Gradient-based training approaches are difficult for molecular graphs due to non-differentiable objective functions and chemical restrictions.

Embedding Dimension and Graph Features: Finding the optimal embedding dimension of a representation (Gou et al., 2020) is complex and involves additional challenges (Shen et al., 2020). Higher-dimensional representations retain more graph features but need more storage and processing time. Lower-dimensional representations require fewer resources and may reduce graph noise, yet the original graph may lose important information. The input graph and the application domain affect the choice of dimension (H. Chen et al., 2018). If a graph has several properties, embedding any single one may be difficult. Node features, connection patterns, meta-data, and more can express graph characteristics, and the application determines which information is most helpful. Kernel functions (Nikolentzos et al., 2021), summary graph measures such as degrees or clustering coefficients (Daud et al., 2020), and carefully chosen features that quantify local neighbourhood structures (J. Li et al., 2019) are frequently used by traditional machine learning algorithms to extract structural information from graphs. However, these systems cannot adapt during the learning process because of the rigidity of hand-engineered characteristics; moreover, engineering such features can be costly and time-consuming.

Gaining a comprehensive understanding of the importance of deep graph representation learning is vital in the contemporary context of a data-centric society. As the complexity of our interactions with interconnected data increases, the capability to identify concealed patterns, uncover latent linkages, and make accurate predictions becomes increasingly important. This extensive examination of deep graph representation learning is therefore a timely and essential resource, offering a detailed investigation of state-of-the-art techniques, computational enhancements, and probable directions for future scholarly inquiry.
By furthering our understanding of this field, we create opportunities to revolutionize how we interpret, analyze, and derive insights from intricate data structures, paving the way for enhanced, data-centric decision-making.

1.2 Scope. This study covers methods for representing nodes, edges, and subgraphs, which provide context, intelligence, and semantics to graphs for applications, and it evaluates graph representation learning research. We unify several diverse lines of research that have attracted significant attention in recent years across various domains and venues, while also focusing on cutting-edge techniques that scale to enormous graphs and are inspired by deep learning. GNNs perform well on graph-structured datasets in supervised, semi-supervised, self-supervised, and unsupervised learning contexts; auto-encoders, contrastive learning, and random-walk ideas underpin most graph-based unsupervised learning approaches. The study's primary aim is to conduct a comprehensive investigation of advanced techniques in deep graph representation learning and an in-depth assessment of these methodologies, carefully examining their respective advantages and constraints. Graph Neural Architecture Search (G-NAS) is introduced in this study, which incorporates and classifies G-NAS components. This classification, based on the intrinsic architectural design problems of graph neural networks (GNNs), fills a significant gap in the literature; it helps design GNN architectures with improved efficiency and efficacy by explaining the key components of G-NAS and their implications. Furthermore, this study explores the complex restrictions inherent in GNN techniques, focusing on the issues associated with achieving interpretability and scalability in graph-based models, among others.

With the increasing number of graph representation learning models in recent years, various approaches have been used to find relevant research in this domain. Adopting a strategic approach, a search methodology was developed by formulating specific keywords and carefully evaluating reliable sources. The keywords include graph embedding, graph representation learning, graph neural networks, graph convolution, and graph attention. The search for pertinent research covered prominent conferences and journals, such as AAAI, IJCAI, SIGKDD, ICML, WSDM, Nature Machine Intelligence, and Pattern Recognition, as well as trustworthy internet sources.

Overview of the Survey. The remainder of this study is structured as follows. Section 2 presents a concise summary of relevant literature, encompassing various surveys and overviews within the area. Section 3 introduces the concept of Graph Representation Learning (GRL) and examines several graph tasks based on graph-structured data. Section 4 presents the categorization of GNN-based techniques and learning settings to facilitate complete comprehension. Section 5 provides an in-depth analysis of the sequential structure of Graph Neural Networks (GNNs), elucidating their internal mechanisms. Section 6 turns toward contemporary applications of GNNs, highlighting their extensive and diverse practical utility.
Section 7 analyzes the inherent limitations of Graph Neural Networks (GNNs) and proposes various strategies to overcome these constraints. Section 8 provides a concise overview of Graph Neural Architecture Search (G-NAS), emphasizing its notable importance within the domain. Section 9 elucidates unresolved aspects of GNN-based graph solutions, opening paths for future research. The concluding section presents a comprehensive synthesis of the knowledge acquired throughout the article.

Contributions. The main contributions of this study are summarized as follows:
1. A thorough analysis of GNNs is provided. In contrast to other studies focusing on only one type of learning setting, this study considers all of them.
2. This study introduces and categorizes G-NAS constituents based on their construction challenges; this is not provided in previous surveys.
3. This study outlines the limits of GNN-based methods and workarounds for them. Limitations include over-smoothing, scalability, expressiveness, over-squashing, and destructive loss, to mention but a few.

2. Related Work: Surveys in Graph Representational Learning

The present literature on Graph Neural Networks (GNNs) primarily consists of survey studies that either cover a wide range of topics or focus on a specific learning setting (Ahmad et al., 2020; Chami et al., 2022; C. Chen et al., 2022; F. Chen et al., 2020; J. Zhou et al., 2020; Y. Zhou et al., 2022). Abadal et al. (2021) conducted a comprehensive examination of Graph Neural Networks (GNNs), focusing on their computational aspects. Furthermore, their research thoroughly examined the software and hardware acceleration techniques already employed, and presented a communication-focused, hardware-software hybrid as an appropriate solution for GNN accelerators. Zhou et al. (2020) provided a thorough design process for Graph Neural Networks (GNNs) and discussed several GNN variants employed in each module. The authors comprehensively analyzed the theoretical and empirical aspects of GNNs, first categorizing GNN applications into two distinct groups, structural and non-structural scenarios. The article also elucidated four outstanding issues concerning GNNs and deliberated on probable prospects. Conversely, it lacked a clearly defined taxonomy for individual learning scenarios. The study conducted by Z. Wu et al. (2020) introduced a novel categorization framework for Graph Neural Networks (GNNs), which groups GNNs into several subtypes: recurrent, convolutional, spatial-temporal, and graph autoencoder architectures. Nevertheless, that study did not thoroughly examine every learning setting. The majority of current survey studies in the field of Graph Neural Networks (GNNs) concentrate either on an individual learning scenario or on the broader scope of GNNs, as shown in Table 1. To bridge this knowledge gap and expand upon the existing body of literature, the current research undertakes a comparative examination of different graph-based deep learning architectures.
Significantly, we explore the captivating domain of G-NAS. This innovative addition has been carefully designed to address the distinct construction constraints associated with GNNs. Through a methodical approach to addressing these issues, G-NAS presents a novel viewpoint and framework to enhance the domain of GNNs, paving the way for further progress.

Papers, and the difference between this survey and existing ones:

Abadal et al. (2021): They thoroughly examined Graph Neural Networks (GNNs) from a well-rounded computing perspective, laying out the various methodologies used for software and hardware acceleration. This investigation led to a novel vision emphasizing the significance of GNN accelerators, distinguished by their graph awareness, hardware-software integration, and communication-centric features. In contrast, our study deviates from this trajectory by pursuing a different analysis path: our attention is directed toward the many learning settings present in the domain of Graph Neural Networks (GNNs). We establish a delineated classification system for the various learning settings by employing a systematic and thorough methodology, resulting in a unified framework that enhances the comprehension of GNNs within these heterogeneous settings.

(Z. Zhang et al., 2020): Predominantly delved into classical and representative Graph Neural Network (GNN) architectures, hence bypassing the exploration of deep graph representation learning from the vantage point of the latest advanced paradigms such as graph self-supervised learning. Our research carves a distinctive path and offers comprehensive scrutiny; it explores the complex domain of deep graph representation learning, revealing concealed insights and innovative approaches. To enhance the level of intellectual discussion, this study also presents the notion of Graph Neural Architecture Search (G-NAS).

(Z. Wu et al., 2020): They introduced a new categorization approach that divided well-known GNNs into four groups: recurrent, convolutional, spatial-temporal, and graph autoencoders. This theoretical paradigm does not explain the learning settings in depth. Contrarily, our article seeks to enrich academic discourse by introducing taxonomies tailored to GNN learning contexts; this strategic method supports our inquiry by disclosing the deep complexity and subtle details of each GNN learning situation.

(Zhou et al., 2020): Presents a comprehensive overview of the design pipeline of GNNs and provides a detailed analysis of several module variants employed in GNNs. The article conducted a comprehensive analysis of GNNs from both a theoretical and empirical standpoint, delineated the applications of GNNs by categorizing them into two distinct scenarios, structural and non-structural, and presented four unresolved issues pertaining to GNNs along with prospects for future research in this area. However, the paper does not present a distinct taxonomy for each of the many learning situations, and this study takes the opportunity to research the various learning settings of GNNs.

(Khoshraftar & An, 2022): Classifies works in graph representation learning, accurately differentiating between static and dynamic graphs.
However, although these taxonomies effectively highlight the core principles of Graph Neural Networks (GNNs), their coverage of learning paradigms is limited. Our research is a comprehensive study examining the various learning settings inherent to Graph Neural Networks (GNNs).

(Y. Zhou et al., 2022): Presents a comprehensive overview of Graph Neural Network (GNN) design, briefly discussing their applications. Concurrently, our study follows a comparable path, investigating GNN designs while pushing the limits of exploration and presenting a novel aspect, Graph Neural Architecture Search (G-NAS), during our thorough investigation. The innovative addition discussed in this work serves as a valuable resource for scholars, providing guidance and insight into the complex domain of Graph Neural Networks (GNNs). It sheds light on the inherent obstacles in constructing GNNs, enhancing our understanding of this field. The G-NAS framework provides a comprehensive picture of the fundamental components of GNNs, offering researchers essential insights into this rapidly evolving and impactful domain.

Table 1: Difference between this survey and existing ones.

3. Graph Representation Learning (GRL)

Graph Representation Learning (GRL) techniques seek to develop vector representations for various graph elements that capture the structure and semantics of a graph-structured or networked dataset and thus achieve a good representation. Learning to represent graphs draws on various methodologies derived from graph theory, manifold learning, topological data analysis, neural networks, and generative graph models; these methodologies all have their origins in conventional network research. When applying machine learning to networks, the most challenging aspect is undoubtedly extracting information about interactions between nodes and combining it into a machine-learning model. Traditional machine learning methods use either summary statistics (such as degrees or clustering coefficients) or specifically built features, for example network motifs, to quantify local neighbourhood structures and extract relevant information from networks. Representation learning systems can instead automatically learn to encode network structure into low-dimensional representations by replacing these existing methods with deep learning and non-linear dimensionality reduction. The adaptability of learned embeddings makes them useful in various modelling problems. In graph representation learning there exists a collection of models that may be categorized into separate groups: Graph Kernels, Matrix Factorization, Shallow Models, and Deep Graph Models. Each category represents a distinct strategy that contributes to the overall graph representation, providing diverse approaches to analyze and comprehend the complicated relationships embedded inside large data structures.

Graph kernels and matrix factorization-based models are foundational concepts in graph representation learning. Graph kernels are widely recognized for their utilization in graph embeddings.
They use a definite mapping function, which allows exploration of the complexities associated with graph classification problems (Shervashidze et al., 2009; Togninalli et al., 2019). This domain has two separate classes: graph kernels, which aim to reveal the subtle features of graph similarity, and node-based kernels on graphs, which are carefully crafted to uncover the relationships between individual nodes in graph structures. Graph kernels examine graphs or their substructures, including nodes, subgraphs, and edges, to evaluate their similarity. At the core of this endeavor lies the fundamental task of assessing the resemblance between graphs in an unsupervised fashion. Numerous ways exist to quantify the degree of similarity between pairs of graphs; the tactics employed comprise various techniques, including graphlet kernels, WL kernels, random walks, and shortest paths (Shervashidze et al., 2009). Graphlet kernels stand out as a straightforward yet powerful approach within the vast array of kernel methods: they operate by counting subgraphs of limited size, which allows graph similarities to be explored and reveals concealed patterns contributing to a deeper comprehension of the subject (Kondor et al., 2009). In summary, graph kernels serve as effective models, offering a range of advantages that highlight their importance:

• Graph kernels are widely recognized as valuable tools for quantifying the similarity between graph items by implementing various methodologies for graph kernel discovery. They can be perceived as an overarching generalization of conventional statistical approaches (Kriege et al., 2020).
• Numerous kernel techniques have been suggested in the literature to mitigate the computational burden associated with graph-based kernel methods (Urry & Sollich, 2013). Kernel tricks can decrease the spatial dimensions and computing complexity associated with substructures while maintaining the effectiveness of kernels.

Despite the several advantages of kernel approaches, their scalability is hampered by certain limitations:

• The majority of kernel models exhibit a limitation in their ability to learn node embeddings for newly introduced nodes. In practical applications, graphs possess dynamic characteristics, allowing their constituent elements to undergo evolutionary changes. Hence, re-learning is necessary for graph kernels whenever a new node is introduced, making them time-consuming and challenging to apply in practical scenarios.
• Most graph kernel models do not consider the presence of weighted edges, resulting in a potential loss of structural information. This can decrease the fidelity of the graph representation within the latent space.
• The computational complexity of graph kernels is classified as NP-hard (Borgwardt & Kriegel, 2005). While various kernel-based models have been developed to decrease computational time by incorporating substructure distributions, this approach may inadvertently introduce greater complexity and hinder the model's capacity to represent the overall structure accurately.
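As a concrete, deliberately simplified illustration of the graphlet-kernel idea described above, the sketch below counts 3-node induced subgraph patterns in two toy graphs and compares the normalized count vectors. Practical graphlet kernels use larger graphlets and sampling; the toy graphs and the 3-node restriction here are assumptions made only for this sketch.

```python
from itertools import combinations
import numpy as np

def graphlet3_counts(edges, n):
    """Count induced 3-node subgraphs by their number of edges (0, 1, 2, or 3)."""
    eset = {frozenset(e) for e in edges}
    counts = np.zeros(4)
    for trio in combinations(range(n), 3):
        k = sum(frozenset(pair) in eset for pair in combinations(trio, 2))
        counts[k] += 1
    return counts

def graphlet_kernel(g1, g2):
    """Normalized dot product of graphlet count vectors: a similarity in [0, 1]."""
    c1, c2 = graphlet3_counts(*g1), graphlet3_counts(*g2)
    c1, c2 = c1 / np.linalg.norm(c1), c2 / np.linalg.norm(c2)
    return float(c1 @ c2)

triangle_rich = ([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)], 5)  # two triangles
chain         = ([(0, 1), (1, 2), (2, 3), (3, 4)], 5)                  # a simple path

print(graphlet_kernel(triangle_rich, triangle_rich))  # identical graphs -> 1.0
print(graphlet_kernel(triangle_rich, chain))          # structurally different -> lower
```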
Matrix factorization-based models aim to capture the fundamental characteristics of a graph by representing it as matrices; these models extract embeddings by decomposing the matrices through a complex procedure (Ou et al., 2016; Z. Zhang et al., 2018). A diverse range of strategic options dictates the course of factorization modelling. The fundamental objective of these models is to effectively estimate the complex interconnectedness of nodes by using high-order proximity. Matrix factorization aims to reduce the size of high-dimensional matrices, such as the adjacency matrix or the Laplacian matrix, that represent the graph structure. By transforming these matrices into a lower-dimensional space, matrix factorization simplifies the representation of complex graph relationships, making it more concise. Numerous decomposition techniques, such as Singular Value Decomposition (SVD) and Principal Component Analysis (PCA), are extensively used in graph representation learning and recommendation systems. Various models have been developed to decrease the computational complexity of matrix factorization by optimizing sampling procedures (Lian et al., 2022; R. Yang et al., 2019). The primary concept of the NRL-MF model (Lian et al., 2022) revolved around a hashing function specifically designed for the computation of dot products; the hashing function efficiently computes a binarized vector representation for each node using exclusive-or (XOR) operators. The proposed model can acquire binary and quantized codes by using matrix factorization techniques while strongly preserving higher-order proximity. Matrix factorization-based models offer several advantages:

• The models have a low dependency on the quantity of data required for learning embeddings. Compared to alternative methodologies, such as neural network-based models, they offer distinct advantages in scenarios with a limited amount of training data.
• Including the Laplacian matrix or transition matrix in the representation of the graph allows the models to capture the proximity of nodes effectively. The connections between every pair of nodes appear at least once in the matrix, enabling the models to handle networks with sparse connectivity.

Despite the widespread utilization of matrix factorization in graph embedding problems, it is essential to acknowledge that this approach has drawbacks:

• The computational complexity of matrix factorization poses challenges in time and memory when dealing with enormous graphs containing millions of nodes. One primary factor contributing to this is the time required to decompose the matrix into a series of smaller matrices (S. Cao et al., 2015).
• Models that rely on matrix factorization cannot effectively handle incomplete graphs that contain unseen and missing variables (Safavi & Koutra, 2020). When the available graph data is insufficient, matrix factorization-based models may encounter challenges in learning generalized vector embeddings. Hence, there is a need for neural network models that can generalize over graphs and enhance the accuracy of entity prediction.
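The decomposition step that these factorization models rely on can be illustrated with a plain truncated SVD of a node-proximity matrix. The toy graph, the choice of A + A² as the proximity matrix, and the embedding dimension are assumptions made only for this sketch; large graphs would use sparse or randomized solvers instead of a full SVD.

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

S = A + A @ A                      # a simple first- plus second-order proximity matrix
U, sigma, Vt = np.linalg.svd(S)    # full SVD (illustrative; impractical for huge graphs)
k = 2
Z = U[:, :k] * np.sqrt(sigma[:k])  # low-dimensional node embeddings (one row per node)

# Reconstruction error shows how much proximity structure the k dimensions retain.
S_hat = Z @ (Vt[:k].T * np.sqrt(sigma[:k])).T
print("embeddings shape:", Z.shape)
print("relative reconstruction error:", np.linalg.norm(S - S_hat) / np.linalg.norm(S))
```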
Shallow models have demonstrated considerable achievements over the previous decade (Perozzi et al., 2014; Ribeiro et al., 2017). The primary objective of these models is to represent nodes, edges, and subgraphs as vectors with minimal dimensions while maintaining the integrity of the graph structure and the proximity between entities. In general, the models first employ a sampling strategy to capture the form of the graph and the proximity relation; subsequently, they acquire embeddings using shallow neural network algorithms. Various sampling procedures can be employed to collect local and global information in graphs (C. Wang et al., 2020). Specific approaches focus on maintaining the integrity of the graph's structure by devising sampling methodologies capable of capturing the inherent structure within samples of specified lengths. Sampling strategies developed to capture local and global graph structure include random-walk sampling, role-based sampling, and edge reconstruction. The model subsequently uses shallow neural network approaches to acquire vector embeddings in the latent space through unsupervised learning. Selecting an appropriate technique for capturing the graph structure is crucial for learning vector embeddings effectively; the graph structure can be sampled by examining the connections between nodes within the graphs or their sub-structures. Over the past ten years, several models have been proposed to represent the graph structure and acquire embeddings (R. Liu et al., 2023; Perozzi et al., 2017). Random-walk-based tactics are among the most prevalent approaches for sampling graph structures, as evidenced by many models. The primary idea behind the random-walk technique is to acquire knowledge about the shape of a network by generating paths that can be interpreted as sentences in a text. DeepWalk (Perozzi et al., 2014) and Node2Vec (Grover & Leskovec, 2016) can be regarded as seminal models that paved the way for novel approaches in node embedding learning. Motivated by the limitations of matrix factorization-based models, the DeepWalk model employs random-walk sampling to maintain node neighbourhoods, enabling the collection of global information in graphs. Furthermore, both DeepWalk and Node2Vec aim to maximize the likelihood of observing neighbouring nodes using stochastic gradient descent on single-layer neural networks; hence, these models effectively reduce execution time and computational complexity. DeepWalk is a node embedding model that uses a random-walk sampling strategy to build node sequences, which are then treated as word sentences. One drawback inherent to this model is its inability to steer the random-walk sampling toward a better-quality sampled graph structure. To address the constraints of DeepWalk, Node2Vec was proposed, which incorporates a versatile biased random-walk sampling approach to control the traversal of random walks at each time step. Several limitations are associated with shallow models:

• In the context of graphs, the limited capacity of shallow models prevents them from acquiring embeddings for newly introduced nodes. To develop embeddings for novel nodes, the models must incorporate new patterns; this can be achieved by random-walk sampling to generate fresh paths for the new nodes, after which the models must undergo re-training to learn the embeddings. The need for re-sampling and re-training may pose challenges in real applications.
• Shallow models such as DeepWalk and Node2Vec are primarily effective for analyzing homogeneous graphs, and they tend to overlook the properties or labels associated with individual nodes. However, in practical applications, numerous graphs possess features and labels that can provide valuable information for graph representation learning. Limited research has addressed the characteristics and labels of nodes and edges. Moreover, model inefficiency and heightened computational complexity are exacerbated by the constraints imposed by domain knowledge in the context of diverse and dynamic graphs.
• One limitation of shallow models is the absence of parameter sharing, which prevents the models from sharing parameters during the training phase. From a statistical standpoint, parameter sharing can decrease the computational time required and the number of weight updates needed throughout training. To address these constraints, deep neural network models are recommended as a substitute for shallow models. Deep neural network models have shown improved generalization capabilities and the ability to capture more graph entity interactions and structures.

Recently, the efficacy of several graph embedding models has been tested by the presence of large-scale graphs. Conventional approaches, such as shallow neural networks or statistical techniques, struggle to represent intricate graph topologies because of their simplistic architectural design. There has therefore been a surge in research on deep graph neural networks, which have gained significant attention due to their remarkable capability to handle intricate and extensive graph structures (Bojchevski & Günnemann, 2017; M. Liu et al., 2020). In contrast to previous models, most deep neural network-based models utilize both the graph structure and node attributes/features to obtain node embeddings. For example, users of a social network platform may own textual data, such as personal details featured in their profiles. When nodes lack attribute information, the attributes or features can be encoded using node degrees or one-hot vectors (Kipf & Welling, 2016b). A notable category known as Graph Autoencoders has emerged in deep graph networks. These unsupervised learning algorithms specialize in encoding graph items into latent spaces and reconstructing these entities from the encoded information. This process of encoding and reconstruction is the defining characteristic of Graph Autoencoders, granting them a distinctive and influential position in deep graph representation learning. Graph Autoencoder models can be categorized into two major groups based on their architectural attributes: Multilayer Perceptron (MLP)-based models and Recurrent Graph Neural Networks (RGNNs). Early Graph Autoencoder models mostly used the Multilayer Perceptron (MLP) architecture during the first stages of their development; this design decision demonstrated its ability to learn complex embeddings effectively, as evidenced by the groundbreaking studies (S. Cao et al., 2016; Tu, Cui, Wang, Wang et al., 2018) that established the foundation of this lineage.
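A minimal sketch of this encode-and-reconstruct behaviour is given below, in the spirit of (but not identical to) the cited autoencoder models: a one-layer linear graph-convolutional encoder produces embeddings Z, and an inner-product decoder sigmoid(ZZ^T) attempts to reconstruct the adjacency matrix. The toy graph, one-hot features, and training loop are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
n = A.shape[0]
X = np.eye(n)                                # one-hot features when no attributes are given
A_selfloop = A + np.eye(n)
deg = A_selfloop.sum(1)
A_norm = A_selfloop / np.sqrt(np.outer(deg, deg))   # D^-1/2 (A + I) D^-1/2
T = A + np.eye(n)                            # reconstruction target: edges plus self-loops

W = 0.1 * np.random.randn(n, 2)              # encoder weights -> 2-d embeddings
lr = 0.5
for _ in range(300):
    Z = A_norm @ X @ W                       # encoder: one linear "graph convolution"
    P = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))     # decoder: sigmoid of pairwise inner products
    G = (P - T) / (n * n)                    # gradient of mean binary cross-entropy w.r.t. logits
    W -= lr * (A_norm @ X).T @ ((G + G.T) @ Z)

Z = A_norm @ X @ W
P = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
bce = -np.mean(T * np.log(P + 1e-9) + (1 - T) * np.log(1 - P + 1e-9))
print("reconstruction loss:", round(float(bce), 4))
print("predicted P(edge 0-1), P(edge 0-4):", np.round(P[0, 1], 2), np.round(P[0, 4], 2))
```

The embeddings are useful precisely because the decoder forces nearby nodes to receive nearby vectors, which is the proximity-preservation goal stated above.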
Another model to consider is the RGNN, which stands out as one of the pioneering approaches to using deep neural networks for graph representation learning. RGNNs are built upon the foundation of GNNs. The primary concept underlying GNNs is the passing of messages between target nodes and their neighbouring nodes until a state of equilibrium is reached. Recurrent graph neural networks offer several advantages in comparison to shallow learning techniques:

• RGNNs have demonstrated enhanced learning capabilities in processing scattered information, particularly in multi-relational graphs whose nodes have numerous connections. This capability is attained by updating the states of every node within each hidden layer.
• Parameter sharing is a technique employed by RGNNs to share parameters across several locations. This allows RGNNs to capture the inputs of node sequences effectively, and it can decrease computational complexity during training by using fewer parameters, enhancing the models' performance.

Nevertheless, a drawback of RGNNs lies in their use of recurrent layers that have identical weights throughout the weight-update procedure. This leads to inefficiencies when specifying different relationship constraints between neighbouring and target nodes. In recent years, convolutional graph neural networks (CGNNs) have demonstrated significant efficacy in addressing the limitations of RGNNs by leveraging distinct weights in each hidden layer. Convolution operators can be defined and applied to graph mining because image data can be seen as a specific instance of graph data. Two distinct methodologies exist for implementing convolution operators in the graph domain. The first relies on the principles of graph spectrum theory, wherein graph entities are converted from the spatial domain to the spectral domain and convolution filters are then applied in the spectral domain. The alternative approach applies convolution operators directly within the spatial domain of the graph. Most spectral models acquire node embeddings by converting graph data into the signal domain and then applying convolutional filters, resulting in heightened computational complexity. Kipf & Welling (2016a) presented graph convolutional networks (GCNs), which were seen as connecting the spectral and spatial methodologies. Despite the effectiveness of spectral CGNNs for performing convolution filtering in the spectral domain, they exhibit several drawbacks, which are outlined below:

• The computational complexities associated with the decomposition of the Laplacian matrix, specifically in obtaining matrices rich in eigenvectors, are well recognized as time-consuming. The time cost is significantly increased by the repetitive dot-product calculations between eigenvectors and Laplacian matrices throughout the training process.
• A significant issue arises when confronted with extensive networks. There is a clear association between the parameters that govern the kernels and the number of nodes contained in the graphs. As a result, spectral models, which heavily depend on these parameters, may face constraints in situations involving large graph dimensions; this subtle constraint highlights the pragmatic fact that spectral models may not be optimally suited to graphs of substantial scale.
• Addressing the challenges posed by dynamic graphs entails a distinct array of intricacies. The application of convolution filters and the training of the model require the conversion of graph data into the spectral domain, often accomplished using a Laplacian matrix. However, this transition presents a significant difficulty: the model's effectiveness is compromised in situations typified by dynamic graphs, where the data within the graph is fundamentally fluid and liable to change. A framework highly attuned to the spectral domain faces difficulties in accurately representing the constantly changing intricacies of dynamic graphs and therefore struggles to capture and accommodate these fluctuations.

Spatial models have emerged as a promising option for addressing the limitations inherent in spectral domain-based CGNNs. Spatial models introduce a fresh approach by applying convolution operators within the graph domain, enabling node embeddings to be acquired more efficiently. The field currently displays various spatial CGNNs, each with its own methodology. These networks have garnered significant recognition for their ability to efficiently navigate complex graph structures, often surpassing the performance of their spectral equivalents (Chiang et al., 2019). However, a shortcoming of CGNNs becomes apparent at the hidden layer: the model coordinates updating the states of surrounding nodes, and this dynamism can unintentionally result in slow training and updating protocols, particularly when inactive nodes are present. To overcome this obstacle, researchers have strengthened CGNNs by strategically enhancing the sampling approach (J. Chen et al., 2018; Z. Huang et al., 2021). J. Chen et al. (2018) proposed the FastGCN model, which aims to enhance both training efficiency and overall model performance, surpassing traditional CGNNs. Amid a multitude of technological breakthroughs, the issue of scalability emerges as a significant worry: current GNN models face challenges in dealing with the rapid growth of neighbourhoods, which leads to an increase in computational complexity. The urgency of this matter highlights the need for novel approaches that not only explore the extent to which scalability may be achieved but also establish a balanced relationship between performance and computing requirements. By transcending traditional paradigms, the FastGCN model incorporates neighbourhood sampling into each convolutional layer, employing a strategic approach that focuses on crucial surrounding nodes. This methodology enables the model to adaptively learn the essential neighbouring nodes unique to each batch, effectively focusing on the most critical aspects. One notable example in this field is the Graph Attention Networks (GATs) model proposed by Velickovic et al. (2017). This innovative model is at the forefront of utilizing the attention mechanism in the complex domain of graph representation learning. The fundamental nature of the attention mechanism lies in its capacity to coordinate a deliberate message for every adjacent node throughout the iterative message-passing process inherent to Graph Neural Networks (GNNs).
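The message-passing step that attention later re-weights can be sketched generically. The following is a minimal degree-normalized aggregation layer (not the GAT formulation): each node averages the feature vectors of itself and its neighbours, then applies a learned linear map and a nonlinearity. The toy graph, features, and random weights are illustrative assumptions.

```python
import numpy as np

np.random.seed(1)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.5, 0.5],
              [0.0, 1.0]])                      # 2-d input features per node

A_hat = A + np.eye(A.shape[0])                  # include the node's own message (self-loop)
D_inv = np.diag(1.0 / A_hat.sum(1))             # mean aggregation over the closed neighbourhood
W = np.random.randn(2, 3)                       # learnable projection to 3-d hidden features

H = np.maximum(0.0, D_inv @ A_hat @ X @ W)      # one propagation step: aggregate, project, ReLU
print(H)   # each row now mixes a node's features with those of its neighbours
```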
The significance of each neighbouring node to the target node can be quantified by carefully and deliberately calculating an attention score. After the necessary calculations, the score is normalized, typically with the SoftMax function; normalization ensures that the scores are comparable across all neighbouring nodes of the target node. Following this harmonization, a node's embedding is generated by combining the states of its neighbouring nodes. This orchestration demonstrates a dynamic interplay, effectively capturing the fundamental nature of graph relationships in a coherent and influential manner. Furthermore, the GAT model used multi-head attention, a strategic technique that increased the model's capacity and improved learning stability. However, throughout this pioneering endeavor, a subtle constraint emerges: the GAT model, which relies on attention coefficients to govern its mechanism, unconditionally prioritizes local attention. Consequently, this framework struggles to encompass the intricate details of the global graph structure fully. Recently, there has been a notable increase in the development of novel models, all stemming from the fundamental principle of GAT. The main focus of these models is to enhance the intrinsic capacity of the self-attention mechanism, thereby fostering a more profound connection with the global graph structure (Ma et al., 2021). In this quest, researchers actively explore techniques that effectively capture the complex interactions within the larger graph. Deep neural network models offer several notable advantages:

• One notable advantage of deep neural network models is the deliberate use of parameter sharing, wherein weights are shared strategically throughout the training process. This methodology yields three advantageous outcomes: a decrease in training time, a reduction in the number of training parameters, and a simultaneous enhancement in model performance. Furthermore, the principle of parameter sharing also enables models to incorporate multi-task learning, highlighting the inherent versatility and efficiency of this approach.
• Another notable advantage that sets deep models apart from shallow models is their ability to engage in inductive learning. This crucial characteristic endows deep-learning models with the capability to go beyond their training material and extend their knowledge to unseen instances. This skill allows the models to navigate unfamiliar areas effectively, giving them practical value and significance in real-world situations.

Nevertheless, some limitations remain:

• CGNNs rely primarily on an aggregation process to map the complex paths of graph structure and entity interactions. The method diligently collects data from adjacent nodes to enhance the comprehension of target nodes. When many graph convolutional layers are stacked in CGNNs to capture higher-order graph structures, a significant problem arises: increasing the depth of the convolutional layers may unintentionally lead to over-smoothing.
This can hamper the model's capacity to accurately detect and analyze small fluctuations and subtleties in the data (T. Chen et al., 2022; L. Zhao & Akoglu, 2019).
• Limitations in Disassortative Graphs: The discourse diverges when considering disassortative graphs, which are characterized by nodes with different labels tending to connect. Here the inherent aggregation mechanism of GNNs emerges as a constraint and obstacle: despite the varied labels assigned to nodes, the aggregation process uniformly pools attributes from surrounding nodes, concealing the subtle disparities that are the basis of disassortative graphs. This constraint becomes especially significant in classification tasks, as GNNs struggle to reconcile their aggregation approach with the complexities of disassortative graph structures.

3.1 Notation and Fundamentals of Graph Representation Learning

Mathematically, a graph is represented as G = (V, E), where V = {v_1, ..., v_N} is a set of N = |V| nodes and E ⊆ V × V is a set of M = |E| edges between nodes. We use A ∈ R^{N×N} to denote the adjacency matrix, whose i-th row, j-th column, and (i, j)-th element are denoted A(i, :), A(:, j), and A(i, j), respectively. A node v and an edge e_{v,u} can store characteristics or attributes represented by the vectors x_v and x^e_{v,u}, respectively. The node features of a graph are expressed by a matrix X ∈ R^{N×d}, where d is the node feature size. The edge features of a graph are expressed by a matrix X^e ∈ R^{M×c}, where M and c denote the number of edges and the edge feature size, respectively. X^T denotes the transposition of a matrix, and element-wise multiplication is written X_1 ⊙ X_2. In this study, matrices are represented by uppercase bold characters and vectors by lowercase bold characters unless otherwise noted. Table 2 contains all the symbols and abbreviations used throughout the study.

Table 2: Notations. G = (V, E): a graph. N = |V|: the number of nodes. M = |E|: the number of edges. V: the set of nodes. X ∈ R^{N×d}: graph feature matrix. x_v ∈ R^d: feature vector of node v. X^e ∈ R^{M×c}: feature matrix of the graph's edges. x^e_{v,u} ∈ R^c: feature vector of the edge (v, u). A: the adjacency matrix. ⊙: element-wise product. |·|: the length of a set. X^T: the transposition of the matrix X. D: the degree matrix of A, with D_ii = Σ_j A(i, j). L = D − A: the Laplacian matrix. L = QΛQ^T: the eigendecomposition of L. H ∈ R^{N×b}: node hidden feature matrix. W, Θ, b: learnable model parameters. S: a search space.

3.2 Classification of Tasks via Graph-based Data Hierarchies

Figure 1: Types of graph tasks.

As shown in Figure 1, graph-structured data can carry information at various levels of the structure. At the node level, different node-based tasks are defined; edge-level tasks are also definable; and tasks at the graph level can be tailored to the requirements of various applications.

Representing Node Tasks: The goal of node-level tasks is to learn representations of individual network elements, namely nodes. Node-level taxonomy covers node classification, node clustering, and node regression, among other related techniques. Node classification, unlike regression, assigns nodes to classes rather than predicting continuous values. It is important that representations optimally embed the input graph so that algebraic operations on the embedded graph accurately reflect the graph's topology. Finally, representations can be used as inputs in models to predict properties of graph elements, such as the role of proteins in an interactome network (i.e., a node classification task).
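Before turning to edge- and graph-level tasks, the notation of Section 3.1 (Table 2) can be made concrete. The short sketch below builds the adjacency matrix A, the degree matrix D, and the Laplacian L = D − A for a small, assumed edge list, and confirms the eigendecomposition L = QΛQ^T.

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]   # assumed toy edge list, N = 4 nodes
N = 4
A = np.zeros((N, N))
for u, v in edges:                         # undirected graph: symmetric adjacency
    A[u, v] = A[v, u] = 1.0

D = np.diag(A.sum(axis=1))                 # degree matrix of A
L = D - A                                  # (unnormalised) graph Laplacian
lam, Q = np.linalg.eigh(L)                 # L = Q diag(lam) Q^T, eigenvalues ascending

print("degrees:", np.diag(D))
print("Laplacian eigenvalues:", np.round(lam, 3))  # smallest is 0 for a connected graph
```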
Representing Edge Tasks: Edge-level tasks include link prediction and edge classification. Link prediction and edge classification entail the model predicting whether two nodes share an edge and classifying the edges, respectively. Recommendation systems are an excellent illustration of an edge-level problem, such as predicting whether or not a drug will bind to a specific target protein (or recommending items users might like). Modelling the representation of a network is indispensable for solving problems involving classification, regression, and matching. At the graph level, insights can be gained through classification; small molecular graphs, such as antibiotics and other medications, are examples. Drug discovery and toxicity profiling are two of the most prevalent graph-level operations (i.e., graph classification tasks). In this setting, the atoms function as the "nodes," and the chemical bonds that connect them act as the "edges." Another example is the simulation of physical phenomena, where the particles serve as the vertices and the inter-particle interactions serve as the edges. Tasks at the graph level can also be broken down into tasks at the subgraph level.

4. Learning Methods for Complex Graphical Representations

Graph learning settings depend on data accessibility and real-world needs, as in classical machine learning. According to the literature, graph-based learning tasks can be supervised, semi-supervised, unsupervised, or self-supervised.

4.1 Supervised Learning on Graphs

The intent is to use labeled data for model training (e.g., labeled nodes). In contrast to supervised learning, which relies on predetermined labels, the currently dominant strategies for generating graphs are unsupervised. Labeled instances have previously been used to improve the graph for downstream tasks. Dhillon et al. (2010) examine node-pair similarities using labeled points. If the manifold sampling rate is high enough, the optimal solution for a neighbourhood graph can be considered a subgraph of a kNN graph, as Rohban & Rabiee (2012) show. Berton & Lopes (2014) propose graph-based labeled instance informativeness (GBILI), which builds on the work of Ozaki et al. (2011) by using the label information differently: they utilize a graph to determine which instances are most informative. Berton et al. (2017) improved on GBILI (Berton & Lopes, 2014) with the Robust Graph that Considers the Labeled Instances (RGCLI) technique, building sturdier graphs by solving an optimization problem. L. Zhuang et al. (2017) presented low-rank semi-supervised representation (LRR) as a unique way to integrate labeled data into the low-dimensional representation for enhanced accuracy; the added supervised information from the created similarity graph greatly improves the subsequent label-inference process.

Complex algorithms and models have been presented over the past few decades, with the majority falling into two broad approaches: regularized graph Laplacian-based approaches and graph embedding-based methods (Kipf & Welling, 2016a). Label propagation using Gaussian fields and harmonic functions (X. Zhu et al., 2003), manifold regularization (Belkin et al., 2006), and deep embeddings (Weston et al., 2008) are all examples of the first category.
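A minimal sketch of this first category, in the spirit of the Gaussian-fields/harmonic-functions label propagation of X. Zhu et al. (2003), is shown below: label distributions are repeatedly averaged over neighbours while the few labelled nodes are clamped. The toy graph, the two clamped labels, and the fixed iteration count are assumptions for illustration.

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)        # row-normalised transition matrix

labels = {0: 0, 5: 1}                       # only two labelled nodes, classes 0 and 1
Y = np.full((6, 2), 0.5)                    # uniform soft labels for unlabelled nodes
for node, cls in labels.items():
    Y[node] = np.eye(2)[cls]

for _ in range(50):
    Y = P @ Y                               # propagate label mass along edges
    for node, cls in labels.items():        # clamp the known labels each step
        Y[node] = np.eye(2)[cls]

print(np.round(Y, 2))
print("predicted classes:", Y.argmax(axis=1))
```

The fixed point of this clamped iteration is the harmonic solution: unlabelled nodes inherit the label of the cluster they are most strongly connected to.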
Examples of works from the latter category are DeepWalk (Perozzi et al., 2014), LINE (Tang, Qu, Wang, et al., 2015), and node2vec (Grover & Leskovec, 2016). These procedures take their cue from the skip-gram model presented by (Mikolov et al., 2013), employing a wide range of different random-walk and search-based techniques. Supervised graph representation learning provides explicit, label-driven model optimisation. This advantage produces high-performance findings when tagged data is abundant: models learn from labelled data, improve their forecasts and categorization, and can therefore predict or categorise accurately and with fine granularity. Human annotators can offer accurate labels for high-quality training data, and performance evaluation objectivity is improved by comparing supervised methods to ground-truth labels. Depending on labelled data is supervised learning's main drawback, as it requires expenditure of time and funds, and obtaining enough labelled samples tends to be tedious. Supervised approaches rely on labelled data patterns, which may limit their applicability to unknown or non-conforming data distributions. When labelled data is abundant and rich in information, supervised learning approaches can produce excellent results; prediction and classification tasks support this claim. However, label noise, bias, or an unequal data distribution in the training set may hinder supervised techniques. Scalability depends on graph data and learning model complexity: complex models with wider parameter spaces may require more computational time and resources.

4.2 Semi-Supervised Learning on Graphs
When only a small amount of tagged data is available, a semi-supervised study trains models using both labeled and unlabeled inputs. Due to the spatial connectedness of graphs, semi-supervised graph learning techniques can effectively use unlabeled data. According to the manifold assumption, low-dimensional manifold nodes that are physically closer together are more comparable and hence should be given the same label. Semi-supervised learning has evolved to make use of a wide variety of techniques. Since the graph structure is compatible with numerous assumptions in semi-supervised learning, this burgeoning topic is a strong fit. Nodes reflect data instances, and edges represent similarity in graph-based semi-supervised learning. The manifold premise states that nodes joined by high-weight edges are comparable and have the same label category. Graph structures are simple and expressive, which makes manifold-based graph semi-supervised learning approaches successful. Several semi-supervised survey studies focus on traditional methods of dealing with semi-supervised circumstances (Prakash & Nithya, 2014). Some recent efforts, including (Van Engelen & Hoos, 2020), investigate semi-supervised learning by examining graph creation and regularization. Multi-step approaches cannot construct an end-to-end optimization and learning framework; still, the methods mentioned above (supervised learning) do this by first learning the graph's embeddings and then optimizing the objective functions. Fortunately, new developments in deep learning can bridge the gaps between graph embedding and explicit learning problems in both node-level and graph-level learning. Since graph classification and regression aim to train a classifier or regression model to anticipate the unobserved labels or targets, these activities can be branded under (semi-)supervised learning. This section discusses recent developments in graph embeddings, an important part of graph semi-supervised learning. Current methods for graph semi-supervised learning are summarized in Table 3.
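As a concrete illustration of the Laplacian-regularization family above, the sketch below implements the classic label-propagation idea behind Gaussian fields and harmonic functions (X. Zhu et al., 2003) on a toy graph. The adjacency matrix, labelled nodes, and iteration count are illustrative assumptions rather than a reproduction of any cited system.

```python
import numpy as np

# Label propagation: labels of the few labelled nodes are clamped and
# iteratively spread to their neighbours over the graph structure.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
labels = {0: 0, 4: 1}                    # two labelled nodes, two classes
n, c = A.shape[0], 2

P = A / A.sum(axis=1, keepdims=True)     # row-normalised transition matrix D^{-1} A

F = np.zeros((n, c))
for v, y in labels.items():
    F[v, y] = 1.0

for _ in range(50):                      # propagate until (approximate) convergence
    F = P @ F                            # every node averages its neighbours' scores
    for v, y in labels.items():          # clamp the labelled nodes
        F[v] = 0.0
        F[v, y] = 1.0

print(F.argmax(axis=1))                  # predicted class for every node
```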
4.2.1 GRAPH EMBEDDING
Two distinct kinds of embeddings are distinguishable in semi-supervised learning approaches that use graphs. While one describes the entire graph, the other represents a particular node (W. L. Hamilton et al., 2017). Both embeddings aim to accomplish the same goal: to represent the item in a space with limited dimensions. Graph-based semi-supervised learning challenges rely on embedding nodes. A node embedding represents a vertex in a low-dimensional space while preserving its local structure. The node embedding on a graph G = (V, E) is a mapping h_v : p → z_p ∈ ℝ^d, ∀p ∈ V, where d is smaller than |V|. h_v maintains graph G's node closeness metric. The general loss function used by graph embedding techniques is described in Equation 1,

ℒ = Σ_{p ∈ V_l} ℓ_s(ŷ_p, y_p) + Σ_{(p,q) ∈ D} ℓ_u(Dec(z_p, z_q), s_G(p, q)),    (1)

where ℓ_s is a supervised loss over the labeled nodes V_l and ℓ_u compares the decoded pairwise similarity Dec(z_p, z_q) of node embeddings with the graph-based similarity measure s_G(p, q).

Table 3: Representative shallow embedding methods for graph semi-supervised learning.
Paper | Embedding architecture | Embedding technique | Loss function formula
(Tang et al., 2015) | Shallow | Random walk | -Σ_{(p,q)} log σ(Dec(z_p, z_q))
(Ou et al., 2016) | Shallow | Factorization | Σ_{(p,q)} |Dec(z_p, z_q) - s_G(p, q)|²
(Roweis & Saul, 2000) | Shallow | Factorization | Σ_p ||z_p - Σ_q A_{pq} z_q||²
(Belkin & Niyogi, 2003) | Shallow | Factorization | Σ_{(p,q)} Dec(z_p, z_q) · s_G(p, q)

Decoder: The decoder module rebuilds graph statistics from the embedded nodes. The decoder can, for instance, try to predict a node's neighbourhood N(v) or its row A[v] of the adjacency matrix from the node embedding z_v. Pairwise decoders are proven to predict node similarity. Mathematically, Dec : ℝ^d × ℝ^d → ℝ^+.

4.2.2 SHALLOW GRAPH EMBEDDING
It is preferred for node embeddings to reflect the graph structure characteristics immediately surrounding them. The framework known as shallow network embedding aims to optimize a neural network to generate embeddings that keep these features intact. A crucial characteristic is that the degree of resemblance in the embedding space ought to closely correspond to the degree of similarity in the source graph (Figure 3). Different approaches are taken due to the numerous ways "similarity" can be defined. For instance, the shortest path length between two nodes can be used to describe network similarity, whereas the dot product can be used to define embedding-space similarity. More advanced approaches can capture more detailed similarity measurements, allowing a more accurate reflection of the network structure.

Figure 3: Shallow Network Embeddings.

The semantic meaning of edges (i.e., relation types) can be useful to incorporate into learned embeddings for heterogeneous graphs. Since each relation is split into many embeddings representing the head node, the tail node, and the relation type, knowledge graph embedding methods (Bordes et al., 2013; Dong et al., 2017; Nickel et al., 2011; Z. Sun et al., 2019; Trouillon et al., 2016; B. Yang et al., 2014) generate similarity metrics by considering a wide variety of node relations.
Factorization: For the class of factorization-based approaches, a matrix is factorized to provide the node embedding, with the matrix specifying the relationship between each pair of nodes. When constructing a similarity network, this matrix typically includes essential structure information such as the normalized Laplacian and adjacency matrices. The factorization of such matrices can take many forms depending on several factors. The eigenvalue decomposition of the normalized Laplacian matrix makes sense because it is positive semi-definite.
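The eigendecomposition just mentioned can be written down in a few lines. The sketch below computes Laplacian-eigenmaps-style node embeddings by factorizing the normalised Laplacian of a toy graph; the graph, embedding dimension, and use of NumPy's dense eigensolver are illustrative assumptions (large graphs would require sparse or sampled variants).

```python
import numpy as np

# Factorization-based shallow embedding: take the eigenvectors of the normalised
# Laplacian associated with its smallest non-trivial eigenvalues as embeddings.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L_norm = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt   # normalised Laplacian (PSD)

eigvals, eigvecs = np.linalg.eigh(L_norm)               # eigendecomposition L = Q Λ Q^T
d = 2
Z = eigvecs[:, 1:d + 1]                                  # skip the trivial eigenvector

print(np.round(eigvals, 3))
print(np.round(Z, 3))                                    # d-dimensional embeddings z_p
```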
Random Walk: A useful method for approximating several properties of a graph, such as its nodes' centrality (Newman, 2005) and similarity (Fouss et al., 2007). One of the attributes that may be approximated is the degree to which two nodes are similar. Therefore, random-walk node embedding strategies are useful when only a portion of the graph is available or when the graph is too big to be managed effectively. According to one definition (Perozzi et al., 2014), "similarity" is defined as "co-occurrence in a series of random walks of length k." It is possible to produce random walks on graphs through various approaches, such as depth-first search algorithms, breadth-first search walks, and hybrids (Grover & Leskovec, 2016).
Shallow embedding approaches have performed well on many semi-supervised tasks, but researchers have struggled to overcome their drawbacks. Shallow embedding generates only one embedding per node, does not exploit node properties, and shares few parameters: since the encoder creates a fresh vector for each node, every node requires its own unique set of parameters. The exclusion of node characteristics is another major concern, because this feature information could enrich the encoding, and semi-supervised learning benefits from every node contributing feature information. Shallow embedding techniques have also always relied on a transductive approach (W. L. Hamilton et al., 2017): nodes introduced after the training phase cannot generate their embeddings. Because of this limitation, inductive applications cannot use shallow embedding techniques.

4.2.3 DEEP GRAPH EMBEDDING
Recently, many deep embedding approaches have been created to circumvent these limits. Embedding something shallow requires a different set of skills than embedding something deep. In this case, the encoder considers the properties of the graph in addition to the graph's structure. When performing semi-supervised learning tasks with a transductive setup, the node embeddings are utilized for training a top-level classifier, which then makes predictions regarding the class labels of unlabeled nodes. Auto-encoder-based approaches diverge from shallow embedding approaches in two key respects: they rely on Deep Learning (DL) models and employ a unary decoder rather than a pairwise one. As shown in Figure 2, the goal of the auto-encoder-based method is to encode each node according to the linked vector s_i and then reconstruct it from the embedding, expecting the restored vector to be as close to the initial one as possible.

4.3 Graph-based Unsupervised Learning
Data samples and annotated labels from the real world are available when working in a supervised or semi-supervised scenario. In the unsupervised context, labeled samples are unavailable; therefore, the loss function must infer the properties of the graph's nodes, edges, and topology. Perhaps the most emblematic task in unsupervised graph learning is the link prediction problem, which involves predicting the existence of unseen edges in a graph. Table 4 represents contrastive learning, auto-encoders, and random walks as the basis of several unsupervised graph learning methods.
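The sketch below illustrates the truncated random walks on which DeepWalk-style shallow embeddings rely: nodes that co-occur within a short window of a walk are treated as similar pairs and would be fed to a skip-gram objective. The adjacency list, walk length, and window size are illustrative assumptions.

```python
import random

# Truncated random walks of length k, as used by DeepWalk-style methods.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}

def random_walk(start, length, rng=random):
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))   # move to a uniformly chosen neighbour
    return walk

walks = [random_walk(v, length=5) for v in adj for _ in range(10)]

# (node, context) co-occurrence pairs within a small window of each walk.
window = 2
pairs = [(w[i], w[j])
         for w in walks
         for i in range(len(w))
         for j in range(max(0, i - window), min(len(w), i + window + 1))
         if i != j]
print(len(pairs), pairs[:5])
```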
Table 4: GNN-based unsupervised learning methods.
Paper | Feature retrieval | Method | Task | Important functions
(F.-Y. Sun et al., 2019) | Contrastive | k-layer Graph Convolutional Network (GCN) | NC, LP, GC | Graph-level depictions
(Hjelm et al., 2018) | Contrastive | Neural Network (NN) | NC | New method for representation study without supervision
(Velickovic et al., 2019) | Contrastive | GCN | NC | Increasing node-graph correlation
(Hassani & Khasahmadi, 2020) | Contrastive | GCN | NC, GC | Acquiring knowledge of node- and graph-level illustrations
(Pan et al., 2018) | Graph Auto-encoder (GAE) | GCN | LP, GC | Low-dimensional modelling of graph-structured data for graph analytics
(J. Park et al., 2019) | GAE | GCN | NC, LP, GC | A graph's irregular regions were used to extract low-dimensional latent representations
(G. Cui et al., 2020) | GAE | GCN | NC, LP | Vectorize node properties and network structure via graph embedding
(C. Wang et al., 2017) | GAE | GCN | GC | Marginalized graph auto-encoder clustering technique
(Kipf & Welling, 2016b) | GAE | GCN | LP | Comprehensible underlying representations of undirected graphs are learned
(L. Liu et al., 2021) | GAE | NN | NC | Mastering knowledge acquisition processes in both familiar and distant domains
(Dong et al., 2017) | Random Walk (RW) | NN | NC | Learning a heterogeneous representation of network nodes
(Adhikari et al., 2018) | RW | NN | NC | Create a subgraph embedding
(Tang, Qu, Wang, et al., 2015) | RW | NN | NC, LP | Embeddings of low-dimensional nodes in extremely large networks

4.3.1 CONTRASTIVE LEARNING
The unsupervised learning environment uses contrastive learning to acquire expertise in graph representations. Deep Graph Infomax (DGI), which is an extension of the Deep InfoMax described by (Hjelm et al., 2018), was proposed by (Velickovic et al., 2019). DGI exploits the shared information between the representations of the nodes and the graph. InfoGraph (F.-Y. Sun et al., 2019) simplifies learning graph depictions by optimizing the information gained across graph-level and subgraph-level illustrations of varying sizes, like nodes, links, and triangles. First-order adjacency matrix representations and graph diffusion are compared in (Hassani & Khasahmadi, 2020), which achieves state-of-the-art (SOTA) results on various graph learning problems. (Okuda et al., 2021) recently employed an unsupervised graph representation study to identify generic objects and create a localization strategy for accumulating images of individual objects.

4.3.2 GRAPH-BASED AUTO-ENCODERS
In their ground-breaking work, Kipf & Welling (2016b) introduced the Graph Auto-encoder (GAE), an enhanced auto-encoder designed to handle graph-structured data effectively. This paradigm's fundamental training principle is centered on a loss computation that contrasts the original adjacency matrices with their carefully recreated counterparts.
The emergence of the Variational Graph Auto-encoder (VGAE) methodology reveals a fundamental role played by variation in the learning process. In a further innovation, two parallel investigations were conducted by C. Wang et al. (2017) and J. Park et al. (2019). The researchers aimed to deviate from using the adjacency matrix and instead focused on accurately reproducing feature matrices. The Marginalized Graph Auto-encoder (MGAE) is a novel approach that utilizes marginalized denoising principles to capture the characteristics of flexible nodes effectively. In a notable advancement, J. Park et al. (2019) endeavored to enhance the breadth of decoding capabilities. The researchers improved the decoding process by strategically incorporating Laplacian sharpening, effectively revealing concealed states with exceptional precision. The culmination of this transformation resulted in the emergence of an asymmetrical graph auto-encoder known as the Graph Convolutional Auto-encoder using Laplacian smoothing (GALA). GALA is anticipated to pave the way for new advancements in the ever-evolving landscape of graph-based auto-encoders.
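The core of the GAE family described above is an encoder that produces node embeddings Z and an inner-product decoder that reconstructs the adjacency matrix as sigmoid(Z Z^T). The sketch below shows that reconstruction step and its binary cross-entropy loss; the embeddings are random stand-ins for the output of a GCN encoder, so everything here is illustrative rather than a faithful reimplementation of (Kipf & Welling, 2016b).

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
N, d = A.shape[0], 2
Z = rng.normal(size=(N, d))                # placeholder for the GCN encoder output

A_hat = 1.0 / (1.0 + np.exp(-Z @ Z.T))     # inner-product decoder: sigmoid(z_u^T z_v)

eps = 1e-9                                 # reconstruction (binary cross-entropy) loss
loss = -np.mean(A * np.log(A_hat + eps) + (1 - A) * np.log(1 - A_hat + eps))
print(round(float(loss), 4))
```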
4.3.3 RANDOM WALK
Random walks have also been shown to capture structural equivalence, which occurs when two vertices with similar local structures receive similar embeddings, as well as community structure, where two vertices belonging to the same community receive similar embeddings (Du et al., 2018). Perozzi et al. (2014) efficiently capture large-scale networks' graph structure through the DeepWalk approach. Du et al. (2018) and Perozzi et al. (2014) revealed that random walks and contemporary language-modeling representation learning methods can produce high-quality vertex representations for downstream learning tasks like vertex and edge prediction. Adhikari et al. (2018) and Dong et al. (2017) have expanded random-walk-based approaches to capture vertex representations in heterogeneous graphs and subgraph embeddings.

4.4 Graph-based Self-Supervised Learning
Self-supervised learning is a cutting-edge field of study in DL. It addresses several drawbacks of (semi-)supervised learning: the reliance on hand labeling for identification, ground-truth labeling that requires intensive processing, limited prediction accuracy, and inadequate protection against adversarial attacks (X. Liu, Zhang, et al., 2021). By teaching a model to complete carefully crafted "pretext tasks," self-supervised learning overcomes the abovementioned drawbacks. Self-supervised learning performs "downstream tasks" (Y. You et al., 2020) such as node-, edge-, and graph-level tasks better because it learns more generic representations from unlabeled input. Table 5 presents an overview of the current approaches utilized in graph self-supervised learning.

4.4.1 PRETEXT TASKS
Self-supervised learning relies heavily on the model's ability to perform well on downstream tasks; hence the development of pretext tasks is crucial. We classify the pretext tasks as follows: Masked Feature Regression (MFR), Auxiliary Property Prediction (APP), Same-Scale Contrasting (SSC), Cross-Scale Contrasting (CSC), and Hybrid Self-supervised Learning (HSL).
Masked Feature Regression: The computer vision task of image inpainting inspired a new category of pretext challenges called Masked Feature Regression (MFR) (J. Yu, Lin, et al., 2018). The idea of this technique is to change the attribute of a node or edge to zero or another number. The primary objective is to recover the original node/edge information from the masked data using GNNs. The node-based MFR approach of (Y. You et al., 2020) allows a GNN to obtain features from neighbouring data. Reconstructing raw features from noisy input data, ideal input data, and noisy feature embeddings are context problems used to acquire robustly generalized models.
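A minimal sketch of the masked-feature-regression pretext task follows: the features of a few nodes are zeroed out, a stand-in one-hop mean aggregation plays the role of the GNN reconstruction head, and the regression loss is evaluated only on the masked nodes. The graph, mask, and aggregation rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))                      # original node features
masked = [1, 3]                                  # nodes whose features are masked

X_in = X.copy()
X_in[masked] = 0.0                               # corrupt the input, as in MFR pretext tasks

P = A / A.sum(axis=1, keepdims=True)             # one-hop mean aggregation
X_rec = P @ X_in                                 # stand-in for a learned reconstruction head

mse = np.mean((X_rec[masked] - X[masked]) ** 2)  # regression loss on masked nodes only
print(round(float(mse), 4))
```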
Table abbreviations: Node Classification (NC), Link Prediction (LP), Graph Classification (GC), Augmentation Same-Scale Contrasting (ASSC), Context Same-Scale Contrasting (CSSC), Cross-Scale Contrasting (CSC), Classification Auxiliary Property Prediction (CAPP), Hybrid Self-Supervised Learning (HSSL), Masked Feature Regression (MFR), Pre-training and Fine-tuning (PT&FT), Collaborative Learning (CL), Unsupervised Representation Learning (URL).

Table 5: GNN-based self-supervised learning methods.
Paper | Pretext-task category | Mode of training | Method | Task | Important functions
(H. Zhong et al., 2020) | ASSC | URL, CL | GCN, GIN | GC | Iteratively performed self-distillation with graph augmentations
(Qiu et al., 2020) | ASSC | URL, PT&FT | GIN | GC | Random walks supplement subgraphs, and artificial positional node embeddings are node characteristics
(Zeng & Xie, 2021) | ASSC | URL, PT&FT | NN | GC | Marginalized graph auto-encoder clustering technique
(J. Wu et al., 2021) | ASSC | PT&FT | GCN | LP | GCN recommendation reliability and durability improvement
(F.-Y. Sun et al., 2019) | CSC | URL | NN | GC | Improving substructure-graph representation knowledge to improve graph-level tasks
(Jiao et al., 2020) | CSC | URL | GCN | NC | Graph representation learning uses compatibility among key nodes and their observed subgraphs to capture regional structure information
(Jin et al., 2021) | HSSL | CL | GCN, HGCN | NC, GC | Effectively leverage several pretext tasks automatically
(S. Li et al., 2021) | HSSL | URL | GCN | NC | Identifying Ethereum phishing scams
(Wan et al., 2021) | HSSL | CL | GCN, HGCN | NC | Generative and contrastive graph convolutional network
(Lin et al., 2021) | HSSL | URL | GCN | NC, LP, GC | Categorization of fundus images using several labels
(W. Hu et al., 2019) | MFR | PT&FT | GCN | NC | An innovative method for pretraining GNNs on nodes and graphs
(Manessi & Rozza, 2021) | MFR | CL | GCN | NC | Develop GNN models using a multi-tasking approach

Other graph self-supervised methods in these categories include (Choudhary et al., 2021), (Che et al., 2021), (Jin et al., 2020), (Kipf & Welling, 2016b), (Z. Peng, Dong, et al., 2020), (Kim & Oh, 2022), (J. Zhang et al., 2020), (Y. You et al., 2020), (Rong et al., 2020), (K. Sun et al., 2020), (Subramonian, 2021), (Q. Sun et al., 2021), (Ren et al., 2020), (C. Park et al., 2020), (J. Cao et al., 2021), and (Opolka et al., 2019), with pretext tasks ranging from automated and domain-specific pretext-task design to structure recovery, pseudo-labeling, motif discovery, metapath prediction, multiplex and bipartite network embedding, node mutual-information maximization, and node-level regression.
Auxiliary Property Prediction: In addition to the aforementioned MFR methods, there are further methods that investigate the fundamental attribute data of nodes and edges, as well as the graph topology, to generate new pretext jobs for the self-supervision models. Auxiliary property prediction methods include regression and classification (Y. Liu et al., 2022). In contrast to MFR, the regression-based method emphasizes context problems, such as the prediction of graph properties based on features and structures, rather than numerical feature reconstruction. One such local-structure-aware pretext job that considers both local and global structure information is the NodeProperty pretext task proposed by (Jin et al., 2020). A method called Distance2Cluster was put forward by (Jin et al., 2020) to determine how far unlabeled nodes are from predefined clusters in the graph; this method causes the node illustration to reflect its overall location during training. PairwiseAttrSim, also proposed by (Jin et al., 2020), attempts to minimize the discrepancy between a pair of nodes' similarity value and their feature matching on the representation dispersion. It is based on improving feature modification for local structures to avoid over-smoothing.
Creating pseudo-labels to aid model training is taken on by classification-based methods instead of regression-based techniques. Multi-Stage Self-Supervised (M3S) training is a technique proposed by K. Sun et al. (2020) that involves training an encoder design to assign dummy labels to unlabeled nodes. Over the entirety of the training procedure, the DeepCluster (Caron et al., 2018) network is the backbone of this approach. Y. You et al. (2020) created the Node Clustering approach, which is quite similar and uses self-supervised labels in the form of a previously computed cluster index.
Same-Scale Contrasting: Unlike the first two methods, which build upon a single node in the graph, contrastive learning techniques improve representations by continually enforcing consensus between two nodes (or two views of a node). This strategy aims to maximize the agreement of positive pairs, i.e., samples that share the same semantic data, while minimizing that of negative pairs. Same-scale contrasting (SSC) compares two graph elements at the same scale, such as graph-to-graph and node-to-node. Additionally, SSC-based procedures are classified into two groups (context-based and augmentation-based) depending on how positive and negative pairings are defined.
In context-based same-scale contrasting (CSSC), contextual nodes are brought closer together in the embedding space. In a graph, the contextual nodes are typically located close to one another. The homophily hypothesis (McPherson et al., 2001) forms the basis for the perception that entities with similar semantic data should be related. Creating collections of nodes with comparable semantic data via a random walk defines a context. Positive pairs refer to nodes that are closer together, whereas negative pairs refer to node pairs obtained through negative sampling.
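The contrastive objectives discussed in this subsection typically take an InfoNCE form: the anchor embedding is pulled towards its positive (contextual or augmented) counterpart and pushed away from negatively sampled nodes. The sketch below computes that loss for one anchor; the embeddings, pair choices, and temperature are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

Z = rng.normal(size=(6, 4))
Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # cosine similarity via unit vectors

anchor, positive = 0, 1                            # e.g. a node and its contextual node
negatives = [2, 3, 4, 5]                           # negatively sampled nodes
tau = 0.5                                          # temperature

pos = np.exp(Z[anchor] @ Z[positive] / tau)
neg = np.sum(np.exp(Z[anchor] @ Z[negatives].T / tau))
loss = -np.log(pos / (pos + neg))                  # lower when anchor and positive agree
print(round(float(loss), 4))
```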
In the past few years, contrastive visual feature learning has significantly progressed (K. He et al., 2020). These developments also drive augmentation-based same-scale contrasting (ASSC), which creates fresh augmented examples for real-world data sets. The definition of the data augmentation process is essential to ASSC. Pairs of augmented samples derived from the same real data point are treated as positive, whereas pairs of augmented samples from different data points are treated as negative. These methods utilize mutual information (MI) estimation (Hjelm et al., 2018) and InfoNCE (Oord et al., 2018) and come under this category. For node-level tasks, Qiu et al. (2020) introduced a technique known as Graph Contrastive Coding (GCC), concentrating on universal unattributed graphs. This method augments subgraphs using random walks with restarts for each node before using purposefully created positional node embeddings as node features. Graph Contrastive Representation Learning (GRACE) by Y. Zhu et al. (2020) learns two augmentation procedures by deleting masked node characteristics and edges to improve graph representation; external and internal negative pairings are contrasted.
Cross-Scale Contrasting: Cross-Scale Contrasting (CSC) is an alternative contrastive learning method to SSC used to learn representations of graphs at different scales (node-subgraph or node-graph contrasting). The graph or subgraph summary is often obtained via a readout function. The goal of these techniques is the same as that of ASSC, which is to maximize mutual information. Hjelm et al. (2018) suggested utilizing the Jensen-Shannon divergence to estimate mutual information. (Velickovic et al., 2019) created Deep Graph Infomax (DGI) to maximize the mutual information between the top-level graph summary and the related patch representations in order to learn node representations. DGI obtains negative samples of each graph by deliberately corrupting its node attributes, while the graph's architecture is preserved. Along the same lines is multi-view representation learning on graphs (MVGRL), a technique developed by (Hassani & Khasahmadi, 2020) that pays attention to multi-view contrasts. This method separates the perspectives of the original graph structure and its diffusion. The goal is to maximize the two-way information flow between the various perspectives represented in different ways. Jiao et al. (2020) have presented Subgraph Contrast (Subg-Con), which contrasts the subgraph context and the node embeddings to comprehend the regional spatial relationships of the graph.
Hybrid Self-Supervised Learning: Few proposed systems effectively merge pretext tasks from several domains into multitasking learning algorithms. The Generative Pre-Trained GNN (GPT-GNN) (Z. Hu et al., 2020), based on the masked feature regression (MFR) idea, is a pre-training approach for GNNs that replaces the edge prediction problem with the generation of new graphs. Z. Peng et al. (2020) devised a method called Graphical Mutual Information (GMI) to include and optimize shared knowledge between the basic features of a nearby node and the node embedding; at the same time, it optimizes the edge similarity metric for graph representation learning (node embeddings of two adjacent nodes). Graph-Bert (J.
Zhang et al., 2020) offers a node feature reconstruction methodology: MFR and graph structure recovery are used to pre-train a transformer-based graph model. To acquire knowledge from context-based SSC and augmentation as self-supervised study signals, Wan et al. (2021) used downstream node classification tasks.

4.4.2 SELF-SUPERVISED TRAINING STRATEGIES
Based on the relationship between pretext tasks, downstream tasks, and graph encoders, self-supervised learning techniques can employ one of three training strategies: Collaborative Learning (CL), Pre-training and Fine-tuning (PT&FT), or Unsupervised Representation Learning (URL). For collaborative learning, the encoder and the pretext task are trained together; the self-supervised loss is combined with the downstream task loss to form the joint loss function. A trade-off hyperparameter can adjust the relative importance of the individual errors in determining the total error. It can be viewed as multi-task learning or as a means of regularizing the downstream task. In pre-training and fine-tuning, the encoder and the pretext tasks are trained beforehand; this can be thought of as setting the encoder's default values. Then the prediction head and pre-trained encoder are tuned simultaneously according to the specific downstream tasks. Like PT&FT, unsupervised representation learning completes pretext tasks and pre-trains the encoder at the outset. However, the second phase is distinct because the encoder parameters remain fixed. URL is more challenging than the other training approaches because each encoder is trained separately.

5. Graph Neural Networks
Graph Neural Networks (GNNs) are deep neural networks (NNs) appropriate for analyzing graph-structured data. The data to be studied are organized as a graph, with nodes representing the entities and edges representing the connections between them. Graph theory and deep learning underpin GNNs. A graph neural network is a model class that learns data structures and graph tasks using graph representations. Feature propagation and aggregation in GNNs improve graph representations. Graphs are frequently used in representation learning tasks, where the graph includes some domain knowledge that, while not explicitly stated in the graph structure, can be learned through instances. In a nutshell, graph neural networks extract more information from data with less organized labeling by iteratively spreading neighbourhood information until convergence (the model recognizes the structure of a given graph or node and learns to represent it). These computationally intensive investigations belong to recurrent graph neural networks (RecGNNs) (Z. Wu et al., 2020). Despite efforts to overcome these weaknesses, many scholars have used the achievement of CNNs in computer vision to advance new convolution algorithms related to graphs, such as convolutional GNNs (ConvGNNs). Spectral-based strategy models (Y. Zhang et al., 2019) and spatial-based strategy models (Micheli, 2009; Niepert et al., 2016; H. Peng et al., 2020) can be used to categorize ConvGNNs. In the spectral-based approach, models are constructed using spectral graph theory, and the eigendecomposition of graph Laplacians filters signals on the graph (Oellermann & Schwenk, 1991).
In the spatial-based method, the graphs themselves undergo the convolution processes, and the convolution itself is embodied as a blend of feature exchange among the neighbours of each node. In contemporary GNN models, the representation vector of a node is calculated by iteratively aggregating and transforming the representation vectors of its neighbours; this approach is also known as neighbourhood aggregation (or message passing) (Gilmer et al., 2017).
The goal is to learn a feature function f(G) that recognizes patterns in G. The function receives the following input: a matrix representation of the graph structure, normally in the shape of an adjacency matrix A (or other functions), and a feature depiction x_v for each node v ∈ V, summarized in a feature matrix X, and it yields node-level or graph-level output. The application must specify the vectors used to represent the node features in this calculation. In a database context, they may take the form of word vectors, pixel values, or a hybrid of image characteristics and word vectors to describe scenes. The aim is to acquire a good vector illustration over time; hence aggregation is performed recursively over neighbouring nodes. After aggregation over k rounds (k = 1, 2, ..., K), the representation of a node captures the information inside the K-hop network neighbourhood of that node (Figure 4).

Figure 4: Neighbourhood Aggregation.

The formal description of the feature vector of a node v in a GNN model at the k-th iteration, where K is the number of layers in the model, is as follows, given a graph G:

h_v^(k) = ACT^(k)( W_k · AGG^(k)({ h_u^(k-1) : u ∈ N(v) }) + B_k h_v^(k-1) ),    (2)
h_v^(0) = x_v,    (3)

where k ∈ [1, K] denotes the iteration, W_k is a learnable weight matrix in the k-th layer, B_k is a bias matrix whose term B_k h_v^(k-1) acts as a self-loop activation for node v, ACT^(k)(·) is a non-linear activation function in the k-th layer, e.g., the Rectified Linear Unit (ReLU) or sigmoid, and AGG^(k)(·) is a combination function in the k-th layer, e.g., max-pooling or sum. The last layer's output typically denotes each node's ultimate representation, and applying pooling to layer representations is useful for activities at the graph level. Optimal graph representation learning by a GNN model relies on the careful selection of ACT^(k)(·) and AGG^(k)(·). The main challenge is to establish the aggregation schema and aggregation order for each node. Several GNN variants with distinct neighbourhood combinations and graph-level pooling schemes have evolved in recent years. These include the graph convolutional network (GCN) (Kipf & Welling, 2016a), graph isomorphism network (GIN) (K. Xu, Hu, et al., 2018), graph attention network (GAT) (Velickovic et al., 2017), and local extrema graph neural network (LEConv) (Ranjan et al., 2020). These GNNs have demonstrated prevailing performance in a variety of tasks, including semantic segmentation (L.-Z. Chen et al., 2021), node classification (Z. Liu et al., 2019), and recommendation systems (C. Huang et al., 2021; Pang et al., 2022; J. Zhang et al., 2019), among others.
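A single round of this neighbourhood aggregation, in the spirit of Equations (2) and (3), can be written compactly. In the sketch below the weights are random stand-ins (a trained model would learn them), mean pooling plays the role of AGG, and ReLU plays the role of ACT; all of these are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))                      # h^(0) = X, per Equation (3)
d_in, d_out = 3, 2
W = rng.normal(size=(d_in, d_out))               # W_k
B = rng.normal(size=(d_in, d_out))               # self-loop weight B_k

AGG = (A / A.sum(axis=1, keepdims=True)) @ X     # mean aggregation over N(v)
H1 = np.maximum(0.0, AGG @ W + X @ B)            # h^(1) with ACT = ReLU, per Equation (2)
print(H1.shape)                                  # (number of nodes, d_out)
```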
5.1 General Layout and Structures of GNNs
Here, we look at GNN models from the perspective of a designer. First, the study covers the big picture of creating a GNN model, as depicted in Figure 5. Subsequently, we discuss each step in greater depth in the sub-sections below. As a rule, four stages are involved in designing a GNN model for a particular task on a specific graph type: a) determining the application's graph structure, b) characterizing the graph's kind and scale, c) developing a suitable loss function, and d) developing a model incorporating computational modules. In the following paragraphs, we discuss the requirements for the different design stages.

Figure 5: A GNN model's overall design workflow.

5.1.1 GRAPH STRUCTURE
We must first ascertain the application's graph structure. Scenarios typically fall into one of two categories: structural or non-structural. Many systems, including physical systems, chemicals, knowledge networks, etc., exhibit graph structures. In addition to being present in structural contexts, implicit graphs also occur in non-structural domains. Thus, we must first construct the graph from the specified task, such as creating a fully linked "word" graph for text or a scene graph for an image. Once we have the graph, we can determine which GNN model will work best.

5.1.2 TYPES OF GRAPHS
The type and size of the graph should be determined once we understand the graph's structure within the context of the relevant application. More information about the graph's nodes and edges can be stored in a complex graph. There are a few common ways to classify graphs.
a) Directed or Undirected Graphs: Whether a graph is directed or undirected depends on the direction of its edges, but directed graphs carry more content than undirected graphs. An undirected edge in a graph is treated in the same way as two directed edges.
b) Homogeneous or Heterogeneous Graphs: In contrast to heterogeneous graphs, which feature a wide variety of node and edge types, homogeneous networks have a uniform structure. Researching the different sorts of nodes and edges is essential for understanding heterogeneous graphs.
c) Static or Dynamic Graphs: A dynamic graph is one in which the input attributes or graph structure vary over time. Temporal information in dynamic graphs should be carefully analyzed.
Since these classes are orthogonal to one another, they can be combined to form, for example, a dynamically directed heterogeneous graph. Signed graphs and hypergraphs are two further types of graphs that serve specialized purposes. Both "small" and "large" graphs look the same to the naked eye; due to the ever-evolving nature of computing, the criteria separating them are constantly evolving.

5.1.3 DESIGN LOSS FUNCTION
It is the task at hand and the parameters of the training set that inform the design of the loss function. There are essentially three categories of graph-learning tasks. In node-level tasks, such as node classification, node clustering, node regression, and so on, the qualities of individual nodes are emphasized. The objective of node classification is to divide nodes into manageable classes. In node clustering, similar nodes are grouped for easier analysis. In node regression, a value is predicted for each node. Edge-level tasks include edge classification and link prediction, which entail the model making predictions about the existence of edges between pairs of input nodes and identifying the types of edges in a network. Graph-level tasks like matching, classification, and regression depend on the model learning appropriate graph representations. From a supervision vantage point, graph learning activities are broken up into diverse learning settings: training in a supervised setting allows access to labeled data; a semi-supervised design provides many untagged nodes and a small number of tagged nodes for the training procedure; and in an unsupervised setting, the model can only learn to recognize patterns in data that have not been labeled.
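For the node-level, semi-supervised setting just described, the loss is typically a cross-entropy evaluated only on the labelled nodes while the forward pass covers the whole graph. The sketch below shows that masking; the predictions are random placeholders for a GNN's softmax output, and the labels and mask are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

num_nodes, num_classes = 6, 3
logits = rng.normal(size=(num_nodes, num_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)   # softmax output

labels = np.array([0, 2, 1, 0, 1, 2])
train_mask = np.array([True, False, True, False, False, False])      # labelled nodes only

loss = -np.mean(np.log(probs[train_mask, labels[train_mask]] + 1e-9))
print(round(float(loss), 4))
```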
5.1.4 BUILD MODEL USING COMPUTATIONAL MODULES
Using the regular computational modules, we start building the model as follows:
a) Propagation Module: This module spreads data between nodes so that feature and structural information can be added to the aggregated data. Convolution and recurrent operators are widely employed in this module to obtain neighbour information. Furthermore, the skip connection is often used to retain information from previous node representations and deal with the issue of over-smoothing.
b) Sampling Module: When graphs are large, sampling modules are typically necessary for graph propagation. Combining the sampling and propagation modules is common.
c) Pooling Module: Pooling modules collect data for representation when high-level subgraph or graph representations are required.
Graph-based learning systems can demonstrate different degrees of computational complexity depending on the specific algorithms and approaches utilized. Random-walk-based methods, graph convolutional networks, graph attention networks, and graph neural networks are widely acknowledged methodologies for gaining insights into graph structures. The complexity of these approaches can be adjusted by several methods, depending on criteria such as the model's scale, the number of parameters, and the graph's size. The computational cost of graph representation learning methods can exhibit significant variability, contingent upon the particular approach employed and the characteristics of the graph dataset, including its size and complexity. Specific techniques may show high computational complexity and require substantial memory resources, whereas others may be more streamlined and adaptable.
GCNs (Kipf & Welling, 2016a) are a type of neural network architecture specifically designed to operate on graph-structured data. The forward pass in GCNs entails the aggregation of information from adjacent nodes and the subsequent application of a neural network layer. The computational complexity of a single layer is commonly expressed as O(|V| + |E|), where |V| denotes the cardinality of the set of nodes and |E| represents the cardinality of the set of edges within the graph. The utilization of multiple layers in GCNs enables the learning of hierarchical representations; the computational complexity of stacking L layers is denoted as O(L · (|V| + |E|)). In order to operate effectively, GCNs necessitate the retention of three key components: the adjacency matrix or an edge list, node features, and model parameters, which include weights and biases. The memory demand is contingent upon the dimensions of the graph and the quantity of model parameters. The attention mechanism is employed by GATs (Velickovic et al., 2017) to facilitate information aggregation by assessing the relative significance of neighbouring nodes. The computational complexity of the attention mechanism is commonly expressed as O(d · |V|²), where d denotes the number of attention heads and |V| indicates the number of nodes.
In order to enhance their overall effectiveness, GATs commonly employ several attention heads; the overall complexity is then augmented to O(H · d · |V|²), where H represents the total number of attention heads. Similar to GCNs, GATs require designated storage space for the adjacency matrix or edge list, as well as the node attributes and model parameters. The required memory capacity is contingent upon both the dimensions of the graph and the quantity of model parameters.
Sampling is a crucial aspect of Graph Sample and Aggregate (GraphSAGE), as it enables the efficient processing of large graphs. GraphSAGE achieves this by performing its operations on subgraphs that have been carefully selected through the sampling process. The computational complexity is directly proportional to both the size of the sampled subgraphs and the number of layers integrated into the model. Its algorithm employs a diverse set of aggregation functions, including mean, max, and Long Short-Term Memory (LSTM), to collect and integrate information from neighbouring nodes at each layer; the aggregated data is subsequently utilized in the next layer. The computational complexity of aggregation is contingent upon the specific function employed and the magnitude of the neighbourhood (W. Hamilton et al., 2017). The management of large-scale graph datasets presents several challenges, encompassing constraints related to memory capacity, processing-time limitations, and issues pertaining to scalability. As the size of the graph increases, the computational requirements in terms of time and memory can escalate rapidly; consequently, conventional methods become inefficient or infeasible to execute. In order to tackle the computational difficulties associated with graph datasets of significant scale, scholars have put forth a range of optimization methodologies.
Sampling Techniques: Researchers employ sampling approaches to extract subsets of the graph that are representative of the complete graph, circumventing the need to process the entire network. Utilizing these methodologies can significantly reduce the computational load while maintaining the integrity of the graph's overall structure. Sampling techniques are used to select a subset of nodes and edges from a large graph to reduce computational load while preserving the overall design and properties of the network. Many sampling methods are frequently employed in research, such as node, edge, and neighbourhood sampling (X. Liu, Yan, et al., 2021; H. Zeng et al., 2019).
Parallelization: Parallelization is a technique that can yield advantages for graph algorithms, especially when applied to modern computer architectures like GPUs and TPUs. Parallelizing graph operations can offer advantages such as enhanced computational speed and increased efficiency. Parallelization involves subdividing graph computations into smaller tasks that can be executed simultaneously on multiple processing units, including graphics processing units (GPUs) and central processing units (CPUs). This method facilitates accelerated computations and enhances efficiency (Chiang et al., 2019; D. Zhang et al., 2020).
Approximation Methods: Approximation methods can be utilized in certain situations to obtain approximate solutions and minimize computational requirements, making them well-suited for scenarios involving large-scale graphs; the resulting estimates may be deemed sufficient for specific applications. In pursuit of enhanced computational efficiency, these strategies sacrifice accuracy.
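The sampling idea behind GraphSAGE-style training can be sketched in a few lines: each node draws a fixed-size sample of its neighbours, aggregates their features (here with a mean), and concatenates the result with its own features. The adjacency list, sample size, and toy features are illustrative assumptions, not the reference implementation of (W. Hamilton et al., 2017).

```python
import random

adj = {0: [1, 2, 3, 4], 1: [0], 2: [0, 3], 3: [0, 2, 4], 4: [0, 3]}
feat = {v: [float(v), 1.0] for v in adj}           # toy 2-dimensional features

def sample_neighbours(v, k, rng=random):
    nbrs = adj[v]
    return nbrs if len(nbrs) <= k else rng.sample(nbrs, k)   # bounded cost per node

def mean_aggregate(v, k=2):
    sampled = sample_neighbours(v, k)
    agg = [sum(feat[u][i] for u in sampled) / len(sampled) for i in range(2)]
    return feat[v] + agg                            # concatenate self and neighbourhood

print(mean_aggregate(0))
```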
Sparsity Exploitation: The concept of "sparsity exploitation" pertains to the utilization of the observation that the majority of graphs exhibit a relatively low density of edges relative to the total number of potential connections. The utilization of sparsity has the potential to yield significant reductions in computational requirements. The utilization of distributed frameworks in computing is another method. These frameworks are employed to handle exceedingly large graphs that surpass the storage capacity of a single machine's memory; this is achieved by distributing the graph-processing tasks across multiple nodes within a cluster (Weber et al., 2018; Abu-El-Haija et al., 2020).
In summary, the computational complexity of graph-based learning methods exhibits significant variability, and the analysis of extensive graph datasets requires meticulous deliberation of optimization strategies, parallelization techniques, approximation methodologies, and the integration of dedicated hardware accelerators. Academic researchers consistently seek novel methodologies to surmount these challenges and render graph-based learning feasible and adaptable to practical applications (C. Li et al., 2023; Lee et al., 2023).

6. Applications
Unlike traditional neural networks, which operate on arrays, GNNs can operate on graphs. Graphs' rising popularity can be attributed to their ability to depict complex situations in the actual world. Information within the applications is structured. Social networks, chemical structures, web connection data, and other unstructured data are studied by modeling them as graphs. Different sorts of unstructured data and testing rely on this information. Although node classification, graph classification, graph generation, network embedding, and spatial-temporal graph prediction are all examples of GNN tasks, they aim to do slightly different things. This study highlights several research-based applications in the following fields:

Figure 6: Scene Graph Image Representation.

Computer Vision (CV): In computer vision, GNN activities include the generation of scene graphs, the classification of point clouds, and the detection of actions. Understanding the context in which things are placed helps in visual scene interpretation. Models that generate scene graphs aim to analyze a scene by graphing components and their semantic relationships (Dhingra et al., 2021). For semi-supervised image classification, where both labeled and unlabeled image occurrences are used, the graph neural network simulates the fine-grained region correlations and increases classification performance (Hong et al., 2020; Luo et al., 2016; Satorras & Estrach, 2018). Captioning images was demonstrated by (X. Yang et al., 2019) using a novel Scene Graph Auto-Encoder (SGAE). There are two stages to this particular captioning workflow: first, employing a Graph Convolutional Network (GCN) to encode the image's scene graph and then decoding the sentence using the encoded representation; second, incorporating the image's scene graph into the captioning model. Similarly, (L.
Li et al., 2019) present the Relation-aware Graph Attention Network (ReGAT), an innovative framework for Visual Question Answering (VQA) to describe multi-type object relations with a query-adaptive attention mechanism. To use graph data in images and text, the authors of (Z. Yu, Lu, et al., 2018) introduce a novel cross-modal retrieval model they call a dual-path neural network with a graph convolutional network. It considers regular vector-structured visual illustrations and irregular graph-structured textual representations to learn linked features and a shared latent semantic space concurrently. Furthermore, (S. Wang et al., 2020) generate text scene graphs and visual scene graphs by mining the image and text for objects and associations (Figure 6). Ultimately, they developed a model called Scene Graph Matching (SGM). This model uses two specialized graph encoders to transform the raw data from the visual and text scene graphs into feature graphs. After learning the properties present at both the object and relationship levels in each graph, the two feature graphs corresponding to the two modalities can be matched at two levels with greater success.
Brain Networks: Centrality measures that reveal a region's importance in a network (Page et al., 1999), together with anatomical and functional connectivity (Fornito et al., 2013), are now among the most studied aspects of the brain. Differences in sex and age (Zuo et al., 2012), mania and depression (Deng et al., 2019), blindness and vision loss (Q. Lin et al., 2021), diabetes and neuropathy (Q.-H. Xu et al., 2020), and genetics and epigenetics (Wink et al., 2018) have all been shown by analyzing functional connectivity centrality.
Recommendation System: In social networks and advertising platforms, recommendations play a crucial role (W. L. Hamilton et al., 2017; Mao et al., 2021; Z. Wang et al., 2020; Z. Zhao et al., 2020). Some networks include spatial and temporal data (Kermarrec et al., 2011) as well as structure, content, and label data. In mobile applications, spatial-temporal embedding (C. Zhang et al., 2017) is a developing field. GNNs are a valuable tool to employ with user and item relational characteristics. KGNN-LS (H. Wang et al., 2019) improves the item representation in a knowledge graph by carrying out aggregations over its associated neighbourhood. Another assumption made by KGNN-LS (H. Wang et al., 2019) is that related objects in the knowledge graph are likely to have similar user preferences; it adds a regularization term to acquire such a customized weighted knowledge network. KGCN and KGAT (X. Wang et al., 2019) generally follow similar concepts; an auxiliary loss for knowledge graph reconstruction is the only significant difference.

Figure 7: Biomedical Application.

Biomedical Application: Biomedical data analysis can make use of graph representation learning. For instance, brain network data can be represented as a graph, with the signals from the brain acting as the nodes (D. Zhang et al., 2018; B. Li & Pi, 2020). The brain's structures and functions in response to various inputs can be studied using embedding techniques (Figure 7). For the study of Alzheimer's disease (C. Hu et al., 2015; Si et al., 2019) and the brain's response to magnetoencephalography signals (R. Liu et al., 2018), various frameworks have been put forth.
Drug repurposing algorithms have been utilized with Electronic Health Record (EHR) data (Hurle et al., 2013; Pushpakom et al., 2019). For instance, (Gurwitz, 2020) used EHR data to repurpose drugs to treat COVID-19, and (Y. Wu et al., 2019) identified several non-cancer drugs as modifiable options for cancer treatment. Based on the omics (proteomics and transcriptomics) signature data of bladder cancer patients, (Mokou et al., 2020) developed a drug repurposing pipeline.
Natural Language Processing (NLP): The text classification problem is a long-standing one in the field of NLP. Documents can serve as nodes in the citation network, with references between them serving as edges. The bag-of-words approach is commonly employed when describing the attributes of nodes in a citation network. When dividing articles into distinct sets, node classification is the quickest and easiest option. Numerous graph convolutional networks have been explored, to name a few: (W. Hamilton et al., 2017; Kipf & Welling, 2016a; Levie et al., 2018; Monti et al., 2017; Zhuang & Ma, 2018). On the other hand, graph classification can be used to sort the texts and analyze the documents graphically (i.e., each document is modeled as a graph) (Defferrard et al., 2016). TextGCN (H. Peng et al., 2018) also models the entire corpus as a heterogeneous graph and simultaneously learns word and document embeddings before classifying the text using a softmax classifier. Using a graph pooling layer and hybrid convolutions comprising graph convolution and classic convolution, Gao et al. (H. Gao et al., 2019) use node ordering information to perform better than traditional CNN-based and GCN-based techniques. As the number of labels increases, especially if they are all at various degrees of topical granularity, the effectiveness of these methods may decrease. Using a graph-of-words to represent long-distance semantics, (H. Peng et al., 2018) apply a recursive regularized graph convolution model to exploit the pyramid of labels. The foundation of many NLP-related applications is information extraction, and graph convolutional networks have been extensively used in this and its related challenges. For instance, GraphIE (Qian et al., 2018) discovers non-local relationships between textual units, generates local context-aware latent reconstructions of words or sentences, and utilizes a decoder to label words at the word level. GraphIE is compatible with information extraction methods like named entity extraction. Word-relationship extraction (Y. Zhang et al., 2018) and event extraction (S. Cui et al., 2020; X. Liu et al., 2018; Nguyen & Grishman, 2018) are two further applications of convolutional graph networks. Furthermore, Marcheggiani et al. create a syntactic graph convolutional network model that can be applied to syntactic dependency trees and is appropriate for numerous NLP applications, including neural machine translation (Bastings et al., 2017) and semantic role labeling (Marcheggiani & Titov, 2017). Graph convolutional networks can give sentence encoders a semantic bias and enhance performance in semantic machine translation (Marcheggiani et al., 2018).
Traffic: For an intelligent transportation system to function, it must accurately predict traffic density, road capacity, or speed in traffic systems.
Several researchers use spatial-temporal GNNs (STGNNs) to develop models that can deal with a wide range of traffic-network problems; these include (Y. Li et al., 2017; B. Yu et al., 2017; J. Zhang et al., 2018). The sensors installed along the roads are viewed as nodes in a spatial-temporal graph, with the distances between each pair of sensors representing edges. The average traffic speed at each node throughout a given frame is provided as a dynamic input element. (Agafonov & Myasnikov, 2021; Bing et al., 2020; Bogaerts et al., 2020; L. Cai et al., 2020; L. Zhao et al., 2022) focused on solving the problem of road traffic speed, (X. Fang et al., 2020; H. He et al., 2021; James, 2021; F. Li et al., 2021; Y. Zhang, Li, et al., 2021) looked at predicting road travel time, while (Bai et al., 2021; J. Chen, Liao, Hou, et al., 2020; X. Chen, Zhang, Du, et al., 2020; S. Fang et al., 2020; Ge et al., 2020; Jiang & Luo, 2022; Yin et al., 2021) focused on predicting road traffic flow.

7. The Limitations of Using GNNs and Some of the Possible Solutions
The over-smoothing, scalability, and expressive ability of GNNs are only a few of their well-documented shortcomings. This paper provides an overview of the most important articles that deal with these topics.

Table 6: Some proposed over-smoothing techniques.
Authors | Proposed solution
(Dong et al., 2021) | ARGS
(Klicpera et al., 2018) | APPNP
(K. Zhou et al., 2020) | DGN
(M. Liu et al., 2020) | DAGNN
(Rong et al., 2019) | DropEdge
(Kipf & Welling, 2016a) | GCN
(M. Chen, Wei, Huang, et al., 2020) | GCNII
(Hasanzadeh et al., 2020) | GDC
(D. He et al., 2022) | GCN-inflation
(K. Xu, Li, et al., 2018) | JKNet-Concat
(Chamberlain et al., 2021) | GRAND
(Eliasof et al., 2021) | PDE-GCN
(H. Zeng et al., 2021) | SHADOW-SAGE
(F. Wu et al., 2019) | SGC

Table 7: A summarization of significant solutions suggested to enhance GNN expressive power.
Authors | Algorithm | Solution summary
(Azizian & Lelarge, 2020) | FGNN | Adding matrix multiplication to the model
(Balcilar et al., 2021) | – | Using a broad receptive field and spectral-domain design to create the convolution
(Z. Chen et al., 2019) | Ring-GNN | Adding and multiplying by means of a ring of matrices
(Dasoulas et al., 2019) | – | Utilizing colors to separate related node properties
(Z. Huang et al., 2022) | – | Using a permutation-aware aggregate to capture the correlation between nearby nodes
(P. Li et al., 2020) | DEGNN | Incorporating distance-based additional node functionality
(Maron et al., 2019) | PPGN | Taking higher-order message forwarding into account
(Morris et al., 2019) | k-GNN | Using subgraph structures for message passing rather than nodes
(Papp et al., 2021) | DropGNN | Randomly removing some of the nodes
(Sato et al., 2021) | rGIN | Adding random features to GIN
(Wang et al., 2022) | PEG | Updating the node and positional features over distinct channels
(Wijesinghe & Wang, 2021) | GraphSNN | Facilitating information transfer by incorporating an additional framework
(M. Zhang & Li, 2021) | NGNN | Rather than encoding each node as a rooted subtree, a rooted subgraph is used

7.1 Over-Smoothing
It has been established that the depth limitation of GNNs is a significant issue (Q. Li et al., 2018; Oono & Suzuki, 2019). The first layer considers a node's immediate neighbours when using GNN methods.
7. The Limitations of Using GNNs and Some of the Possible Solutions

The over-smoothing, scalability, and expressive ability of GNNs are only a few of their well-documented shortcomings. This paper provides an overview of the most important articles that deal with these topics.

7.1 Over-Smoothing

It has been established that the depth limitation of GNNs is a significant issue (Q. Li et al., 2018; Oono & Suzuki, 2019). In GNN methods, the first layer considers a node's immediate neighbours, the second layer reaches the node's two-hop neighbours, and successive layers extend the receptive field in the same way. After several layers, the resulting vectors become over-smoothed, since the local information of each vertex is lost. To address the problem of over-smoothing, the Graph Random Neural Network (GRAND) proposes a novel architecture. DropNode is used in this method to augment the feature matrix of the input graph, in a manner similar to the DropEdge procedure (Rong et al., 2019). This augmentation lessens a node's reliance on particular neighbouring nodes. Then, to reduce the likelihood of over-smoothing, GRAND mixes the representations of a node's neighbours up to K hops away. Consistency regularization (Berthelot et al., 2019) is used during training to lessen overfitting in the semi-supervised setting when labels are scarce. It has been demonstrated theoretically by (M. Chen, Wei, Huang, et al., 2020) that over-smoothing may be alleviated by adding two straightforward techniques to GCN at each layer: an initial connection to the input features, which guarantees that some of the input node features are carried into the final node representation, and the addition of the identity matrix to the weight matrix, which guarantees at least the same performance as a shallow GCN. As shown in Table 6, many approaches have been taken to overcome the over-smoothing problem in GNNs.

Authors | Proposed Solution
(Dong et al., 2021) | ARGS
(Klicpera et al., 2018) | APPNP
(K. Zhou et al., 2020) | DGN-GNN
(M. Li et al., 2020) | DAGNN
(Rong et al., 2019) | DropEdge
(Kipf & Welling, 2016a) | GCN
(M. Chen, Wei, Huang, et al., 2020) | GCNII
(Hasanzadeh et al., 2020) | GDC
(D. He et al., 2022) | GCN-inflation
(K. Xu, Li, et al., 2018) | JKNet-Concat
(Chamberlain et al., 2021) | GRAND-PDE
(Eliasof et al., 2021) | PDE-GCN
(H. Zeng et al., 2021) | SHADOW-SAGE
(F. Wu et al., 2019) | SGC

Table 6: Some proposed over-smoothing techniques.
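The following is a minimal sketch of a GCNII-style layer in the spirit of (M. Chen, Wei, Huang, et al., 2020), combining an initial residual connection to the input features with an identity mapping added to the weight matrix; the toy graph, dimensions, and hyperparameters are illustrative assumptions, and the layer-dependent weighting of the identity term used in the original method is simplified here to a constant.

    import numpy as np

    def gcnii_layer(H, H0, A_norm, W, alpha=0.1, beta=0.5):
        # Initial residual: mix the propagated signal with the original input features H0.
        P = (1.0 - alpha) * (A_norm @ H) + alpha * H0
        # Identity mapping: add the identity matrix to the weight matrix before transforming.
        return np.maximum(P @ ((1.0 - beta) * np.eye(W.shape[0]) + beta * W), 0)

    rng = np.random.default_rng(0)

    # Toy graph: 4 nodes, symmetric adjacency, 8-dimensional input features.
    A = np.array([[0., 1., 1., 0.],
                  [1., 0., 1., 0.],
                  [1., 1., 0., 1.],
                  [0., 0., 1., 0.]])
    A_hat = A + np.eye(4)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt

    H0 = rng.random((4, 8))
    H = H0.copy()
    for _ in range(16):                         # a deep stack that keeps node features distinct
        H = gcnii_layer(H, H0, A_norm, rng.standard_normal((8, 8)) * 0.1)
    print(H.round(3))

Because every layer re-injects a fraction of H0 and keeps part of the identity transform, the node representations do not collapse to a single vector even after many layers, which is the intuition behind the two techniques described above.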
7.2 Scalability

GNNs also have difficulty scaling, since the embedding of a node is constructed by combining the representations of that node's neighbours. In particular, the time complexity of neighbour aggregation for a GNN with many layers is significant. In huge graphs, a node may have a large number of neighbours, which slows down GNN training and increases memory consumption. This issue is addressed by (C. Zhuang & Ma, 2018) and (Chiang et al., 2019; Gomez et al., 2017) by sampling a portion of each node's neighbours. In (J. Chen et al., 2018), a node's neighbours are sampled independently at each layer. By using clustering algorithms to sample one subgraph per batch and applying a graph convolution filter to that subgraph's nodes, Cluster-GCN (Chiang et al., 2019) lessens the memory issue associated with GCN. Techniques such as SGC (F. Wu et al., 2019) eliminate the non-linear activation function to shorten training time. By using reversible connections (Gomez et al., 2017), RevGNN (G. Li et al., 2021) keeps the memory footprint of GNNs from growing with the layer count: the feature matrix is separated into several groups that serve as inputs to the model, and only the output of the most recent propagation step needs to be stored in memory, which is an advantage of this method. The scalability of GNNs has also been examined in several other works, including (Bojchevski et al., 2020; M. Chen, Wei, Ding, et al., 2020; M. Ding et al., 2021; Fey et al., 2021; Z. Huang et al., 2021; Yoon et al., 2021).

7.3 Expressive Ability

The ability of a model to distinguish between different graphs is referred to as its expressive power; in other words, the model should assign equivalent graphs to the same embedding and dissimilar graphs to different embeddings. The Weisfeiler-Lehman (WL) test limits the expressive power of popular GNNs such as GCN (Kipf & Welling, 2016a) and GraphSAGE (W. Hamilton et al., 2017), and they are unable to recognize some non-isomorphic substructures. The Graph Isomorphism Network (GIN), a more powerful GNN-based embedding technique, is suggested in (K. Xu, Hu, et al., 2018). GIN aggregates a node's neighbours using the sum operator rather than the mean or max operators. The expressive capability of GIN is also theoretically demonstrated to be equivalent to that of the WL test. A coloring mechanism is suggested by Identity-aware Graph Neural Networks (ID-GNN) (J. You et al., 2021). Table 7 summarizes some of the significant solutions that have been suggested to enhance the expressive power of GNNs.

Authors | Algorithm | Solution Summary
(Azizian & Lelarge, 2020) | FGNN | Adding matrix multiplication to the model.
(Balcilar et al., 2021) | — | Using a broad receptive field and a spectral-domain design to create the convolution.
(Z. Chen et al., 2019) | — | Adding and multiplying by means of a ring of matrices.
(Dasoulas et al., 2019) | — | Utilizing colors to separate related node properties.
(Z. Huang et al., 2022) | — | Using a permutation-aware aggregate to capture the correlation between nearby nodes.
(P. Li et al., 2020) | DEGNN | Incorporating a distance-based additional node feature.
(Maron et al., 2019) | PPGN | Taking higher-order message passing into account.
(Morris et al., 2019) | k-GNN | Using subgraph structures for message passing rather than nodes.
(Murphy et al., 2019) | RP-GNN | Including a special node label.
(Papp et al., 2021) | DropGNN | Randomly removing some of the nodes.
(Sato et al., 2021) | rGIN | Adding random features to GIN.
(Wang et al., 2022) | PEG | Updating the node and positional features over distinct channels.
(Wijesinghe & Wang, 2021) | GraphSNN | Facilitating information transfer by incorporating an additional framework.
(M. Zhang & Li, 2021) | NGNN | Encoding each node as a rooted subgraph rather than a rooted subtree.

Table 7: A summarization of significant solutions suggested to enhance GNN expressive power.
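To illustrate why the sum aggregator matters, the short sketch below applies a GIN-style update in the spirit of (K. Xu, Hu, et al., 2018) to two hypothetical neighbourhoods whose feature means are identical but whose sizes differ: mean aggregation maps them to the same vector, whereas the sum, and hence the GIN update, keeps them distinct. The features, weights, and epsilon value are illustrative assumptions.

    import numpy as np

    def gin_update(h_v, neighbours, W1, W2, eps=0.0):
        # GIN-style update: h_v' = MLP((1 + eps) * h_v + sum of neighbour features).
        agg = (1.0 + eps) * h_v + neighbours.sum(axis=0)
        return np.maximum(agg @ W1, 0.0) @ W2   # small two-layer MLP with a ReLU hidden layer

    # Two neighbourhoods with identical feature means but different sizes.
    neigh_small = np.array([[1.0, 0.0]] * 2)
    neigh_large = np.array([[1.0, 0.0]] * 4)
    print(neigh_small.mean(axis=0), neigh_large.mean(axis=0))   # identical means
    print(neigh_small.sum(axis=0), neigh_large.sum(axis=0))     # different sums

    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((2, 8)) * 0.5
    W2 = rng.standard_normal((8, 2)) * 0.5
    h_v = np.array([0.5, 0.5])
    print(gin_update(h_v, neigh_small, W1, W2))
    print(gin_update(h_v, neigh_large, W1, W2))                 # distinct embeddings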
