Understanding Graph Databases: A Comprehensive Tutorial and Survey
Sydney Anuyah¹, Emmanuel Bolade², Oluwatosin Agbaakin¹
¹ Luddy School of Informatics, Computing and Engineering, Indiana University, Indianapolis, IN, USA
² Data Science Department, Edyah Consulting, Anthony, Lagos, Nigeria
Emails: [email protected], [email protected], [email protected]
arXiv:2411.09999v1 [cs.DB] 15 Nov 2024

ABSTRACT

This tutorial is curated as a one-stop shop for understanding graph databases, as it emphasizes the foundations of graph theory and explores practical applications across multiple fields. The paper begins with foundational concepts and explains the structure of graphs through nodes and edges, including various types such as undirected, directed, weighted, and unweighted graphs. The tutorial outlines essential graph properties, terminologies, and key algorithms for network analysis, including Dijkstra’s shortest path algorithm and techniques for calculating node centrality and graph connectivity. Graph databases, as discussed in the tutorial, offer advantages over traditional relational databases by enabling efficient management of complex, interconnected data. The paper discusses prominent graph database systems like Neo4j, Amazon Neptune, and ArangoDB, each with unique features for handling large datasets. Practical instructions on implementing graph operations using NetworkX and Neo4j cover node and edge creation, attribute assignment, and advanced queries with Cypher. Furthermore, the paper includes common graph visualization techniques using tools such as Plotly and Neo4j Bloom, which enhance the usability of graph data, while also focusing on community detection algorithms, such as the Louvain method, which support clustering in large networks. We conclude by providing future directions to researchers who are looking to enter the world of graphs.

Keywords: Graph databases, Neo4j, NetworkX, graph theory, social network analysis, graph visualization, data modeling, graph algorithms, Cypher queries, community detection, graph neural networks, database management systems.

I. INTRODUCTION

A. Understanding Graph Theory

1) What are Graphs? Nodes and Edges: The simplest definition of a graph is a diagram with interconnected points. At least, that was how it was presented in mathematics. In this context, graphs are defined as abstract data structures consisting of nodes (or vertices) connected by edges. Graphs are now widely used to represent relationships between entities, and they are the go-to structure for training large language models [138]. A graph is therefore an ideal way to model networks in various fields. These fields include geosciences, brain studies, transportation networks, etc. In geosciences [112], nodes can represent geographic locations, while edges capture physical or spatial relationships between locations. For networked systems such as transportation systems [107], nodes can represent cities or transport hubs, while the edges represent the routes or pipelines connecting them. Such graph representations provide a comprehensive view of the structure of a network, allowing exploration of connections and dependencies [107], [112].

In biomedical research, graphs can be used to model brain connectivity, as [37] and [52] illustrated in their studies of pediatric epilepsy and brain connectivity. Nodes here represented the distinct brain regions and the edges indicated functional connections observed through EEG data. The graph structure helped researchers analyze the influence of specific brain areas on cognitive functions. Graph analysis in these cases usually enables the identification of key nodes critical to neural activity [37], [52].

2) Types of Graphs: Undirected, Directed, Weighted, Unweighted: Graphs come in several types based on the properties of their edges, including undirected, directed, weighted, and unweighted graphs. Undirected graphs have edges with no particular direction, making them suitable for representing mutual relationships like social connections, as [110] illustrate in their study of social networks. Here, friendship or mutual interactions are best represented through undirected graphs where both nodes are equally connected [110]. In contrast, directed graphs involve edges with a defined direction, as discussed by [107] in the context of supply chain networks, where goods flow sequentially from one location to another, requiring a directed approach.

Graphs may also be weighted, where edges are assigned weights that reflect the strength or cost of the connection. In transportation networks, weights can represent distances or time costs, which supports optimizing routes and minimizing travel expenses [107]. Unweighted graphs, on the other hand, treat all connections equally and are commonly used when only the existence of a relationship is relevant, as seen in brain connectivity research, where edges merely indicate connectivity without weighting them [52]. Which graph type to apply depends on the requirements of the application, as improper usage can lead to misinterpretation of the data [46].

3) Basic Terminology and Notations: Graph theory uses specific terminologies and notations that form its foundation. Concepts like degree, path, and adjacency matrix are crucial
to analyze connectivity within graphs. The degree of a node, which indicates the number of edges connected to it, is essential in spatial network studies. [112] explain that degree is used in geosciences to measure a node’s influence within a spatial network, assisting in identifying regions of high connectivity and impact. Similarly, in studies on graph complexity, nodes with high degrees contribute significantly to a network’s structural intricacy, as described by [65], particularly in biological and social systems where interconnectedness can signify greater complexity [65], [112].

Other key terms, such as paths and adjacency matrices, aid in computational modeling. Paths represent routes or sequences of edges connecting nodes, which are fundamental in transportation systems where shortest path algorithms, like Dijkstra’s and Bellman-Ford, are used to optimize network flows [107]. An adjacency matrix, a square matrix that represents graph connections, simplifies visualization and manipulation of complex networks, as highlighted in EEG-based brain network analysis for exploring connectivity patterns [37].

B. Real-World Applications of Graphs

1) Social Networks: Social networks are among the most common applications of graph theory, allowing for the modeling and analysis of complex relationships between users. Graph theory enables social network analysis (SNA) through metrics that evaluate network structure, user influence, and connectivity [86]. For example, link prediction algorithms anticipate potential connections, suggesting friends or followers based on existing network patterns [27]. These models capture essential social interactions and enable applications in community detection, anomaly detection, and influence analysis. Furthermore, recent advancements in graph neural networks (GNNs) have enhanced the scalability and precision of social network analysis, enabling more sophisticated user recommendations and deeper insights into network dynamics [17], [9]. Through these methodologies, social networks leverage graph theory to foster user engagement, enhance information dissemination, and improve security.

2) Transportation Networks: Graph theory also plays a crucial role in transportation networks by modeling routes and optimizing logistics. Each location, such as a city or transit hub, is represented as a node, while the connections between these locations (e.g., roads or railways) serve as edges. Shortest path algorithms like Dijkstra’s and Bellman-Ford are fundamental in reducing travel time and costs across transportation networks [107]. Additionally, graph neural networks (GNNs) have recently been applied to traffic forecasting, leveraging data such as road congestion and passenger flow to predict traffic patterns accurately [56]. These predictions allow for real-time adjustments in transportation services, benefiting urban planners and logistics companies. The application of GNNs to transportation networks has revolutionized traffic management, enhancing efficiency and improving commuter experiences.

3) Recommender Systems: In recommender systems, graphs enable the modeling of user-item interactions, forming the foundation for personalized recommendations. Users and items are represented as nodes, with edges connecting users to the items they have interacted with or rated. Recent advancements in GNNs have significantly enhanced the efficacy of recommender systems by utilizing the high-order connectivity inherent in graph data [134]. These models consider not only direct connections but also the broader network structure, capturing indirect relationships that improve recommendation accuracy [40]. This approach enables platforms like e-commerce sites and streaming services to provide tailored suggestions, reducing information overload for users and improving their overall experience. Graph-based recommendation techniques have become essential in handling the vast data of modern recommender systems, making them more relevant and efficient.

4) Biological Networks: In biological research, graph theory is instrumental in analyzing the interactions within complex biological systems, such as protein-protein interactions or gene regulatory networks. Nodes represent bioentities like proteins or genes, while edges denote the interactions between them, enabling insights into biological processes and disease mechanisms [69]. The increase in biological data has led to the application of deep learning techniques, especially GNNs, to handle and analyze complex biological networks. GNNs are particularly useful in predicting protein functions, drug discovery, and understanding genetic interactions, where they provide a high level of precision in detecting patterns within heterogeneous biological data [98], [58]. By applying graph theory to biological networks, researchers can better understand disease pathways and identify potential therapeutic targets, contributing significantly to advancements in bioinformatics and medicine.

As graph theory continues to evolve, its methodologies allow researchers to tackle increasingly complex, data-rich environments, as evidenced by its wide adoption in fields such as neuroscience, urban planning, and social science research [112], [128]. Graph theory’s capacity to model both tangible and abstract relationships has made it essential for analyzing connectivity and interactions in modern data science.

C. Graph Databases

Graph databases have emerged as specialized systems designed to handle data with complex, interconnected relationships. Unlike traditional relational databases that store data in tables, graph databases represent data as nodes (entities) and edges (relationships), making them highly effective for applications where connections between data points are central. These databases support advanced analytics, pathfinding, and pattern matching, and are instrumental in fields such as social networking, recommendation systems, and fraud detection [16], [54]. As large datasets become increasingly common, graph databases offer efficient solutions for managing and querying complex data structures with minimal latency and high scalability.

1) Differences between Graph Databases and Relational Databases: Relational databases (RDBMS) and graph databases differ significantly in terms of data structure, storage, and query mechanisms. Relational databases organize data in
tables, enforcing data integrity through primary and foreign keys. While effective for structured, transaction-based data, RDBMS struggle with queries involving multi-level relationships, as JOIN operations become costly with complex queries [30], [54]. Graph databases, in contrast, use nodes to represent entities and edges to capture relationships, creating a natural and efficient structure for traversing relationships without JOINs [109]. This makes graph databases particularly suitable for applications requiring deep relationship exploration, such as recommendation engines and social networks [16], [8].

2) Graph Databases: Neo4j: Neo4j is one of the most widely used graph databases, recognized for its robust property graph model and flexibility in managing and traversing complex relationships. Neo4j organizes data using a labeled property graph model, where both nodes and edges can have attributes stored as key-value pairs. This model allows Neo4j to efficiently perform graph traversals, a crucial feature for applications like fraud detection, social network analysis, and recommendation engines [105], [16]. Additionally, Neo4j uses the Cypher query language, specifically designed for querying graph structures. Cypher’s syntax is intuitive, making it accessible for both developers and data scientists, and allows for complex pattern matching and relationship-based queries that are challenging in relational databases [8]. Neo4j is also known for its high level of consistency and support for ACID (Atomicity, Consistency, Isolation, Durability) transactions, which are essential for data reliability. As Neo4j supports both transactional and analytical operations, it is particularly favorable for applications requiring real-time analytics. For example, in social networks, Neo4j can help identify influential nodes (individuals) and track interactions across a network, providing insights into social dynamics [86]. Neo4j’s scalability is also noteworthy; it can be configured in a clustered environment for horizontal scaling, making it suitable for handling massive datasets commonly seen in enterprise applications [54].

3) Graph Databases: Amazon Neptune: Amazon Neptune is a managed graph database service by AWS, supporting both RDF and Property Graph models. Its compatibility with both SPARQL and Gremlin query languages allows users to choose between querying RDF-based semantic data or traditional graph data, making it versatile for diverse applications. Neptune’s architecture is designed for high availability and fault tolerance, with automated backups and replication across multiple Availability Zones, ensuring minimal downtime and data reliability [10]. One of Neptune’s key features is its seamless integration with other AWS services, which simplifies data ingestion, processing, and analytics on the AWS cloud [10]. Neptune is particularly effective for applications that require fast query response times and efficient handling of large-scale, dynamic data. Its use cases include recommendation engines, fraud detection, and knowledge graph applications, where rapid data retrieval and complex relationship analysis are essential [16]. For instance, in a recommendation system, Neptune can process high-order connections by analyzing the relationships between users, products, and behaviors, resulting in highly personalized recommendations. Additionally, Neptune’s RDF support makes it valuable for semantic data processing, allowing organizations to model and query domain-specific ontologies, making it especially popular in knowledge graph development and metadata management [25].

One limitation of Neptune, however, is its dependency on the AWS ecosystem, which may restrict flexibility for users who rely on multi-cloud strategies. Nonetheless, for enterprises already invested in AWS, Neptune offers a powerful, scalable, and secure solution for managing complex data relationships in real time [10].

4) Graph Databases: ArangoDB: ArangoDB is a unique graph database known for its multi-model capabilities, supporting document, key-value, and graph data models within a single database. This flexibility allows users to handle diverse data types and structures, which is beneficial for applications with complex and varied data needs. ArangoDB uses the Arango Query Language (AQL), which is designed to work across different data models, providing a unified query language that simplifies complex queries involving multiple data types [14]. This versatility makes ArangoDB suitable for big data applications, as it can manage different data structures without needing multiple databases.

ArangoDB’s multi-model architecture allows it to support both OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) queries, enabling it to handle both transactional and analytical workloads. For instance, in e-commerce, ArangoDB can support a real-time recommendation system using its graph model while managing inventory data with its document model. This multi-functional capability is particularly useful in use cases like data integration and ETL (Extract, Transform, Load) processes, where data from various sources need to be combined and processed efficiently [90]. Additionally, ArangoDB’s support for graph-based analytics enables it to handle complex tasks such as fraud detection, network analysis, and predictive modeling.

ArangoDB also offers excellent scalability, supporting horizontal scaling through sharding and distributed cluster setups. Its integration with Kubernetes enhances its adaptability in cloud environments, providing flexibility in deployment and management [14]. This makes it a viable option for organizations handling large datasets with complex requirements, such as those in finance, healthcare, and retail industries.

5) Benefits of Using Graph Databases: Graph databases provide several advantages over traditional relational databases, especially when managing complex, interconnected data. One primary benefit is their ability to efficiently model relationships directly in the data structure, which eliminates the need for complex JOIN operations. This structure is especially advantageous in applications requiring deep relationship analysis, such as social networks, where it is essential to traverse and analyze complex user interactions rapidly [86], [68], [109]. Graph databases also offer schema flexibility, allowing for dynamic adjustments to data structures without the need for costly schema migrations, a valuable feature for evolving datasets [54].

Another significant benefit of graph databases is their scalability. Databases like Amazon Neptune and ArangoDB are designed to handle large datasets while maintaining query efficiency, making them suitable for enterprise-level applications. This scalability is critical in big data environments, where
traditional relational databases may struggle with performance issues as data volume and complexity increase [10], [14]. Additionally, graph databases support advanced analytics, such as community detection, link prediction, and centrality measures, which are essential for applications in areas like recommendation systems, fraud detection, and biological network analysis [25], [98].
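
Two of the analytics named above, centrality measures and link prediction, can be illustrated without any graph library. The following is a minimal sketch under stated assumptions: the friendship data is hypothetical, degree centrality is taken as a node's degree divided by n - 1, and link prediction uses a plain common-neighbors count, which is only one of many possible scoring schemes and is not claimed to be what any of the systems above use internally.

```python
from itertools import combinations

# Hypothetical undirected friendship graph, given as a list of edges.
edges = [("Alice", "Bob"), ("Alice", "Charlie"), ("Bob", "Charlie"),
         ("Bob", "David"), ("Charlie", "David"), ("Charlie", "Eve")]

# Build an adjacency list: node -> set of neighbors.
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def common_neighbor_scores(adj):
    """Score every unconnected pair by the number of neighbors it shares."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores


centrality = degree_centrality(adjacency)
scores = common_neighbor_scores(adjacency)
best_pair = max(scores, key=scores.get)

print("Degree centrality:", centrality)
print("Most likely new link:", best_pair, "shared neighbors:", scores[best_pair])
```

In this toy graph Charlie is connected to everyone else, and Alice and David, who are not yet connected, share the most neighbors, so a common-neighbors heuristic would suggest that pair first.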
II. CREATING AND VISUALIZING BASIC GRAPHS
In this section, we’ll discuss how to create basic graphs
using NetworkX in Python and Neo4j, focusing on creating
nodes and edges, assigning attributes, and loading and saving
data in Neo4j. These operations form the foundation for
building and visualizing more complex graph structures.

A. Creating Nodes and Edges


Creating nodes and edges is the foundational step in build-
ing a graph. Nodes represent entities, and edges represent the
relationships or interactions between these entities.
In NetworkX: NetworkX is a Python library that simplifies the creation and manipulation of graph structures. Nodes and edges can be added directly using simple commands.

Creating Nodes: You can create individual nodes or add multiple nodes at once. Here is how to create nodes in Python:

import networkx as nx

G = nx.Graph()  # Initializes an undirected graph

# Adding nodes individually
G.add_node("Alice")
G.add_node("Bob")

# Adding multiple nodes
G.add_nodes_from(["Charlie", "David", "Eve"])

Listing 1. Creating Nodes in Python with NetworkX

# View the nodes in the graph
print("Nodes in the graph:", G.nodes())

Listing 2. Viewing created Nodes in Python

Nodes
Nodes in the graph: ['Alice', 'Bob', 'Charlie', 'David', 'Eve']

Creating Edges: Edges can be added similarly, either one at a time or in groups. An edge defines a connection between two nodes.

# Adding an edge between two nodes
G.add_edge("Alice", "Bob")

# Adding multiple edges
G.add_edges_from([("Alice", "Charlie"), ("Bob", "David"), ("Charlie", "Eve")])

# Print out the edges for a view
print("Edges in the graph:", G.edges())

Listing 3. Creating Edges in Python with NetworkX

Edges
Edges in the graph: [('Alice', 'Bob'), ('Alice', 'Charlie'), ('Bob', 'David'), ('Charlie', 'Eve')]

The resulting graph visualization is shown in Figure 2.

Fig. 2. Graph visualization of edges created in Python with NetworkX

Fig. 1. Neo4j visualization of nodes and edges

In Neo4j, nodes and relationships (edges) are created using the Cypher query language, which is designed for handling graph data. Neo4j allows you to specify types for both nodes and relationships, making it suitable for more structured and complex data.

CREATE (a:Person {name: 'Alice'})
CREATE (b:Person {name: 'Bob'})

Listing 4. Creating Nodes with Neo4j

In the commands above, the label Person identifies the nodes as representing people, and each node has a property name.

Creating Relationships (Edges) in Neo4j

MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
CREATE (a)-[:FRIEND]->(b)

Listing 5. Creating Edges with Neo4j

This command creates a FRIEND relationship between Alice and Bob, indicating a connection in the context of a social network [54], [105].
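
The arrow in (a)-[:FRIEND]->(b) gives the relationship a direction, whereas the nx.Graph() used in the NetworkX listings is undirected. A rough NetworkX counterpart can be sketched with a DiGraph. The attribute names label and type below are chosen purely for illustration and are not part of either library's conventions.

```python
import networkx as nx

# A directed graph, so that Alice -> Bob is distinct from Bob -> Alice.
D = nx.DiGraph()

# "label" is an illustrative attribute name standing in for the Neo4j label.
D.add_node("Alice", label="Person")
D.add_node("Bob", label="Person")

# "type" is an illustrative attribute name standing in for the relationship type.
D.add_edge("Alice", "Bob", type="FRIEND")

print("Alice -> Bob exists:", D.has_edge("Alice", "Bob"))
print("Bob -> Alice exists:", D.has_edge("Bob", "Alice"))
print("Alice points to:", list(D.successors("Alice")))
```

Cypher can still match such a relationship regardless of direction by omitting the arrowhead in the pattern, so a directed FRIEND relationship does not prevent queries from treating friendship as mutual.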

Assigning Attributes to Nodes and Edges

Attributes can provide additional information about nodes and edges, such as a person’s age or a relationship’s strength.

In NetworkX: Attributes can be assigned when nodes or edges are created or added later.

# Adding a node with attributes
G.add_node("Alice", age=30, city="New York")

# Adding attributes to an existing node
G.nodes["Bob"]["age"] = 25
G.nodes["Bob"]["city"] = "Los Angeles"

Listing 6. Assigning Attributes to Nodes in NetworkX

Viewing created Node-Attributes in Python

# To view the attributes of a specific node we use
print("The attributes for the node Bob are: ", G.nodes["Bob"])

# To view the attributes of multiple nodes we use
print("The attributes for the node Bob and Alice are: ",
      {node: G.nodes[node] for node in ["Bob", "Alice"]})

# To view the attributes of all nodes we use
print("The attributes for all the nodes are: ", G.nodes(data=True))

Listing 7. Viewing created Node-Attributes in Python

Printing the Attributes of the Nodes

The attributes for the node Bob are: {'age': 25, 'city': 'Los Angeles'}

The attributes for the node Bob and Alice are: {'Bob': {'age': 25, 'city': 'Los Angeles'}, 'Alice': {'age': 30, 'city': 'New York'}}

The attributes for all the nodes are: [('Alice', {'age': 30, 'city': 'New York'}), ('Bob', {'age': 25, 'city': 'Los Angeles'}), ('Charlie', {}), ('David', {}), ('Eve', {})]

The visualization of nodes with their attributes is shown in Figure 3.

Fig. 3. Graph visualization of nodes with attributes in NetworkX

Note: In NetworkX, you can dynamically add or modify nodes, edges, and their attributes even if they were not predefined initially, like the edge attribute “weight” and the node “Faith” below.

# Adding an attribute and defining the weight class + Defining a new node
G.add_edge("Alice", "Faith", relationship="friends", weight=4)

# Assumption: the printed output also shows these attributes on the
# Alice-Bob edge, so they are set on that existing edge here
G.add_edge("Alice", "Bob", relationship="friends", weight=4)

# Adding an attribute to an existing edge
G.edges["Alice", "Charlie"]["weight"] = 3

Listing 8. Assigning Attributes to Edges

# To view the attributes of a specific edge
print("The attributes for the edge between Alice and Bob are:",
      G.edges["Alice", "Bob"])

# To view the attributes of multiple specific edges
print("The attributes for the edges between Alice-Bob and Alice-Charlie are:",
      {edge: G.edges[edge] for edge in [("Alice", "Bob"), ("Alice", "Charlie")]})

# To view the attributes of all edges
print("The attributes for all the edges are:", G.edges(data=True))

Listing 9. Viewing created Edge-Attributes in Python

Printing the Attributes of the Edges

The attributes for the edge between Alice and Bob are: {'relationship': 'friends', 'weight': 4}

The attributes for the edges between Alice-Bob and Alice-Charlie are: {('Alice', 'Bob'): {'relationship': 'friends', 'weight': 4}, ('Alice', 'Charlie'): {'weight': 3}}

The attributes for all the edges are: [('Alice', 'Bob', {'relationship': 'friends', 'weight': 4}), ('Alice', 'Charlie', {'weight': 3}), ('Alice', 'Faith', {'relationship': 'friends', 'weight': 4}), ('Bob', 'David', {}), ('Charlie', 'Eve', {})]

The visualization of edges with their attributes is shown in Figure 4.

Fig. 4. Graph visualization of edges with attributes in NetworkX

In Neo4j, attributes (called properties in Neo4j) are assigned using Cypher when creating nodes and relationships.

CREATE (a:Person {name: 'Alice', age: 30, city: 'New York'})

Listing 10. Assigning Attributes to Nodes in Neo4j

MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
CREATE (a)-[:FRIEND {since: 2015, closeness: 4}]->(b)

Listing 11. Assigning Attributes to Relationships in Neo4j

In this example, the FRIEND relationship between Alice and Bob has properties since and closeness, adding contextual information to the connection. Assigning attributes enhances the graph’s informational depth, allowing for more meaningful queries and analyses, especially in applications where metadata plays a crucial role [30].

Loading and saving graph data are crucial operations in Neo4j, particularly when working with large datasets. Neo4j provides several methods for data import and export, including CSV import for batch processing and APOC procedures for more complex tasks. Neo4j supports importing CSV files, which is especially useful for large datasets. Data can be loaded using the LOAD CSV command in Cypher.

LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
CREATE (p:Person {name: row.name, age: toInteger(row.age), city: row.city})

Listing 12. Loading Data from a CSV File

This command reads a CSV file (people.csv) and creates a Person node for each row, with attributes assigned from the CSV columns. Neo4j’s LOAD CSV command is powerful, allowing large datasets to be processed efficiently [105]. Neo4j allows data to be exported through the APOC (Awesome Procedures on Cypher) library, which provides extensive functionality for managing data. Nodes, relationships, and entire graphs can be exported to formats like CSV, JSON, or even Cypher queries.

CALL apoc.export.csv.all('exported_graph.csv', {useTypes: true, delimiter: ';'})

Listing 13. Saving Data in Neo4j

This code exports the entire database to a CSV file. The useTypes option asks APOC to include type information in the export, while the delimiter option specifies the separator to be used in the file. This functionality is especially useful for creating backups or transferring data to other systems [54], [16].

B. Using Neo4j in Python

To work with Neo4j in Python, we use the Neo4j Python Driver to connect to a Neo4j database and execute Cypher queries. This allows Python-based applications to interact directly with Neo4j. Before starting, ensure that the Neo4j Python driver is installed.

pip install neo4j

Listing 14. Install Neo4j

Next, connect to a running Neo4j database instance. For local setups, the default URL is typically bolt://localhost:7687, and you will need to provide the database credentials. To create the Neo4j Database, you can do the following:

• Go to the Neo4j Download Page and download Neo4j Desktop. This is a standalone app that provides an easy way to install and manage local Neo4j databases. Follow the installation instructions for your operating system. Once installed, open Neo4j Desktop.
• In Neo4j Desktop, create a new project to organize your databases. You can name it anything you would like. For
example, “MyFirstGraphDatabase.” Note that the names should not have spaces, or it will show an error.
• Within the project, create a new database by clicking “Add Database” and choosing “Local DBMS.” Set a name for the database (e.g., test_database) and a password of choice. Note: The default username will be neo4j. Start the database by clicking the play button.
• After starting the database, click on it to see the connection details. You will see the bolt URL (bolt://localhost:7687), necessary for connecting to Neo4j from Python. If using Chrome, typing in localhost:7687 may also work.

Now, connect to the Neo4j Database from Python. Use the following template code, replacing “your_password” with the password you set when creating the database.

from neo4j import GraphDatabase

uri = "bolt://localhost:7687"
username = "neo4j"
password = "your_password"  # Replace with your password

driver = GraphDatabase.driver(uri, auth=(username, password))
driver.verify_connectivity()  # Raises an exception if the server cannot be reached
print("Connected to Neo4j!")

Listing 15. Connect to Neo4j Database from Python

Output message
Connected to Neo4j!

Now, let’s run Cypher queries to interact with the database.

def create_simple_nodes(tx):
    tx.run("CREATE (a:Person {name: 'Alice'})")
    tx.run("CREATE (b:Person {name: 'Bob'})")

# Run the transaction (execute_write supersedes write_transaction,
# which is deprecated in version 5 of the driver)
with driver.session() as session:
    session.execute_write(create_simple_nodes)
    print("Created nodes Alice and Bob.")

Listing 16. Creating Nodes using Neo4j from Python

After this, confirm that the nodes have been created by running the following command:

# Function to run the MATCH query and retrieve all Person nodes
def get_person_nodes(tx):
    result = tx.run("MATCH (n:Person) RETURN n")
    for record in result:
        print(record["n"])  # Prints each Person node

# Run the code within a session (execute_read supersedes read_transaction)
with driver.session() as session:
    print("Person nodes in the database:")
    session.execute_read(get_person_nodes)

Listing 17. Retrieving the Nodes using Neo4j from Python

Output message
Person nodes in the database:
<Node element_id='4:bd31ae9c-f686-4122-b53f-f8d6cb58adb9:0' labels=frozenset({'Person'}) properties={'name': 'Alice'}>
<Node element_id='4:bd31ae9c-f686-4122-b53f-f8d6cb58adb9:1' labels=frozenset({'Person'}) properties={'name': 'Bob'}>

To get cleaner output, change the query as follows:

def get_person_nodes(tx):
    result = tx.run("MATCH (n:Person) RETURN n.name AS name")
    for record in result:
        name = record["name"]
        print(f"Name: {name}")

with driver.session() as session:
    print("Person nodes in the database:")
    session.execute_read(get_person_nodes)

Listing 18. Retrieving Cleaned Nodes Output

Output message
Person nodes in the database:
Name: Alice
Name: Bob

Figure 5 shows the graph output in Neo4j. Next, we define relationships as follows:

def create_friend_relationship(tx):
    tx.run("""
        MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
        CREATE (a)-[:FRIEND]->(b)
    """)

# Run the transaction
with driver.session() as session:
    session.execute_write(create_friend_relationship)
    print("Created FRIEND relationship between Alice and Bob.")

Listing 19. Creating Edges using Neo4j from Python

Output message
Created FRIEND relationship between Alice and Bob.

The resulting relationship is shown in Figure 6.

C. Case Study: Simple Social Network Graph

1) Objective: This case study demonstrates the creation of a simple social network graph, where nodes represent individuals and edges define friendships. Each person has attributes such as age and city, and each friendship has a closeness level to indicate relationship strength.


Fig. 5. Nodes created in Neo4j

Fig. 6. Relationships created in Neo4j

2) Scenario: The social network consists of five individuals: Alice, Bob, Charlie, David, and Eve. Each person lives in a different city and has a different age. Friendships are represented by edges, each with a closeness level. This network structure helps analyze relationships within social groups and identify key connections.

3) Graph Creation in Python with NetworkX: To create this network, we use NetworkX, a Python library for graph-based data modeling.

    import networkx as nx
    import matplotlib.pyplot as plt

    # Create a graph instance
    G = nx.Graph()

    # Add nodes with attributes
    G.add_node("Alice", age=30, city="New York")
    G.add_node("Bob", age=25, city="Los Angeles")
    G.add_node("Charlie", age=35, city="Chicago")
    G.add_node("David", age=28, city="San Francisco")
    G.add_node("Eve", age=32, city="Boston")

    # Add edges with a 'closeness' attribute
    G.add_edge("Alice", "Bob", closeness=5)
    G.add_edge("Alice", "Charlie", closeness=4)
    G.add_edge("Bob", "David", closeness=3)
    G.add_edge("Charlie", "Eve", closeness=4)
    G.add_edge("David", "Eve", closeness=2)

    # Draw the graph with node and edge attributes
    plt.figure(figsize=(10, 8))
    pos = nx.spring_layout(G)
    # with_labels=False because custom labels are drawn below;
    # drawing both would print each node name twice
    nx.draw(G, pos, with_labels=False, node_color="lightblue", node_size=3000, font_size=10)

    # Display node attributes
    node_labels = {node: f"{node}\nAge: {G.nodes[node]['age']}, City: {G.nodes[node]['city']}" for node in G.nodes}
    nx.draw_networkx_labels(G, pos, labels=node_labels, font_size=8)

    # Display edge attributes
    edge_labels = {(u, v): f"Closeness: {G.edges[u, v]['closeness']}" for u, v in G.edges}
    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_size=8)

    # Save and display the graph
    plt.savefig("social_network_example.png", format="png")
    plt.show()

Listing 20. Creating a Simple Social Network Graph with NetworkX

4) Visualization: Figure 7 displays each individual as a node with age and city information, while friendship closeness is represented on edges.
This example demonstrates how to construct a social network graph and visualize it effectively with attributes on both nodes and edges. Such graphs are useful in social network analysis, where relationships and individual characteristics are essential for understanding network structures.

III. GRAPH PROPERTIES AND BASIC OPERATIONS

A. Definitions

1) Degree: The degree of a node (or vertex) in a graph is defined as the number of edges incident to it, which is a fundamental indicator of the node's connectivity within the graph. This property is crucial for assessing various metrics, such as centrality and influence, in social network analysis. Specifically, degree centrality is helpful in identifying influential nodes, as highlighted by [113]. Moreover, general degree distance measures further capture aspects of connectivity and network efficiency, as discussed by [130]. Beyond social networks, the degree metric holds significant value in network security contexts. For instance, understanding the degree of vertices can help in evaluating network vulnerabilities [6] and determining vertex importance in complex networked systems, as shown in studies on betweenness and other centrality measures by [45].

2) Path Length: Path length refers to the length of a path between two nodes; the length of the shortest such path is the distance between them and represents reachability within the network. This metric is critical in routing and communication applications. Distance measures based on path lengths, such as shortest and detour paths, help quantify network efficiency [97]. Like diametral paths, path-based analyses have applications in understanding complex network topologies, where analyzing shortest paths improves routing and resource allocation [87].
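The degree and path-length definitions can be checked on the five-person social network from the case study. The following is a minimal sketch, assuming only the friendships listed in that example; the variable names are illustrative. Because the five friendships form a single cycle, every person has degree 2.

```python
import networkx as nx

# Friendships from the social network case study (closeness values omitted here)
G = nx.Graph()
G.add_edges_from([
    ("Alice", "Bob"),
    ("Alice", "Charlie"),
    ("Bob", "David"),
    ("Charlie", "Eve"),
    ("David", "Eve"),
])

# Degree: number of edges incident to each node
degrees = dict(G.degree())

# Degree centrality: degree divided by (n - 1)
centrality = nx.degree_centrality(G)

# Path length: number of edges on a shortest path between two nodes
hops_alice_eve = nx.shortest_path_length(G, "Alice", "Eve")

# Average shortest path length over all node pairs
avg_path_length = nx.average_shortest_path_length(G)

print("Degrees:", degrees)
print("Degree centrality of Alice:", centrality["Alice"])
print("Hops from Alice to Eve:", hops_alice_eve)
print("Average shortest path length:", avg_path_length)
```

Unweighted hop counts are used here because the closeness attribute measures relationship strength rather than distance, so it is not a natural edge weight for shortest paths.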


Fig. 7. Visualization of the Simple Social Network Graph

3) Adjacency Matrix: An adjacency matrix represents the connections between nodes in a graph, where each matrix element indicates the presence or absence of an edge between node pairs. This matrix format is essential for visualizing and calculating various graph properties, such as similarity and distances, as explored by [43]. The adjacency matrix serves as a foundational tool in defining the structural properties of a graph and is particularly valuable in specialized fields like protein network analysis, as noted by [3]. Its utility extends to computational applications, enabling efficient matrix-based algorithms for graph traversals and spectral analysis [33]. In structural analyses, adjacency matrices are used to represent and manipulate graph data. For instance, they allow for the calculation of spectral radii, a measure that reflects the connectivity strength within a network [132].

4) Distance Measures: Distance measures, such as shortest paths, play a crucial role in determining network connectivity and resilience. These metrics help map communication patterns within networks and enhance our understanding of network dynamics, as explored by [100]. Additionally, distance measures are often used in conjunction with degree properties to assess structural differentiation in specialized applications, such as brain functional networks [74]. Distance measures quantify the separation between nodes, offering valuable insights into network structure and layout. Common metrics include the shortest path distance, alongside specialized measures like the Fréchet distance, which is used for embedded graphs [4]. Recent research has also introduced the reciprocal degree distance (RDD), a robust metric for evaluating vertex connectivity and network resilience [7].

B. Pathfinding and Shortest Path Algorithms

1) Dijkstra's Algorithm: Dijkstra's algorithm calculates the shortest path from a source node to all other nodes in a graph, with a time complexity of O((m + n) log n) when implemented with a binary heap, where m is the number of edges and n is the number of nodes. The algorithm requires non-negative edge weights: it repeatedly settles the unvisited node with the smallest known distance from the source until the destination node is reached or all nodes have been visited. Comparisons with A* have shown that while Dijkstra is generally more resource-intensive for complex environments, it remains preferable in grids without heuristic guidance [24].

2) Implementation in NetworkX: NetworkX, a Python library for graph manipulation, provides an efficient way to apply Dijkstra's algorithm.

    import networkx as nx


    # Create a graph and add weighted edges
    G = nx.Graph()
    G.add_weighted_edges_from([(1, 2, 7), (1, 3, 9), (2, 4, 10), (3, 4, 2)])

    # Compute shortest path using Dijkstra's algorithm
    shortest_path = nx.dijkstra_path(G, source=1, target=4)
    path_length = nx.dijkstra_path_length(G, source=1, target=4)

    print("Shortest path:", shortest_path)
    print("Path length:", path_length)

Listing 21. Computing the shortest path with Dijkstra's algorithm in NetworkX

Output message

    Shortest path: [1, 3, 4]
    Path length: 11

3) Implementation in Neo4j: Neo4j, a graph database, provides the gds.shortestPath.dijkstra procedure through its Graph Data Science (GDS) library to run Dijkstra's algorithm efficiently on large datasets.

    // Assumes an in-memory graph named 'locationGraph' (an illustrative name) has
    // already been projected with gds.graph.project, including the 'distance' property.
    MATCH (source:Location {name: "A"}), (target:Location {name: "B"})
    CALL gds.shortestPath.dijkstra.stream('locationGraph', {
        sourceNode: source,
        targetNode: target,
        relationshipWeightProperty: 'distance'
    })
    YIELD nodeIds, costs
    UNWIND range(0, size(nodeIds) - 1) AS i
    RETURN gds.util.asNode(nodeIds[i]).name AS node, costs[i] AS cost

Listing 22. Identifying the shortest route using Dijkstra's algorithm

Output message

    Shortest path from A to B:
    Node: A, Cumulative Cost: 0
    Node: C, Cumulative Cost: 5
    Node: B, Cumulative Cost: 10

This query identifies the shortest route between two locations based on a weighted property (e.g., distance), suitable for applications in logistics and supply chain management where cost minimization is important [64].

4) Use Cases and Applications:
• Financial Networks: Dijkstra's algorithm aids in portfolio management by minimizing transaction costs in financial asset graphs, where finding cost-efficient paths improves rebalancing strategies in trading [126].
• Gaming and Simulation: Dijkstra's algorithm is used in game development to calculate optimal paths in grid-based environments, offering precise pathfinding that supports AI decision-making for navigating virtual spaces [72].
• Robotic Navigation: In robotic fulfillment systems, Dijkstra's algorithm is foundational for path planning in warehouses, optimizing routes within complex layouts to improve operational efficiency [64].

5) Algorithmic Efficiency Studies: Studies highlight Dijkstra's efficiency, especially in unweighted or lightly weighted networks, though adaptations like parallel computing or combining with other algorithms (e.g., GNNs) can further enhance performance. For instance, combining pathfinding with Graph Neural Networks (GNNs) improves predictions and reduces transaction costs in finance [23].

C. Graph Density and Connectivity Analysis

Graph density measures the number of edges in a graph compared to the maximum possible number of edges. For an undirected graph G with n nodes and m edges, the density D can be calculated by:

$$D = \frac{2m}{n(n-1)}$$

In directed graphs, each ordered pair of nodes can hold its own edge, so the factor of 2 is dropped and $D = \frac{m}{n(n-1)}$. High graph density indicates a well-connected graph, which is often essential in applications requiring robust connectivity, like transportation and social networks [63].

1) Connected Components: Connected components represent maximal subgraphs in which any two nodes are connected by paths. Identifying these components is imperative for network segmentation and isolation tasks. For instance, in dynamic graphs, connected components can reveal network changes over time, making them suitable for tracking connectivity in evolving networks [129].

2) Implementation in Python (NetworkX): NetworkX offers tools to identify connected components:

    import networkx as nx

    G = nx.Graph([(1, 2), (2, 3), (4, 5)])
    components = list(nx.connected_components(G))
    print("Connected Components:", components)

Listing 23. Identifying connected components in a graph

Here, nodes are grouped into distinct connected components, which is useful for graph clustering and community detection.

Output message

    Connected Components: [{1, 2, 3}, {4, 5}]

3) Connectivity Measures: Connectivity measures quantify a graph's robustness to disconnections. Key metrics include vertex connectivity (the minimum number of nodes that need removal to disconnect the graph) and edge connectivity (the minimum number of edges that must be removed). These measures assess network resilience, especially in critical infrastructure and fault-tolerant systems [55].
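The density formula and the two connectivity measures can be verified numerically. The sketch below is illustrative rather than part of the original tutorial: it reuses the small two-component graph from the connected-components example and adds a 5-node cycle as a second, assumed test case.

```python
import math
import networkx as nx

# Two-component graph from the connected-components example
G = nx.Graph([(1, 2), (2, 3), (4, 5)])
n, m = G.number_of_nodes(), G.number_of_edges()

# Density of an undirected graph: D = 2m / (n(n - 1))
manual_density = 2 * m / (n * (n - 1))
library_density = nx.density(G)

# A disconnected graph has vertex and edge connectivity 0
g_node_conn = nx.node_connectivity(G)
g_edge_conn = nx.edge_connectivity(G)

# In a cycle, at least two nodes (or two edges) must be removed to disconnect it
C = nx.cycle_graph(5)
c_node_conn = nx.node_connectivity(C)
c_edge_conn = nx.edge_connectivity(C)

print("Density (manual vs. library):", manual_density, library_density)
print("Two-component graph connectivity:", g_node_conn, g_edge_conn)
print("Cycle connectivity:", c_node_conn, c_edge_conn)
```

Comparing the manual formula against nx.density() is a quick sanity check that the factor of 2 is applied only for undirected graphs.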


Fig. 8. Graph Showing Two Connected Components: {1, 2, 3} and {4, 5}

4) Advanced Connectivity Techniques: Recent research explores extended connectivity properties like algebraic connectivity, calculated from the second-smallest eigenvalue of the graph Laplacian matrix, providing insights into graph stability and connectedness [136]. Incremental algorithms also dynamically track connectivity changes in real time for adaptive systems, which is particularly valuable in network monitoring [42].

D. Basic Graph Operations for Graph Analysis

1) Node and Edge Creation: In Neo4j, defining nodes and edges enables graph construction and connectivity modeling, foundational for resilience studies. For example, creating a logistics network might look like:

    CREATE (n1:Node {name: 'Warehouse'})
    CREATE (n2:Node {name: 'Distribution Center'})
    CREATE (n1)-[:CONNECTS {distance: 50}]->(n2)

This structure aids in modeling and assessing critical links within networks, such as supply chains or urban infrastructure [120].

2) Shortest Path for Resilience: Shortest path queries in Cypher are key for understanding redundancy and finding alternate paths. Cypher's built-in shortestPath function returns the path with the fewest relationships between two nodes:

    MATCH (start:Node {name: 'A'}), (end:Node {name: 'B'})
    MATCH path = shortestPath((start)-[*]-(end))
    RETURN path;

Listing 24. Finding the shortest path between two nodes using Cypher

This query helps identify alternative routes within networks, essential for recovery in disrupted networks, such as transport or electric grids [91].

Output message

    Shortest path from A to B:
    Node: A
    Node: C
    Node: B

Fig. 9. Graph Visualization of the Shortest Path from A to B

3) Node and Edge Deletion for Fault Tolerance: Assessing network resilience often involves simulating node or edge failures. By removing connections in Cypher, analysts can evaluate a network's robustness:

    MATCH (n1:Node {name: 'A'})-[r:CONNECTS]->(n2:Node {name: 'B'})
    DELETE r;

Listing 25. Removing an edge between two nodes

This operation tests how network structure changes affect connectivity, providing insights into fault tolerance [139].

Output message

    Connection between Node A and Node B has been removed.

4) Subgraph Analysis for Core Resilience: Neo4j allows for subgraph extraction, useful for examining connected components or community resilience. In scenarios where critical nodes are removed, the weakly connected components (WCC) procedure helps identify isolated subgraphs:

    // Assumes an in-memory graph named 'networkGraph' (an illustrative name) has
    // already been projected from Node nodes and CONNECTS relationships.
    CALL gds.wcc.stream('networkGraph')
    YIELD componentId, nodeId
    RETURN componentId, gds.util.asNode(nodeId).name AS nodeName

This analysis identifies weak points within a network, highlighting nodes that increase resilience when protected [73].
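The same failure simulation can be prototyped in NetworkX before it is run against a database. The following is a minimal sketch on a hypothetical four-node network (the node names and edges are assumptions made for illustration): it finds the bridges whose loss would split the network, removes one redundant edge to confirm that an alternate route exists, and then removes a bridge to show the resulting isolated component.

```python
import networkx as nx

# Hypothetical network: A, B, C form a triangle; D hangs off B
G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("C", "B"), ("B", "D")])

# Bridges are edges whose removal disconnects the graph
bridges_before = sorted(tuple(sorted(e)) for e in nx.bridges(G))

# Simulate failure of the direct A-B link and look for an alternate route
path_before = nx.shortest_path(G, "A", "B")
G.remove_edge("A", "B")
path_after = nx.shortest_path(G, "A", "B")
still_connected = nx.is_connected(G)

# Simulate failure of the bridge B-D: node D becomes isolated
G.remove_edge("B", "D")
components = sorted(sorted(c) for c in nx.connected_components(G))

print("Bridges in the intact network:", bridges_before)
print("Route before failure:", path_before)
print("Route after failure:", path_after, "| connected:", still_connected)
print("Components after bridge failure:", components)
```

Edges that are not bridges, such as A-B here, can fail without fragmenting the network, which is the redundancy the shortest-path and deletion queries above are meant to probe.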


Fig. 10. Graph Visualization Before Edge Deletion

Fig. 11. Graph Visualization After Edge Deletion (Node A to Node B removed)

5) Case Study: Network Resilience in a Transportation System: A practical application of these operations can be seen in resilience analysis of transportation networks, where resilience measures are vital for post-disaster recovery. Studies on New York City's transportation resilience during snowstorms utilized graph metrics to estimate operational continuity with reduced data, which can be replicated in Neo4j for real-time disruption analysis [91].

IV. MANIPULATING GRAPHS IN NETWORKX AND NEO4J

A. Adding and Removing Nodes and Edges

1) Adding Nodes and Edges: Adding nodes and edges is essential for growing or updating networks, which is common in dynamic environments such as social networks, infrastructure models, and IoT systems.

a) In NetworkX: NetworkX provides straightforward methods for adding single or multiple nodes and edges:

    import networkx as nx
    G = nx.Graph()

    # Adding a single node
    G.add_node("A")

    # Adding multiple nodes
    G.add_nodes_from(["B", "C", "D"])

    # Adding a single edge
    G.add_edge("A", "B")

    # Adding multiple edges
    G.add_edges_from([("A", "C"), ("B", "D")])

This code enables quick network expansion. NetworkX also supports adding attributes to nodes and edges, which is useful for enriching them with properties like "weight" or "type" for specific applications.

b) In Neo4j: In Neo4j, nodes and edges (referred to as "relationships") can be added using Cypher commands:

    // Adding a node with properties
    CREATE (a:Person {name: 'Alice', age: 30});

    // Adding an edge (relationship) between existing nodes
    MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
    CREATE (a)-[:KNOWS]->(b);

Adding nodes and edges with properties allows for flexible data modeling in Neo4j, where entities are enriched with contextual data, enabling deeper insights in subsequent analysis. Studies on network expansion techniques, such as plug-and-play approaches, indicate that adding nodes should prioritize maintaining connectivity to avoid fragmentation, which is critical for network stability [117].

2) Removing Nodes and Edges: Removing elements from a network is often required for scenarios such as simulating node failures, pruning unnecessary connections, or analyzing the effects of disruptions.

a) In NetworkX: Removing nodes and edges in NetworkX is efficient, supporting single and batch removals:

    # Edges are removed first here, because removing a node
    # also removes every edge attached to it

    # Removing a single edge
    G.remove_edge("A", "B")

    # Removing multiple edges
    G.remove_edges_from([("A", "C"), ("B", "D")])

    # Removing a single node
    G.remove_node("A")

    # Removing multiple nodes
    G.remove_nodes_from(["B", "C"])
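How NetworkX reacts to a missing node or edge depends on which removal method is called. The sketch below, using an illustrative graph and a node "Z" that is deliberately absent, shows that the single-element method remove_node() raises NetworkXError, while the batch methods remove_nodes_from() and remove_edges_from() skip missing elements silently.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("B", "D")])

# Single-element removal raises NetworkXError if the element is missing
try:
    G.remove_node("Z")
    single_raised = False
except nx.NetworkXError:
    single_raised = True

# Batch removal ignores elements that are not in the graph
G.remove_nodes_from(["Z", "D"])                 # removes D (and edge B-D), skips Z
G.remove_edges_from([("A", "Z"), ("A", "C")])   # removes A-C, skips A-Z

remaining_nodes = sorted(G.nodes())
remaining_edges = sorted(tuple(sorted(e)) for e in G.edges())

print("remove_node raised an error:", single_raised)
print("Remaining nodes:", remaining_nodes)
print("Remaining edges:", remaining_edges)
```

When silent skipping is undesirable, checking membership with G.has_node() or G.has_edge() before a batch removal makes missing elements explicit.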


In NetworkX's API, the single-element methods remove_node() and remove_edge() raise an error when the node or edge does not exist, which helps maintain data integrity; the batch methods remove_nodes_from() and remove_edges_from() silently ignore missing elements.

b) In Neo4j: Cypher supports selective deletion, allowing precise control over node and relationship removal. When removing a node, the associated relationships must also be removed:

    // Remove a node and all its relationships
    MATCH (n:Person {name: 'Alice'})
    DETACH DELETE n

This command removes both the node and any connecting relationships, ensuring no orphaned relationships remain, which is important for data integrity. In large-scale networks, especially IoT and social networks, strategies for intelligent edge and node removal optimize robustness against failures. The Intelligent Rewiring (INTR) mechanism, for example, enhances resilience by redistributing connectivity in scale-free networks [1].

3) Modifying Nodes and Edges: Modifying the properties of nodes and edges is essential for dynamically updating network attributes, for example adjusting edge attributes for pathfinding, reliability, or transport optimization.

a) In NetworkX: NetworkX supports adding or updating attributes for nodes and edges, which is useful for applications like traffic modeling or infrastructure analysis:

    # Adding or modifying a node attribute
    G.nodes["A"]["type"] = "hub"

    # Adding or modifying an edge attribute
    G["A"]["B"]["weight"] = 4

These modifications allow the network to evolve, adjusting characteristics such as edge weights based on real-time data or node roles based on updated contexts.

b) In Neo4j: Cypher's SET clause enables efficient updates to node and edge properties, supporting incremental changes in large datasets:

    // Modify a node property
    MATCH (a:Person {name: 'Alice'})
    SET a.age = 31;

    // Modify an edge property
    MATCH (a)-[r:KNOWS]->(b)
    SET r.since = 2021;

This flexibility in Cypher supports operations like updating relationships to model evolving social connections or transactional changes in networks.
In performance-focused applications, modifying edge weights or properties to optimize transport efficiency or connectivity in real time is crucial. Studies on rewiring strategies suggest that targeted edge modifications based on centrality measures (e.g., betweenness) can substantially enhance network capacity and maintain structural stability in scale-free networks [70].

4) Practical Applications and Efficiency: Efficient manipulation of nodes and edges is essential for large networks, such as in IoT, infrastructure modeling, and social network analysis. Neo4j's indexing and query optimization make it well-suited for handling high volumes of relationship data, while NetworkX's graph manipulation methods are effective for network simulations and analyses requiring flexibility and fast prototyping.
• Neo4j: For applications requiring real-time analysis, Neo4j's Cypher queries enable fast lookups and modifications, making it ideal for continuously evolving networks where each update can impact overall resilience or connectivity.
• NetworkX: NetworkX's Python-based functions provide the flexibility to experiment with graph changes, simulate network effects, and validate different manipulation strategies.

B. Working with Subgraphs

1) Extracting Subgraphs: Extracting subgraphs is a common operation in graph analysis, allowing for focused exploration of specific regions or patterns within a large graph.
• In NetworkX: NetworkX provides tools for creating subgraphs based on node or edge criteria. For instance, to extract a subgraph that includes only nodes connected to a particular central node, we can use the following code:

    import networkx as nx
    G = nx.Graph()
    G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)])

    # Extracting a subgraph containing nodes adjacent to node 1
    nodes = list(G.neighbors(1)) + [1]  # Nodes adjacent to 1 plus node 1 itself
    subG = G.subgraph(nodes)

    # Display nodes and edges in the subgraph
    print("Nodes in subgraph:", subG.nodes())
    print("Edges in subgraph:", subG.edges())

Listing 26. Extracting a subgraph of nodes adjacent to node 1 in NetworkX

This creates a subgraph containing node 1 and all nodes adjacent to it, providing a smaller, focused view of the network.

Expected Output

    Nodes in subgraph: [1, 2, 5]
    Edges in subgraph: [(1, 2), (1, 5)]

• In Neo4j: In Neo4j, subgraph extraction often involves Cypher queries that filter nodes and relationships based on specified criteria. For example, to extract a subgraph that includes only nodes with specific properties, use:

    MATCH (n:Person)-[r:KNOWS]-(m:Person)
    WHERE n.age > 30
    RETURN n, r, m

Listing 27. Extracting a subgraph of people over 30 and their relationships in Neo4j

This query extracts a subgraph of people over the age of 30 and their relationships, isolating a subset of the graph based on property filters.
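A comparable property-based extraction can be sketched in NetworkX. The example below is an illustration under stated assumptions: the people, ages, and friendships are hypothetical, and the filter mirrors the Cypher pattern by keeping every edge that has at least one endpoint older than 30.

```python
import networkx as nx

# Hypothetical people and friendships
G = nx.Graph()
G.add_node("Alice", age=30)
G.add_node("Bob", age=25)
G.add_node("Charlie", age=35)
G.add_node("Eve", age=32)
G.add_edges_from([("Alice", "Bob"), ("Alice", "Charlie"), ("Charlie", "Eve")])

# Nodes whose 'age' attribute exceeds 30
over_30 = {n for n, data in G.nodes(data=True) if data.get("age", 0) > 30}

# Keep each edge with at least one endpoint over 30,
# so neighbors of any age are included, as in the Cypher query
selected_edges = [(u, v) for u, v in G.edges() if u in over_30 or v in over_30]
subG = G.edge_subgraph(selected_edges)

print("People over 30:", sorted(over_30))
print("Nodes in subgraph:", sorted(subG.nodes()))
print("Edges in subgraph:", subG.number_of_edges())
```

edge_subgraph() is used instead of subgraph() because the selection is driven by relationships; a person over 30 with no friendships would not appear, which also matches the behavior of the MATCH pattern.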


Expected Output

Nodes with age > 30 and their relationships are returned as part of the subgraph. Example:

    Node 1     Relationship   Node 2
    Alice      KNOWS          Bob
    Charlie    KNOWS          Dana

2) Filtering Techniques for Subgraph Extraction: Filtering techniques are important for managing the scope and size of subgraphs, especially in large networks where analyzing the entire graph is computationally prohibitive.

• Vertex-based Filtering: This technique filters nodes based on attributes or connectivity. In a social network, for example, filtering by nodes with high centrality scores can provide a subgraph of the most influential nodes.
In NetworkX:

    # Subgraph with nodes of degree 2 or more
    high_degree_nodes = [n for n, d in G.degree() if d >= 2]
    subG = G.subgraph(high_degree_nodes)

    # Display nodes and edges in the filtered subgraph
    print("Nodes in filtered subgraph:", subG.nodes())
    print("Edges in filtered subgraph:", subG.edges())

Listing 28. Extracting a subgraph of nodes with degree 2 or higher

Expected Output (for an example graph in which only nodes 1, 2, and 3 have degree 2 or more)

    Nodes in filtered subgraph: [1, 2, 3]
    Edges in filtered subgraph: [(1, 2), (2, 3)]

• Edge-based Filtering: This technique filters edges based on weights or relationship types. In transport networks, filtering by high-weight (traffic-heavy) edges can reveal critical paths or bottlenecks.
In NetworkX:

    # Subgraph with edges having a weight above a certain threshold
    heavy_edges = [(u, v) for u, v, d in G.edges(data=True) if d.get("weight", 0) > 5]
    subG = G.edge_subgraph(heavy_edges)

    # Display nodes and edges in the filtered subgraph
    print("Nodes in filtered subgraph:", subG.nodes())
    print("Edges in filtered subgraph:", subG.edges())

Listing 29. Extracting a subgraph with edges above a certain weight threshold

Expected Output (for an example graph in which only edges (2, 3) and (3, 4) have a weight above 5)

    Nodes in filtered subgraph: [2, 3, 4]
    Edges in filtered subgraph: [(2, 3), (3, 4)]

• Sampling-based Filtering: Techniques like random node sampling or feature-driven sampling reduce the graph size while preserving important features. These techniques are efficient for creating representative subgraphs from large graphs [137].
In NetworkX (Example of Random Sampling):

    import random
    # random.sample needs a sequence, so the node view is converted to a list
    sampled_nodes = random.sample(list(G.nodes()), 3)
    subG = G.subgraph(sampled_nodes)

    # Display nodes and edges in the sampled subgraph
    print("Nodes in sampled subgraph:", subG.nodes())
    print("Edges in sampled subgraph:", subG.edges())

Listing 30. Random sampling of nodes to create a representative subgraph

Expected Output (one possible result; the sample is random and depends on G)

    Nodes in sampled subgraph: [1, 3, 5]
    Edges in sampled subgraph: [(1, 3)]

3) Advanced Subgraph Analysis Techniques: Advanced methods, such as those used in subgraph matching and counting, help optimize the extraction of subgraph patterns that match specific structural criteria.
• Subgraph Matching: Techniques like FaSTest reduce the sample space by filtering non-matching nodes or edges before analyzing potential subgraph matches, thereby improving efficiency [115]. These techniques are particularly valuable for tasks like network motif detection or pattern recognition in biological networks.
• Multi-View Filtering: Multi-view filters, as in the GMADL architecture, analyze subgraphs through multiple criteria or perspectives, enhancing the precision of subgraph extraction and aiding in classification tasks [140].
Efficient subgraph extraction is vital for analyzing large graphs without excessive computational costs. Techniques like indexing, as highlighted by [119], enable fast retrieval of subgraphs by pre-indexing common structures, reducing the need for exhaustive searches.
In large-scale graph analysis, bounding the size or complexity of subgraphs through filtering also optimizes performance. For instance, algorithms designed for specific girth constraints (the length of the shortest cycle) can quickly enumerate small subgraphs, improving both speed and accuracy in sparse graph structures [71].

C. Merging and Comparing Graphs

1) Union of Graphs: The union of two graphs combines all nodes and edges from both graphs. In practical applications, union operations are useful for consolidating data from different graph sources, such as merging social networks or integrating infrastructure networks.
• In NetworkX: NetworkX provides the nx.compose() function for union operations.

    import networkx as nx
    G1 = nx.Graph()
    G1.add_edges_from([(1, 2), (2, 3)])

    G2 = nx.Graph()
    G2.add_edges_from([(3, 4), (4, 5)])

    # Union of G1 and G2
    G_union = nx.compose(G1, G2)

This operation retains all unique nodes and edges from both graphs. When more than two graphs need to be merged, nx.compose_all() accepts a list of graphs. In both functions, if the same node or edge carries conflicting attributes, the values from the later graph take precedence.
• In Neo4j: In Neo4j's Cypher, union operations can be achieved by merging nodes and relationships:

    MATCH (a:Person)-[:KNOWS]-(b:Person)
    WITH collect(DISTINCT a) + collect(DISTINCT b) AS nodes
    UNWIND nodes AS n
    RETURN DISTINCT n

This example aggregates all unique nodes across different subgraphs, which can be useful for consolidating social networks or community structures. Graph unions are important in network science for tasks that require amalgamating graph structures to study composite properties or centralities. For instance, [124] explored betweenness centrality in composite graphs created by union operations, highlighting the significance of graph unions in network dynamics.

2) Intersection of Graphs: The intersection of two graphs includes only the nodes and edges common to both graphs. This operation is valuable for identifying shared connections, such as mutual friends in social networks or common pathways in biological networks.
• In NetworkX: NetworkX supports intersection using nx.intersection():

    # Intersection of G1 and G2
    G_intersection = nx.intersection(G1, G2)

This function returns a new graph containing only the nodes and edges present in both G1 and G2, useful for focusing on core relationships or common elements between graphs.
• In Neo4j: Cypher queries can filter overlapping nodes and edges:

    MATCH (a)-[r:FRIENDS]-(b)
    WHERE (a)-[:COLLEAGUES]-(b)
    RETURN a, b, r

This command extracts FRIENDS relationships between pairs of nodes that are also connected by a COLLEAGUES relationship, effectively creating an intersection based on relationship criteria. The intersection of graphs is essential in analyzing sub-networks or common structural elements in composite networks. For example, [13] demonstrated how graph intersections aid in anti-unification of term-graphs, focusing on bisimilar structures and common attributes.

3) Difference of Graphs: The difference operation between two graphs keeps the edges of one graph that do not appear in the other, and is often used to isolate unique or exclusive features.
• In NetworkX: NetworkX provides nx.difference() to compute the difference. Both graphs must have the same node set:

    import networkx as nx
    G1 = nx.Graph([(1, 2), (2, 3), (3, 4)])
    G2 = nx.Graph([(2, 3)])
    # nx.difference requires identical node sets, so add G1's nodes to G2
    G2.add_nodes_from(G1.nodes())

    # Difference of G1 and G2
    G_difference = nx.difference(G1, G2)

    # Display nodes and edges in the difference graph
    print("Nodes in difference graph:", G_difference.nodes())
    print("Edges in difference graph:", G_difference.edges())

Listing 31. Computing the difference of two graphs in NetworkX

Expected Output

    Nodes in difference graph: [1, 2, 3, 4]
    Edges in difference graph: [(1, 2), (3, 4)]

This operation identifies unique edges in one graph that are absent in the other, such as unique connections in one part of a network or exclusive features in comparative studies.
• In Neo4j: In Neo4j, a difference can be implemented by filtering nodes and relationships absent in the second graph:

    MATCH (a)-[r:KNOWS]-(b)
    WHERE NOT (a)-[:COLLEAGUES]-(b)
    RETURN a, r, b

This query extracts KNOWS relationships between pairs of nodes that are not also connected by a COLLEAGUES relationship, isolating unique interactions in the KNOWS network. Graph differences help isolate exclusive relationships or features in comparative network studies, such as identifying unique interaction patterns in different community groups [5].

4) Comparing Graph Properties: Comparing graphs involves analyzing their structural and spectral properties, often focusing on metrics like centrality, spectral properties, and graph distance measures.
• Spectral Analysis: Spectral analysis involves studying the eigenvalues of graph adjacency or Laplacian matrices. For example, the join operation in graphs, which affects eigenvalues and distance spectra, plays a role in comparing network robustness and connectivity properties [5].
• Gromov-Wasserstein Distance: The partial Gromov-Wasserstein distance, which maps nodes across partially overlapping graphs, facilitates complex comparison tasks


where exact alignment is not feasible [79]. This distance is effective in matching nodes with different structural roles, providing a distinct comparison of graph similarity.

V. ADVANCED GRAPH VISUALIZATIONS

A. Circular Layout

The circular layout arranges nodes in a circular form, which is particularly effective for visualizing hierarchical data structures or networks where symmetry and clear adjacency relationships are crucial. Circular layouts make it easy to see clusters and cyclic dependencies in a graph, which can be beneficial in social network analysis or biological pathway visualization.

1) Circular Layout of Nodes: NetworkX provides the nx.circular_layout() function to arrange nodes in a circle. This layout is useful for visualizing cyclic structures or ensuring equal spacing among nodes.
• In NetworkX: To create a circular layout, we can use the following code:

    import networkx as nx
    import matplotlib.pyplot as plt

    G = nx.cycle_graph(10)             # Create a cycle graph with 10 nodes
    pos = nx.circular_layout(G)        # Arrange nodes in a circle
    nx.draw(G, pos, with_labels=True)  # Draw the graph
    plt.show()                         # Display the circular layout

Listing 32. Arranging nodes in a circular layout using NetworkX

Fig. 12. Circular Layout of a 10-Node Cycle Graph

This layout is useful for visualizing cycles or connected components, as each node is equally spaced, allowing clear visibility of connections and node degrees. In a study by [62], a circular layout was used in the Visual Graph system to visualize hierarchical graphs with attributed nodes, proving beneficial in fields like engineering and data science, where hierarchical relationships need representation.

B. Spectral Layout

Spectral layouts use eigenvectors of the graph’s Laplacian matrix to place nodes in a way that minimizes certain energy metrics. This layout is effective for organizing nodes in clustered structures, making it suitable for community detection or identifying dense areas within sparse graphs.
• In NetworkX: NetworkX’s nx.spectral_layout() function uses eigenvectors of the graph Laplacian to determine node positions:

    pos = nx.spectral_layout(G)
    nx.draw(G, pos, with_labels=True)
    plt.show()

Listing 33. Arranging nodes in a spectral layout using NetworkX

Fig. 13. Spectral Layout of a Sample Graph

Spectral layouts can reveal underlying clusters or groupings within complex networks, making them valuable for analyzing community structures in social networks or functional modules in biological networks. Spectral coordinates tend to preserve node distances that reflect the graph’s overall structure, supporting network clustering and optimization.

C. Spring Layout (Force-Directed Layout)

The spring layout, often implemented using the Fruchterman-Reingold algorithm, models the graph as a physical system where nodes repel each other while edges act like springs, pulling connected nodes together. This layout is commonly used for undirected graphs, balancing readability and symmetry.
• In NetworkX: The spring layout in NetworkX is implemented via nx.spring_layout():

    pos = nx.spring_layout(G)
    nx.draw(G, pos, with_labels=True)
    plt.show()

Listing 34. Arranging nodes in a spring layout using NetworkX

This algorithm iteratively adjusts node positions based on repulsive and attractive forces, approaching an equilibrium in which connected nodes sit close together and node overlap is reduced. The Fruchterman-Reingold algorithm, discussed by [106], is widely applied in mathematics, computer science, and network visualization tasks for visually balanced graphs.

Each layout has unique strengths suited to different applications:
• Circular Layouts highlight cyclic structures and symmetry, useful in hierarchical or modular systems.


• Spectral Layouts reveal clusters and assist in network segmentation, valuable for community or cluster visualization.
• Spring Layouts provide a balance between readability and spatial organization, ideal for undirected, unstructured graphs.

Fig. 14. Spring Layout of a Sample Graph

D. Enhancing Graph Aesthetics

1) Using Colors to Enhance Readability: Color is a fundamental tool in graph design, impacting both aesthetics and readability. Colors can highlight key areas, distinguish groups, and emphasize patterns without overwhelming the viewer.
a) Color Selection: Choosing a cohesive color palette with appropriate contrast is essential for readability. For instance, contrasting colors (like dark blue on light yellow) improve visibility, while softer palettes work well for background elements to avoid distraction.
b) Applications in NetworkX: NetworkX allows for color customization using the node_color and edge_color parameters:

    import networkx as nx
    import matplotlib.pyplot as plt

    G = nx.cycle_graph(10)
    colors = ["skyblue" if i % 2 == 0 else "salmon" for i in range(10)]
    nx.draw(G, node_color=colors, with_labels=True)
    plt.show()

This example alternates colors for nodes, which can help differentiate categories or clusters within a graph. According to [19], color choices impact not only aesthetic appeal but also viewer comprehension. Colors should be chosen thoughtfully to support the message and enhance user engagement without causing visual fatigue.

2) Adding Labels for Clarity: Labels provide essential context for nodes and edges, making it easier for viewers to understand the information conveyed in the graph. Labels should be legible, concise, and strategically positioned to avoid clutter.

E. Labeling Nodes and Edges in NetworkX

NetworkX supports adding labels to both nodes and edges, as shown below:

    pos = nx.spring_layout(G)
    nx.draw(G, pos, with_labels=True)
    labels = {edge: f"{edge[0]}-{edge[1]}" for edge in G.edges()}
    nx.draw_networkx_edge_labels(G, pos, edge_labels=labels)
    plt.show()

Listing 35. Labeling nodes and edges in a NetworkX graph

Fig. 15. Graph with Labeled Nodes and Edges

Here, node labels are automatically placed based on node positions, and custom edge labels are defined for clarity. Studies on text aesthetics highlight that font choices and label placements significantly impact readability. A study by [85] demonstrated that font characteristics influence comprehension, which applies to labels in graphical interfaces as well. Using readable fonts, appropriate sizes, and avoiding overlap with nodes or edges enhances the viewer’s experience.

1) Adding Legends for Context: Legends are crucial in multi-colored or multi-symbol graphs, providing a reference for interpreting different colors, shapes, or line types. They allow viewers to quickly identify categories, improving the overall usability of the visualization.
a) Adding Legends in NetworkX: While NetworkX does not natively support legends, they can be added using matplotlib for clarity in complex graphs.

    from matplotlib.lines import Line2D

    # Define custom legends
    legend_elements = [


        Line2D([0], [0], marker='o', color='w', label='Category 1',
               markerfacecolor='skyblue', markersize=10),
        Line2D([0], [0], marker='o', color='w', label='Category 2',
               markerfacecolor='salmon', markersize=10)
    ]
    plt.legend(handles=legend_elements, loc="upper right")
    plt.show()

Listing 36. Adding custom legends to a NetworkX graph

Fig. 16. Graph with Custom Legend

This custom legend explains node colors, guiding viewers in interpreting categories or clusters. Proper legend placement and clarity are essential for readability. [82] emphasizes that a well-positioned, concise legend helps users interpret complex data representations without confusion. Legends should avoid excessive detail and be strategically placed, typically outside the main graph area, to prevent obstruction of key visuals.

2) Combining Aesthetics for Optimal Graph Design: For a graph to be effective, colors, labels, and legends must work together seamlessly. This enhances understanding by highlighting key areas, structuring information, and making data relationships intuitive, ultimately reducing cognitive load. [77] discusses the importance of minimizing visual clutter by avoiding excessive edge crossings and ensuring that design choices, such as color schemes and grid patterns, align with user-friendly aesthetics for structured and accessible visual layouts.

F. Interactive Graph Visualizations with Neo4j Bloom and Plotly

1) Interactive Visualization in Neo4j Bloom: Neo4j Bloom is a visualization tool specifically designed for exploring data stored in Neo4j databases. It provides an interactive environment where users can investigate nodes and relationships in a visually rich manner.
a) Hover Effects: Hover effects in Neo4j Bloom display details about nodes and relationships when the cursor hovers over them. This feature is particularly useful for data exploration, as users can quickly view metadata or attributes without needing to click on each item. Simply move the cursor over any element in the Bloom visualization, and a popup will display relevant data, such as node type, properties, and relationship details.
b) Zoom and Pan: Neo4j Bloom includes zoom and pan features, enabling users to navigate large graphs effortlessly. With the mouse wheel or pinch gestures, users can zoom in on specific areas to inspect dense clusters or zoom out to get an overview of the entire network. Pan functionality allows users to click and drag across the graph, focusing on different sections without losing context.
c) Cypher Commands for Bloom Visualizations: Neo4j Bloom allows users to specify queries in Cypher to control the visualization, focusing on specific subsets of the graph:

    MATCH (p:Person)-[:FRIENDS_WITH]->(f:Person)
    RETURN p, f

This query fetches only the nodes and relationships relevant to a specific pattern, streamlining the visualization and making it more manageable for exploration.
d) Benefits: Interactive features in Neo4j Bloom improve user engagement by making it easier to focus on specific graph segments. Studies highlight the effectiveness of interactive visualizations in fostering data exploration, particularly in network and relationship-rich datasets, where interactive tools allow deeper insights into hidden connections [82].

2) Interactive Visualization in Plotly: Plotly is a Python-based visualization library that supports interactive features, including hover effects, zoom, and pan, making it ideal for exploratory data analysis of network graphs.
a) Creating a Graph with Plotly: Using Plotly’s scatter or NetworkX integration, you can create a network graph and enable interactive capabilities:

    import plotly.graph_objects as go
    import networkx as nx

    # Create a sample NetworkX graph
    G = nx.cycle_graph(10)
    pos = nx.spring_layout(G)

    # Extract positions and create scatter plot traces
    edge_x = []
    edge_y = []
    for edge in G.edges():
        x0, y0 = pos[edge[0]]
        x1, y1 = pos[edge[1]]
        edge_x.extend([x0, x1, None])
        edge_y.extend([y0, y1, None])
    edge_trace = go.Scatter(x=edge_x, y=edge_y,
                            line=dict(width=0.5, color="#888"),
                            hoverinfo="none", mode="lines")

    node_x = [pos[node][0] for node in G.nodes()]
    node_y = [pos[node][1] for node in G.nodes()]
    node_trace = go.Scatter(x=node_x, y=node_y,
                            mode="markers+text",


                            hoverinfo="text",
                            marker=dict(size=10, color="skyblue"))

    fig = go.Figure(data=[edge_trace, node_trace])
    fig.update_layout(showlegend=False)
    fig.show()

Listing 37. Creating an interactive network graph with Plotly

This code example builds an interactive network plot, where nodes and edges are visually represented, and hover information can be customized to show additional metadata about each node.
b) Hover Effects: Hover effects in Plotly allow users to quickly access information about nodes, such as their labels or connected nodes. Custom hover text can be added to each node in node_trace using the text parameter, which can include node attributes or other relevant details:

    node_trace.text = [f"Node {node}" for node in G.nodes()]

c) Zoom and Pan: Plotly’s zoom and pan features are automatically enabled in interactive graphs. Users can zoom by scrolling or pinching and pan by clicking and dragging. This functionality is valuable for detailed analysis, allowing users to seamlessly navigate large networks.

3) Benefits of Interactive Visualization in Graph Analysis: Interactive visualization enhances data analysis by allowing users to engage dynamically with the data, fostering insights that static visualizations may obscure. Interactive features like hover effects, zoom, and pan are essential for exploring complex networks, as they reveal details only when necessary, preserving visual clarity.
Studies confirm that interactive visualizations significantly improve the user experience, making it easier to detect patterns and gain insights. [77] found that interactivity, such as zoom and hover details, enhances the clarity of network visualizations by enabling users to focus on relevant sections without information overload.

VI. ADVANCED GRAPH ALGORITHMS

A. Community Detection and Clustering

1) Louvain Algorithm: The Louvain algorithm is a popular method for community detection in large networks due to its scalability and effectiveness in maximizing modularity. This algorithm operates in two phases: modularity optimization, where nodes are grouped into communities, and community aggregation, where each community is treated as a single node. These phases repeat iteratively until the modularity reaches its peak.
a) Implementation in Python with NetworkX: The Louvain algorithm can be applied to NetworkX graphs through the external python-louvain package, which is imported as community:

    import networkx as nx
    import community as community_louvain  # provided by the python-louvain package
    import matplotlib.pyplot as plt

    G = nx.erdos_renyi_graph(100, 0.05)
    # Compute the best partition using Louvain
    partition = community_louvain.best_partition(G)
    # Color nodes based on community
    pos = nx.spring_layout(G)
    # plt.get_cmap replaces plt.cm.get_cmap, which newer Matplotlib releases removed
    cmap = plt.get_cmap("viridis", max(partition.values()) + 1)
    nx.draw_networkx_nodes(G, pos, partition.keys(), node_size=40,
                           cmap=cmap, node_color=list(partition.values()))
    nx.draw_networkx_edges(G, pos, alpha=0.5)
    plt.show()

Listing 38. Louvain community detection in a NetworkX graph

Fig. 17. Louvain Community Detection in a NetworkX Graph

This example uses the Louvain method to partition a random graph, coloring nodes according to detected communities. The Louvain algorithm has broad applications in social network analysis, where identifying community structures helps understand user groupings or detect hidden relationships. Recent work, such as by [51], applies community detection in social media to enhance recommendation systems by grouping users with similar behavior patterns.

2) Other Community Detection Algorithms: Several other community detection algorithms complement the Louvain method, each suited to specific types of networks or goals:
• Degenerate Agglomerative Hierarchical Clustering Algorithm (DAHCA): DAHCA clusters nodes based on vertex similarity, often outperforming traditional methods in networks with low intra-community connectivity, particularly in biological and social networks [88].
• Hierarchical Clustering: Effective for analyzing complex, multi-level community structures, particularly in large datasets where computational complexity is a consideration [121].
• Overlapping Community Detection: Allows nodes to belong to multiple communities, reflecting real-world networks where entities often belong to different social or functional groups [47].

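The modularity score that the Louvain method tries to maximize can be computed by hand for a small graph. The following is a minimal pure-Python sketch, not the python-louvain implementation: the function name modularity, the edge list, and the two community assignments are illustrative assumptions. It uses the standard form Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    communities: dict mapping every node to a community label.
    """
    m = 0
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both ends in the same community
    for u, v in edges:
        m += 1
        degree[u] += 1
        degree[v] += 1
        if communities[u] == communities[v]:
            internal[communities[u]] += 1

    degree_sum = defaultdict(int)  # total degree per community
    for node, k in degree.items():
        degree_sum[communities[node]] += k

    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical example: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_split = modularity(edges, two_groups)
q_merged = modularity(edges, one_group)
print(round(q_split, 4), round(q_merged, 4))  # 0.3571 0.0
```

Splitting the graph at the bridge scores higher than lumping all nodes together, which is the kind of improvement the modularity optimization phase searches for.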

3) Applications of Community Detection in Real-World Networks: Community detection algorithms are widely applied across various fields:
• Social Networks: Identifies influential groups, reveals social dynamics, and improves targeted recommendations in social media platforms [29].
• Biological Networks: In genomics, detecting clusters of interacting genes or proteins can reveal insights into biological functions and disease mechanisms.
• Collaborative Filtering: Enhances recommendation systems by identifying clusters of users with similar preferences.

B. Centrality Measures in Network Analysis

1) Degree Centrality: Degree centrality is defined as the number of direct connections a node has, reflecting a node’s immediate influence within the network.
a) Formula:
\[ C_D(v) = \deg(v) \]
b) Implementation in NetworkX: nx.degree_centrality() returns a normalized value, the fraction of the other nodes that a particular node is connected to, i.e. \( \deg(v)/(n-1) \), highlighting its local importance within the network.

    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)])
    degree_centrality = nx.degree_centrality(G)
    print(degree_centrality)

Listing 39. Calculating degree centrality of nodes in a NetworkX graph

Expected Output

    1: 0.5,
    2: 0.5,
    3: 0.5,
    4: 0.5,
    5: 0.5

2) Closeness Centrality: Closeness centrality measures how close a node is to all other nodes in the network.
a) Formula:
\[ C_C(v) = \frac{1}{\sum_{u \neq v} d(u, v)} \]
where \( d(u, v) \) is the shortest path distance between u and v.
b) Implementation in NetworkX:

    closeness_centrality = nx.closeness_centrality(G)
    print(closeness_centrality)

Listing 40. Calculating closeness centrality in a NetworkX graph

Expected Output

    1: 0.67,
    2: 0.67,
    3: 0.67,
    4: 0.67,
    5: 0.67

In the 5-node cycle of Listing 39, every node lies at distances 1, 1, 2, and 2 from the others. NetworkX scales the formula above by n − 1, so each node scores 4/6 ≈ 0.67.

3) Betweenness Centrality: Betweenness centrality quantifies the extent to which a node lies on the shortest paths between other nodes.
a) Formula:
\[ C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \]
where \( \sigma_{st} \) is the total number of shortest paths from node s to t, and \( \sigma_{st}(v) \) is the number of those paths that pass through v.
b) Implementation in NetworkX:

    betweenness_centrality = nx.betweenness_centrality(G)
    print(betweenness_centrality)

4) Eigenvector Centrality: Eigenvector centrality measures a node’s influence based on its connections, considering the importance of its neighbors.
a) Formula:
\[ C_E(v) = \frac{1}{\lambda} \sum_{u \in N(v)} C_E(u) \]
where \( N(v) \) are the neighbors of v and \( \lambda \) is the largest eigenvalue of the adjacency matrix.
b) Implementation in NetworkX:

    eigenvector_centrality = nx.eigenvector_centrality(G)
    print(eigenvector_centrality)

C. PageRank and Similar Ranking Algorithms

1) PageRank Algorithm: The PageRank algorithm assigns a ranking to each node based on the number and quality of links to it.
a) Mathematical Formula:
\[ PR(v) = \frac{1 - d}{N} + d \sum_{u \in M(v)} \frac{PR(u)}{L(u)} \]
where d is the damping factor, N is the total number of nodes, \( M(v) \) is the set of nodes that link to v, and \( L(u) \) is the number of outbound links from u.
b) Implementation in NetworkX: The PageRank algorithm evaluates the importance of each node based on the structure of inbound links, commonly used in search engines to rank web pages.

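The PageRank formula can also be evaluated directly by repeated substitution (power iteration). The sketch below is a plain-Python illustration rather than the NetworkX implementation: the function name pagerank, the tolerance, the iteration cap, and the small example graph are assumptions made for this example. Rank held by nodes without outbound links is spread uniformly over all nodes, a common convention.

```python
def pagerank(edges, d=0.85, tol=1e-10, max_iter=200):
    """Iterate PR(v) = (1 - d)/N + d * sum(PR(u)/L(u)) until it stabilizes."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for u, v in edges:
        out_links[u].append(v)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {node: (1.0 - d) / n + d * dangling / n for node in nodes}
        for u in nodes:
            for v in out_links[u]:
                new_rank[v] += d * rank[u] / len(out_links[u])
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Illustrative directed graph with four nodes.
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 2)]
scores = pagerank(edges)
print({node: round(score, 2) for node, score in scores.items()})
# {1: 0.17, 2: 0.33, 3: 0.32, 4: 0.17}
```

Nodes 1 and 4 receive the same score because each is reached only from node 3, which splits its rank evenly between its two outbound links.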

    import networkx as nx

    G = nx.DiGraph()
    G.add_edges_from([(1, 2), (2, 3), (3, 1), (3, 4), (4, 2)])
    pagerank_scores = nx.pagerank(G, alpha=0.85)
    print("PageRank Scores:", pagerank_scores)

Listing 41. Calculating PageRank in a NetworkX graph

Expected Output

    PageRank Scores:
    1: 0.17,
    2: 0.33,
    3: 0.32,
    4: 0.17

2) Weighted PageRank: The Weighted PageRank (WPR) algorithm incorporates the importance of inbound and outbound links, adjusting scores based on the weight of each edge.
a) Implementation in NetworkX:

    # Reuses the DiGraph G from Listing 41; edges 3->4 and 4->2 keep the default weight of 1
    G.add_weighted_edges_from([(1, 2, 0.8), (2, 3, 0.5), (3, 1, 0.6)])
    weighted_pagerank_scores = nx.pagerank(G, alpha=0.85, weight='weight')
    print("Weighted PageRank Scores:", weighted_pagerank_scores)

Listing 42. Calculating Weighted PageRank in a NetworkX graph

Expected Output

    Weighted PageRank Scores:
    1: 0.14,
    2: 0.33,
    3: 0.32,
    4: 0.21

3) Topic-Sensitive PageRank: Topic-Sensitive PageRank (TSPR) ranks nodes based on relevance to specific topics, making it suitable for personalized content delivery.

4) Hybrid and Advanced Ranking Methods: Recent advancements integrate structural and user behavior metrics in ranking methods, improving relevance for recommendation systems [2], [144].

VII. APPLICATIONS OF GRAPH DATABASES IN DATA SCIENCE

A. Social Network Analysis (SNA)

Social Network Analysis (SNA) is an essential approach in data science for understanding the structure and dynamics of social interactions. By analyzing the patterns and relationships within social networks, SNA can help identify key influencers and discover community clusters. Neo4j, a graph database platform, provides powerful tools for conducting SNA, including influencer detection and community clustering.

1) Basics of Social Network Analysis: SNA examines the nodes (individuals or entities) and edges (relationships) within a network to identify patterns in connectivity and influence. Core metrics used in SNA include:
• Degree Centrality: Measures the number of direct connections a node has, helping identify well-connected individuals within a network.
• Betweenness Centrality: Quantifies how often a node acts as a bridge along the shortest path between other nodes, marking those who facilitate communication between clusters.
• PageRank: An algorithm developed by Google to rank web pages, adapted in SNA to evaluate influence based on the importance of a node’s connections [39].

2) Identifying Influencers with Neo4j: In Neo4j, influencers within a network can be identified by calculating centrality metrics.
a) Cypher Query for Degree Centrality:

    MATCH (p:Person)-[:FRIENDS_WITH]-(f)
    RETURN p.name AS Name, COUNT(f) AS DegreeCentrality
    ORDER BY DegreeCentrality DESC

This query lists individuals with the most connections, highlighting potential influencers based on their direct links [92].

3) PageRank in Neo4j: This PageRank query calculates the influence of nodes within a graph, ranking users based on the importance of their connections. It assumes that a graph named 'myGraph' has already been projected into the Graph Data Science catalog.

    CALL gds.pageRank.stream('myGraph')
    YIELD nodeId, score
    RETURN gds.util.asNode(nodeId).name AS Name, score
    ORDER BY score DESC

4) Community Detection for Clustering: Community detection algorithms in Neo4j can identify clusters or groups within a network, often revealing hidden structures.
a) Louvain Algorithm for Community Detection: Neo4j’s implementation of the Louvain algorithm helps in detecting clusters by optimizing modularity. This approach is effective in social networks for identifying communities with shared interests or similar attributes [142].

    CALL gds.louvain.stream('myGraph')
    YIELD nodeId, communityId
    RETURN gds.util.asNode(nodeId).name AS Name, communityId
    ORDER BY communityId

5) Applications of SNA: Social network analysis can greatly benefit fields such as marketing, epidemiology, and organizational management by providing insights into network structures and behaviors. For example, influencer identification supports marketing efforts by targeting users with high impact, while cluster detection facilitates community building and content recommendation [142].

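For readers without a Neo4j instance at hand, degree-based influencer identification can be mimicked on an in-memory edge list. This is a minimal sketch under stated assumptions: the friendships data and the helper rank_by_degree are hypothetical, and each pair stands for one undirected FRIENDS_WITH relationship that counts toward both endpoints.

```python
from collections import Counter

# Hypothetical FRIENDS_WITH relationships as (person, friend) pairs.
friendships = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Alice", "David"),
    ("Bob", "Carol"),
    ("David", "Eve"),
]


def rank_by_degree(pairs):
    """Count connections per person and sort from most to least connected."""
    degree = Counter()
    for a, b in pairs:
        # An undirected relationship counts for both endpoints.
        degree[a] += 1
        degree[b] += 1
    return degree.most_common()


ranking = rank_by_degree(friendships)
for name, degree in ranking:
    print(name, degree)
```

The person at the top of the ranking is the candidate influencer under this simple measure; PageRank or betweenness would weigh the same network differently.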

B. Recommender Systems with Graphs

1) Graph-Based Collaborative Filtering (CF): Graph-based collaborative filtering (CF) leverages user-item interactions in graph structures to enhance recommender systems. By modeling users and items as nodes and their relationships (e.g., ratings or interactions) as edges, graph-based CF effectively captures complex, multi-level connections in data, addressing data sparsity and cold-start issues found in traditional CF methods.
a) Graph-Based Collaborative Filtering Methods:
• Basic Graph-Based CF: Represents user-item interactions in a bipartite graph and applies neighborhood-based methods to identify similarities [111].
• Graph Signal Processing for CF: Integrates graph signal processing to refine CF models, treating user-item graphs as signal domains [50].
• Graph-Aware Collaborative Filtering: Incorporates both user-item bipartite graphs and knowledge graphs to refine representations [81].
• Graph Neural Networks (GNN) for CF: Uses GNNs to apply neighborhood aggregation, allowing the model to learn representations from multi-hop neighbors [48], [44].
b) Implementation with Neo4j: Neo4j can be used to build graph-based recommendation engines by executing queries to identify similar items or recommend items based on user connections:

    MATCH (u:User)-[r:RATED]->(m:Movie)
    WITH u, collect(m) AS movies
    MATCH (u2:User)-[r:RATED]->(m2:Movie)
    WHERE u <> u2 AND m2 IN movies
    RETURN u2.name AS SimilarUser, count(*) AS SharedInterests
    ORDER BY SharedInterests DESC
    LIMIT 5

Listing 43. Building a recommendation engine in Neo4j

Expected Output

    SimilarUser    SharedInterests
    Alice          5
    Bob            4
    Carol          3
    David          3
    Eve            2

This Cypher query identifies users with shared interests by counting common items they’ve rated, aiding in recommending items based on similar user profiles.

2) Applications of Graph-Based CF: Graph-based CF is widely used in e-commerce, content streaming, and social media due to its flexibility and accuracy in handling diverse data types. This approach allows CF systems to incorporate heterogeneous data, such as social connections or contextual knowledge, which improves recommendation quality and addresses data sparsity [96].

C. Anomaly Detection Using Graphs

Graph-based anomaly detection is crucial for identifying irregular patterns in network structures, particularly in fraud detection, where suspicious transactions deviate from normal patterns.

1) Graph-Based Anomaly Detection Techniques: Anomaly detection in graphs involves analyzing network structures to identify outlier nodes or edges.
• Structural Anomaly Detection: Identifies nodes or edges with unusual degrees or connection patterns, often signaling fraudulent behavior [99].
• Community-Based Detection: Detects anomalies within communities, identifying outliers in otherwise cohesive groups [21].

2) Graph Neural Networks (GNNs): GNNs are powerful tools for anomaly detection, capturing both structure and attributes of nodes. For example, [95] reviewed GNN applications in financial fraud detection, where GNNs identify complex anomalies across transaction networks by aggregating neighborhood information.

3) Case Study: Fraud Detection with Graph Analysis: In financial networks, fraud detection can be achieved by representing entities as nodes and transactions as edges.
a) Data Fusion and Graph Analysis: A study by [15] used data fusion and graph analysis to detect fraudulent transactions by integrating transaction metadata with connection patterns.
b) Cypher Query in Neo4j for Fraud Detection:

    MATCH (a:Account)-[t:TRANSFER]->(b:Account)
    WHERE t.amount > 10000 AND a.region <> b.region
    RETURN a, b, t.amount

This query identifies large transactions between accounts in different regions, often associated with money laundering or fraud.
c) Graph Neural Networks with Self-Attention: [75] applied a GNN with self-attention to detect fraud in electronic payment systems, combining transaction characteristics with network structure for improved detection.

4) Benefits of Graph-Based Anomaly Detection: Graph-based anomaly detection offers distinct advantages:
• Handling Complex Relationships: Graphs naturally represent relationships, making fraud detection more accurate in highly interconnected networks.
• Adaptability to Dynamic Data: Graph databases like Neo4j support real-time anomaly detection, adapting as new data arrives.
• Enhanced Accuracy with Machine Learning: GNNs and other advanced models improve detection precision by leveraging complex patterns in structure and attributes [122].

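As a small illustration of structural anomaly detection, nodes whose degree deviates strongly from the rest of the network can be flagged with a z-score. This is only a sketch under stated assumptions, not a method taken from the cited studies: the function degree_outliers, the threshold of 2.0, and the transfer data are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def degree_outliers(edges, threshold=2.0):
    """Return nodes whose degree z-score exceeds the threshold."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    mu = mean(degree.values())
    sigma = pstdev(degree.values())
    if sigma == 0:
        return []  # every node has the same degree, nothing stands out
    return sorted(
        node for node, k in degree.items()
        if (k - mu) / sigma > threshold
    )


# Hypothetical transfers: one hub account linked to ten otherwise quiet ones.
transfers = [("hub", f"acct{i}") for i in range(10)]
flagged = degree_outliers(transfers)
print(flagged)  # ['hub']
```

A degree threshold is a crude signal on its own. In practice it would be combined with attributes such as transaction amount or region, as in the Cypher query above.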

VIII. WORKING WITH LARGE GRAPHS AND OPTIMIZATION TECHNIQUES

A. Handling Large Graphs in Python and Neo4j

For handling large graphs in Python and Neo4j, memory optimization is essential to manage data efficiently, especially given the complex and often irregular structures of large graphs.

1) Memory Optimization Techniques:
• Compressed Data Structures: Techniques like compressed adjacency lists and variable-byte encoding can reduce memory usage by up to 5x while preserving access efficiency. By compressing edge weights and node attributes, these methods minimize memory demands, particularly useful in Python-based systems like NetworkX [76].
• Graph Partitioning: This method breaks large graphs into smaller subgraphs, each fitting into memory individually. The GraphH system, for example, uses an edge cache mechanism to reduce disk I/O overhead, optimizing memory for large graphs, particularly when partitioning nodes and edges in Neo4j [118].
• Vertex Merging and Compression: Techniques like Vertex Merging (VM) group similar vertices to reduce redundant data processing. Aggressive vertex merging further optimizes memory by minimizing dependencies between data points, speeding up execution time [26].

2) Advanced Graph Libraries and Frameworks:
• Graph-XLL: This library is optimized for large graphs on consumer-grade machines, significantly reducing memory requirements. It offers scalable graph analytics that helps prevent out-of-memory errors common with NetworkX in Python [133].
• Neo4j Memory Management: Neo4j uses memory mapping and efficient page caching to handle large graphs. Its graph engine manages data that exceeds memory capacity through optimized I/O management, supporting bulk ingestion with near-storage accelerators [61].

3) Out-of-Core Processing and Specialized Hardware: For graphs that exceed in-memory capabilities, out-of-core processing enables systems to offload data to disk, processing only necessary portions in memory. GraphMP, for instance, uses a vertex-centric sliding window model and selective scheduling to minimize memory consumption [118].
Specialized hardware, like FPGA boards, can handle memory bottlenecks through parallel processing, embedding graph structures into silicon to reduce memory access latency [94]. These techniques support large-scale applications like social network analysis, recommendation systems, and fraud detection.

B. Graph Databases for Large-Scale Applications

Graph databases like Neo4j are essential for large-scale applications, particularly when managing complex, interconnected datasets where traditional databases struggle. Neo4j supports efficient data loading, querying, and management of graph data.

1) Data Loading in Neo4j: Efficient data loading is crucial in Neo4j, especially for large applications involving high volumes of interconnected data. Neo4j utilizes batch data ingestion and parallel processing to enhance data import speed. For example, large-scale biological networks benefit from Neo4j’s efficient data loading capabilities [36].

2) Query Optimization and Execution: Neo4j provides optimized query execution for handling complex graph patterns. Its Cypher query language enables efficient traversal and pattern matching, and its caching mechanism reduces database access during high-frequency queries. A comparative study by [31] found that Neo4j outperformed relational databases like MySQL in query execution time, demonstrating its effectiveness for social network analysis and recommendation systems.
a) Cypher Query Example:

    MATCH (p:Person)-[:FRIEND]->(f:Person)
    WHERE p.age > 30
    RETURN f.name

This query finds friends of people over 30, showcasing Neo4j’s efficiency in traversing relationships quickly.

3) Data Management and Scaling: Neo4j leverages physical RAM to enhance data access speeds and integrates disk-based storage when datasets exceed memory capacity. A study comparing Neo4j and Apache Spark highlighted Neo4j’s capacity to manage large data until reaching memory limits, at which point Spark may take over for distributed processing needs [12].

4) Applications and Benefits in Large-Scale Environments: Graph databases excel in fields such as social networks, healthcare, and bioinformatics, where data relationships are complex and flexible querying is required. For instance, Neo4j is used in bioinformatics to manage biomolecular pathway data, achieving up to 93% faster query performance than relational databases [36]. Neo4j’s support for Enhanced Entity-Relationship (EER) models further validates its scalability [125].

C. Parallel Processing and Performance Optimization

Parallel processing in Neo4j, combined with code optimization techniques, significantly enhances performance for large-scale applications requiring efficient resource utilization and rapid query execution.

1) Parallel Processing in Neo4j: Neo4j supports parallel processing to improve data loading, querying, and transaction handling. Parallel Cypher Execution allows Neo4j to split queries into sub-tasks processed concurrently, increasing throughput and reducing query times. Neo4j also employs multi-threading for batch imports and complex traversals, supporting applications like fraud detection and recommendation systems [78].
a) Batch Loading with Parallel Processing:

    CALL apoc.periodic.iterate(
      "LOAD CSV WITH HEADERS FROM 'file:///large_data.csv' AS row RETURN row",

Sydney Anuyah, Emmanuel Bolade and Oluwatosin Agbaakin


Understanding Graph Databases: A Comprehensive Tutorial and Survey

"CREATE (:Entity {id: row.id, name: row.name on large datasets [101]. For instance, the Pregel algorithm is ef-
, value: row.value})", fective for scalable computations in distributed environments,
{batchSize: 1000, parallel: true} making it suitable for projects that require extensive parallel
)
processing [57].
This command loads data in parallel with a batch size of 1000, 3) Graph Models and Optimization Techniques: Graph
reducing memory overhead and optimizing CPU utilization. models in engineering and economic planning represent com-
2) Code Optimization Techniques: Code optimization in plex relationships, with graph rewriting techniques optimizing
Neo4j involves tuning Cypher queries to reduce computational design processes. This method enables formalized rule-based
load and memory use. Techniques such as query restructuring transformations, supporting objective setting and algorithm
and index usage can drastically reduce execution times. For selection [66]. Additionally, data, information, and knowledge
instance, creating indexes on frequently queried properties graphs enhance technical and economic planning by aligning
enhances retrieval speed. software design objectives with implementation [114].
a) Optimization Example with Indexes:: 4) Applied Algorithm Strategies: For large-scale applica-
tions, the LAGraph project assembles verified algorithms into
a unified framework, aiding in efficient and predictable graph
CREATE INDEX FOR (n:Entity) ON (n.name)
processing, especially valuable during the planning phase [89].
This command sets up an index on the name property of Entity Predictive workload models, such as Graph-Optimizer, help
nodes, accelerating queries that filter by name. estimate performance needs to ensure selected algorithms align
3) Advanced Parallel Processing Techniques and Hardware with project objectives [127].
Utilization: For further efficiency, Neo4j can integrate with
specialized parallel processing architectures, such as multi- B. Implementing a Graph Database Solution (Recommender
GPU environments or FPGA boards. Studies show that com- System or Social Network Analysis Tool)
bining GPUs and FPGAs for graph data processing can yield
significant performance improvements [131]. To implement a graph database solution like a recommender
a) Parallelism in Graph Traversals:: Optimization al- system in Neo4j, particularly one integrated with machine
gorithms employing loop unrolling and dynamic scheduling learning, follow these steps:
enhance Neo4j’s graph traversal performance. Loop unrolling, 1) Data Collection and Preprocessing: Data preparation is
in particular, is effective for tasks like Sparse Matrix Multi- essential for effective graph representation. Collect data rele-
plications, applicable to graph traversal optimization [116]. vant to the recommendation criteria, such as user preferences
4) Performance Studies and Results: Recent studies have or historical interactions. Store this data as nodes (e.g., users,
demonstrated substantial performance improvements through items) and relationships (e.g., purchases, likes) in Neo4j. Data
parallelism and code optimization in Neo4j. For example, a can be ingested using Python or ETL processes and structured
hybrid parallel approach using OpenMP and MPI with Neo4j for graph processing [11].
has shown to accelerate processing times by up to 3.2 times 2) Data Modeling in Neo4j: Define the schema in Neo4j,
compared to conventional methods [18]. optimizing it for querying. For a recommender system, nodes
represent entities like users and products, while relationships
IX. B UILDING A R EAL -W ORLD G RAPH -BASED P ROJECT capture interactions. Neo4j’s Cypher query language enables
flexible schema design and data modification:
A. Planning a Graph-Based Project
CREATE (u:User {id: 1, name: "Alice"})
To plan a graph-based project effectively, attention must CREATE (p:Product {id: 101, name: "Product A"
be given to objective setting, project structure, and algorithm })
selection, which are crucial for optimizing project outcomes CREATE (u)-[:PURCHASED]->(p)
and resource utilization.
1) Objective Setting: Objective setting is foundational in This basic structure facilitates recommendations based on
graph-based projects, as it defines the structure and require- relationship patterns [49].
ments for the graph model. Objectives may include optimizing 3) Integrating Machine Learning Models: Integrate ma-
data processing speed, minimizing project costs, or maximiz- chine learning models that leverage the graph data. Algorithms
ing algorithm accuracy. Pareto optimization can be used to like collaborative filtering or deep learning methods such as
balance competing objectives, such as efficiency and accuracy, graph neural networks (GNNs) can enhance recommendations
by identifying solutions that achieve the best trade-offs [20]. by analyzing patterns in user interactions. Data-centric Graph
Additionally, graph-analytic models can aid in structuring ML techniques support improved graph representation and
project objectives and evaluating planning methods [123]. pattern recognition [141].
2) Algorithm Selection: Choosing the right algorithms is 4) Implementing Recommendation Algorithms in Neo4j:
critical for achieving project objectives. The selection process Neo4j supports in-database machine learning through plugins
should consider factors such as data structure, computational and procedures, such as implementing a Decision Tree model
complexity, and specific project goals. A machine learning- within the database. With the gds (Graph Data Science)
based algorithm selection approach can dynamically choose library functions, specific algorithms can be employed to
optimal graph partitioning strategies, enhancing performance achieve advanced recommendations.
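The recommender steps above can be sketched without a running database. The following is a minimal illustration, assuming an in-memory stand-in for the (:User)-[:PURCHASED]->(:Product) model. The purchase records, the helper names build_graph and recommend, and the co-purchase scoring rule are hypothetical choices made for this example; they are not the authors' implementation and not a Neo4j or GDS API.

```python
from collections import defaultdict

# Hypothetical purchase records (user, product), standing in for
# (:User)-[:PURCHASED]->(:Product) relationships.
purchases = [
    ("Alice", "Product A"),
    ("Alice", "Product B"),
    ("Bob", "Product A"),
    ("Bob", "Product C"),
    ("Carol", "Product B"),
    ("Carol", "Product C"),
    ("Carol", "Product D"),
]


def build_graph(edges):
    """Index the bipartite purchase graph in both directions."""
    bought = defaultdict(set)  # user -> set of products
    buyers = defaultdict(set)  # product -> set of users
    for user, product in edges:
        bought[user].add(product)
        buyers[product].add(user)
    return bought, buyers


def recommend(user, bought, buyers, top_k=3):
    """Rank unseen products by how many co-purchase paths lead to them.

    A path is user -> shared product -> other user -> candidate product.
    """
    owned = bought.get(user, set())
    scores = defaultdict(int)
    for product in owned:
        for other in buyers[product]:
            if other == user:
                continue
            for candidate in bought[other] - owned:
                scores[candidate] += 1
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:top_k]]


bought, buyers = build_graph(purchases)
print(recommend("Alice", bought, buyers))
```

The two dictionaries play the role of relationship traversal in each direction. In a real deployment the same two-hop pattern would be expressed as a graph query rather than nested loops.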


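Collaborative filtering over a purchase graph needs some measure of how alike two users are. One common choice, assumed here for illustration rather than taken from the cited works, is the Jaccard index over purchased products, where $P(u)$ is a hypothetical notation for the set of products linked to user $u$ by a PURCHASED relationship:

```latex
J(u, v) = \frac{|P(u) \cap P(v)|}{|P(u) \cup P(v)|}
```

For example, if $P(u) = \{A, B\}$ and $P(v) = \{B, C, D\}$, the users share one product out of four distinct products, so $J(u, v) = 1/4 = 0.25$.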
C. Deploying Graph Applications
Deploying graph applications with user-friendly interfaces requires prioritizing usability, interactivity, and scalability, especially when handling large datasets and complex analytics.
1) Designing Interactive User Interfaces: User-friendly interfaces broaden the accessibility of graph applications, facilitating exploratory and analytical tasks. For example, [35] showcases a 3D immersive environment for graph exploration, enhancing user engagement through virtual reality. These interfaces use intuitive controls like point-and-click or drag-and-drop, enabling non-technical users to navigate complex graph data effortlessly.
2) Web-Based and Visual Tools for Deployment: Web-based platforms offer accessibility and interactivity for graph applications. [53] developed a web tool that allows users to compare graph embedding techniques for tasks like node classification, supporting non-experts in selecting suitable graph representations. This platform demonstrates how web-based solutions can improve user experience by embedding graph algorithms and visualizations in a browser-friendly format.
3) Edge-Cloud Collaboration for Scalable Solutions: An edge-cloud framework optimizes graph processing for large-scale deployments, enhancing performance and responsiveness while allowing distributed data storage. [143] presents an edge-cloud collaboration model, enabling graph applications to manage heavy computational loads at the cloud level while providing a responsive user experience on edge devices.
4) Declarative Graph Analytics with Gradoop: Gradoop, an open-source framework, enables users to define and execute distributed graph analysis programs using a high-level language. It supports pattern matching and graph grouping with a user-friendly interface, reducing the need for advanced coding skills, as highlighted by [60]. This approach streamlines complex graph operations, facilitating efficient data handling and scalability across large datasets.
5) Implementing Real-Time Visualization Frameworks: Real-time visualization is critical for user engagement, especially for scenarios requiring dynamic data updates. [67] implemented a framework in RefactorErl for visualizing large Semantic Program Graphs in near real-time, helping users understand structural changes as they occur. This approach can also be applied to real-time graph applications, such as social network monitoring.
6) Deploying Customizable Graph Processing Tools: GraphScope, a Python-based engine for distributed graph processing, offers a customizable environment for large-scale graph analytics [38]. It integrates with a Python API, allowing users to configure and run graph algorithms interactively, which enhances deployment flexibility for developers needing adaptable graph solutions.

X. ADVANCED TOPICS AND FURTHER READING

A. Graphs in Natural Language Processing (NLP)
In Natural Language Processing (NLP), graphs play a significant role by organizing complex relationships between entities, making tasks like knowledge graph construction, text clustering, and relationship extraction more efficient and accurate.
1) Knowledge Graphs in NLP: Knowledge graphs are structured representations that capture entities as nodes and relationships as edges, structuring semantic relationships extracted from unstructured text. Constructing knowledge graphs enhances data extraction for applications like information retrieval and semantic search. For instance, Text2NKG introduces a fine-grained n-ary relation extraction framework, automating the capture of complex entity relations across various contexts [83]. Additionally, knowledge graphs support feature extraction in text classification by integrating relational structures, as shown by the Microsoft Concept Graph [103].
2) Graph-Based Text Clustering: Graph-based text clustering uses connectivity between entities or keywords to group similar documents or text segments. Algorithms like CLIP use link features to enhance clustering quality in knowledge graphs, especially in scenarios with overlapping clusters [108]. This approach benefits applications like social media analysis, where grouping text thematically provides valuable insights.
3) Relationship Extraction with Graph Structures: Relationship extraction is essential for building knowledge graphs and identifying connections between entities in texts. Graph-based frameworks often use Graph Neural Networks (GNNs) or Graph Attention Networks to contextualize relationships, enabling more accurate capture of relational data. For example, sentential relation extraction is enhanced by dual heterogeneous graph context selection, which leverages both graph structure and node features [135].

B. Graph Neural Networks (GNNs)
Graph Neural Networks (GNNs) have become essential in network analysis, enabling the learning of representations from graph-structured data. GNNs iteratively pass information (messages) between nodes and edges, allowing each node to update its representation based on neighboring nodes. This process produces learned embeddings that capture local and global graph structure, supporting tasks such as node classification, link prediction, and graph classification.
1) Basic Operation of GNNs: The core function of GNNs is based on a message-passing mechanism:

$$h_v^{(k)} = \mathrm{Update}\left(h_v^{(k-1)},\ \mathrm{Aggregate}\left(\{h_u^{(k-1)} : u \in N(v)\}\right)\right)$$

where $h_v^{(k)}$ is the hidden state (or embedding) of node $v$ at layer $k$, $N(v)$ represents the neighbors of $v$, and functions like Aggregate and Update define how information from neighboring nodes is combined. Graph Convolutional Networks (GCNs), a common GNN implementation, update node features by averaging neighbors’ features weighted by edge values [80], [93].
2) GNN Applications in Network Analysis:
• Node Classification: GNNs are widely used in node classification tasks, where the goal is to predict labels for nodes based on graph structure and node attributes. Applications include classifying users in social networks or detecting malicious nodes in cybersecurity [104].
• Link Prediction: In link prediction, GNNs predict the likelihood of edges between nodes, enabling applications like friend recommendations or collaboration suggestions.


  DyGNN, a dynamic GNN model, captures temporal interactions for evolving graphs, making it suitable for predicting new connections over time [84].
• Graph Classification: Graph classification aims to categorize entire graphs, useful in domains like bioinformatics, where GNNs can classify molecular structures. Kernel-based GNNs (KGNNs) improve classification accuracy by incorporating both labeled and unlabeled data [59].
3) Advanced GNN Architectures and Optimization:
• Graph Convolutional Networks (GCNs): GCNs extend convolution operations to graph structures, aggregating information from neighboring nodes. This architecture is widely used in applications requiring relational information, such as traffic prediction and community detection.
• Graph Attention Networks (GATs): GATs enhance GCNs by introducing attention mechanisms, allowing nodes to assign different weights to neighbors based on relevance. This capability is useful in environments like financial transaction monitoring where data noise is a concern [34].
• Graph Neural Architecture Search (GraphNAS): GraphNAS automates GNN architecture design using reinforcement learning, optimizing network structures for tasks like node classification, reducing manual design efforts [41].

C. Future of Graph Theory in Data Science
Graph theory is evolving rapidly within data science, providing innovative approaches for data representation, analysis, and prediction across various domains. Key trends include graph embedding, dynamic graph modeling, graph databases, and graph signal processing.
1) Graph Embedding and Representation Learning: Graph embedding transforms complex graph structures into vector representations, allowing them to be processed by machine learning algorithms. This technique is essential in applications like social network analysis, recommendation systems, and biological data modeling, as embeddings retain relational and structural graph information [22].
2) Temporal and Dynamic Graph Modeling: Analyzing time-dependent or evolving graphs is essential for applications requiring real-time monitoring, such as traffic flow analysis or financial forecasting. Advances in temporal pattern recognition within graphs enable models that capture dynamic behavior, supporting time-sensitive network analysis [28].
3) Graph Databases and Big Data Modeling: The shift towards graph databases over relational databases is driven by the need for efficient storage and querying of complex relationships in large datasets. Graph databases, like Neo4j, are widely used in fraud detection, biological network analysis, and social media analytics due to their performance and flexibility [102].
4) Graph Signal Processing: Graph signal processing applies signal processing techniques to graph-structured data, enabling analysis on irregular domains like social networks or sensor arrays. This approach benefits fields like telecommunications, where efficient data representation and transmission are critical. Graph signal processing leverages graph-theoretical properties to enhance applications requiring high data integrity and efficiency [32].

REFERENCES

[1] Syed Minhal Abbas, Nadeem Javaid, Ahmad Taher Azar, Umar Qasim, Zahoor Ali Khan, and Sheraz Aslam. Towards enhancing the robustness of scale-free iot networks by an intelligent rewiring mechanism. Sensors, 22(7):2658, 2022.
[2] Atul Agnihotri, Maalti Puri, and Amit Mahajan. Iot based recent trends in engineering and its applications. 2024.
[3] Gabriel E Aguilar-Pineda and L Olivares-Quiroz. Catalytic and binding sites prediction in globular proteins through discrete markov chains and network centrality measures. Physical Biology, 18(6):066002, 2021.
[4] Hugo A Akitaya, Maike Buchin, Bernhard Kilgus, Stef Sijben, and Carola Wenk. Distance measures for embedded graphs. Computational Geometry, 95:101743, 2021.
[5] Abdollah Alhevaz, Maryam Baghipur, Hilal A. Ganie, and Yilun Shang. The generalized distance spectrum of the join of graphs. Symmetry, 12(1):169, 2020.
[6] Mohamed Alshaer and Paul Cotae. On identifying the critical nodes and vulnerable edges for increasing network security. 2018.
[7] Mingqiang An, Yinan Zhang, Kinkar Ch Das, and Liming Xiong. Reciprocal degree distance and graph properties. Discrete Applied Mathematics, 258:1–7, 2019.
[8] Renzo Angles, Harsh Thakkar, and Dominik Tomaszuk. Mapping rdf databases to property graph databases. IEEE Access, 8:86091–86110, 2020.
[9] Sydney Anuyah and Sunandan Chakraborty. Can deep learning large language models be used to unravel knowledge graph creation? In Proceedings of the International Conference on Computing, Machine Learning and Data Science, pages 1–6, 2024.
[10] Ghislain Auguste Atemezing. Empirical evaluation of a cloud-based graph database: the case of neptune. In Knowledge Graphs and Semantic Web: Third Iberoamerican Conference and Second Indo-American Conference, KGSWC 2021, Kingsville, Texas, USA, November 22–24, 2021, Proceedings 3, pages 31–46. Springer, 2021.
[11] Shaurya Bajaj and D Geraldine Bessie Amali. Species environmental niche distribution modeling for panthera tigris tigris ‘royal bengal tiger’ using machine learning. In Emerging Research in Computing, Information, Communication and Applications: ERCICA 2018, Volume 1, pages 251–263. Springer, 2019.
[12] Ioannis Ballas, Vassilios Tsakanikas, Evaggelos Pefanis, and Vassilios Tampakas. Assessing the computational limits of graphdbs’ engines-a comparison study between neo4j and apache spark. In Proceedings of the 24th Pan-Hellenic Conference on Informatics, pages 428–433, 2020.
[13] Alexander Baumgartner, Temur Kutsia, Jordi Levy, and Mateu Villaret. Term-graph anti-unification. 2018.
[14] Rajat Belgundi, Yash Kulkarni, and Balaso Jagdale. Analysis of native multi-model database using arangodb. In Proceedings of Third International Conference on Sustainable Expert Systems: ICSES 2022, pages 923–935. Springer, 2023.
[15] Valerio Bellandi and Stefano Siccardi. Data fusion and graph analysis in fraud transaction detection: walkthrough of a case study. In 2022 IEEE International Conference on Big Data (Big Data), pages 4601–4605. IEEE, 2022.
[16] Maciej Besta, Robert Gerstenberger, Emanuel Peter, Marc Fischer, Michał Podstawski, Claude Barthels, Gustavo Alonso, and Torsten Hoefler. Demystifying graph databases: Analysis and taxonomy of data organization, system designs, and graph queries. ACM Computing Surveys, 56(2):1–40, 2023.
[17] Uzair Aslam Bhatti, Hao Tang, Guilu Wu, Shah Marjan, and Aamir Hussain. Deep learning with graph convolutional networks: An overview and latest applications in computational intelligence. International Journal of Intelligent Systems, 2023(1):8342104, 2023.
[18] Plamenka Borovska, Veska Gancheva, and Ivailo Georgiev. Hybrid parallel implementation of multiple sequence alignment software clustalw on intel xeon phi. In Sixth International Conference on Advances in Computing, Electronics and Communication-ACEC 2017, Rome, Italy, 9-10 December, 2017.


[19] Marine Boucher. The influence of color, color as influence: structuring tools for the design of a graphic design project.
[20] Yu V Bugaeev, OV Avseeva, LA Korobova, and I Yu Shurupova. Algorithm for solving multicriteria problem of appointments on the networks. Proceedings of the Voronezh State University of Engineering Technologies, 79(4):71–74, 2017.
[21] Hilmi Aziz Bukhori and Rinaldi Munir. Inductive link prediction banking fraud detection system using homogeneous graph-based machine learning model. In 2023 IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC), pages 0246–0251. IEEE, 2023.
[22] Hongyun Cai, Vincent W Zheng, and Kevin Chen-Chuan Chang. A comprehensive survey of graph embedding: Problems, techniques, and applications. IEEE transactions on knowledge and data engineering, 30(9):1616–1637, 2018.
[23] Quentin Cappart, Didier Chételat, Elias B Khalil, Andrea Lodi, Christopher Morris, and Petar Veličković. Combinatorial optimization and reasoning with graph neural networks. Journal of Machine Learning Research, 24(130):1–61, 2023.
[24] Intars Česlis and Sergejs Kodors. Efficiency comparison of pathfinding algorithms a* and dijkstra’s in two dimensional grid. In HUMAN. ENVIRONMENT. TECHNOLOGIES. Proceedings of the Students International Scientific and Practical Conference, number 24, pages 36–39, 2020.
[25] Jiafeng Cheng, Qianqian Wang, Zhiqiang Tao, Deyan Xie, and Quanxue Gao. Multi-view attribute graph convolution networks for clustering. In Proceedings of the twenty-ninth international conference on international joint conferences on artificial intelligence, pages 2973–2979, 2021.
[26] Wei Cheng, Chun-Feng Wu, Yuan-Hao Chang, and Ing-Chao Lin. Graphrc: Accelerating graph processing on dual-addressing memory with vertex merging. In Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, pages 1–9, 2022.
[27] Nur Nasuha Daud, Siti Hafizah Ab Hamid, Muntadher Saadoon, Firdaus Sahran, and Nor Badrul Anuar. Applications of link prediction in social networks: A review. Journal of Network and Computer Applications, 166:102716, 2020.
[28] Pietro Daverio, Hassan Nazeer Chaudhry, Alessandro Margara, and Matteo Rossi. Temporal pattern recognition in graph data structures. In 2021 IEEE International conference on big data (Big Data), pages 2753–2763. IEEE, 2021.
[29] Xiaolong Deng, Jiayu Zhai, Tiejun Lv, and Luanyu Yin. Efficient vector influence clustering coefficient based directed community detection method. IEEE Access, 5:17106–17116, 2017.
[30] TP Deshmukh, BR Bamnote, and SW Ahmad. Review paper on complexity reduction in relational database using neo4j graph database.
[31] Thi-Thu-Trang Do, Thai-Bao Mai-Hoang, Van-Quyet Nguyen, and Quyet-Thang Huynh. Query-based performance comparison of graph database and relational database. In Proceedings of the 11th International Symposium on Information and Communication Technology, pages 375–381, 2022.
[32] Xiaowen Dong, D. Thanou, M. Rabbat, and P. Frossard. Learning graphs from data: A signal representation perspective. IEEE Signal Processing Magazine, 2018.
[33] Ezequiel Dratman, Luciano N Grippo, Verónica Moyano, and Adrián Pastine. On the rank of the distance matrix of graphs. Applied Mathematics and Computation, 433:127394, 2022.
[34] Chi Thang Duong, Thanh Dat Hoang, Ha The Hien Dang, Quoc Viet Hung Nguyen, and Karl Aberer. On node features for graph neural networks. arXiv preprint arXiv:1911.08795, 2019.
[35] Ugo Erra, Delfina Malandrino, and Luca Pepe. A methodological evaluation of natural user interfaces for immersive 3d graph explorations. Journal of Visual Languages & Computing, 44:13–27, 2018.
[36] Antonio Fabregat, Florian Korninger, Guilherme Viteri, Konstantinos Sidiropoulos, Pablo Marin-Garcia, Peipei Ping, Guanming Wu, Lincoln Stein, Peter D’eustachio, and Henning Hermjakob. Reactome graph database: Efficient access to complex pathway data. PLoS computational biology, 14(1):e1005968, 2018.
[37] Raffaele Falsaperla, Giovanna Vitaliti, Simona Domenica Marino, Andrea Domenico Praticò, Janette Mailo, Michela Spatuzza, Maria Roberta Cilio, Rosario Foti, and Martino Ruggieri. Graph theory in paediatric epilepsy: a systematic review. Dialogues in Clinical Neuroscience, 23(1):3–13, 2021.
[38] Wenfei Fan, Tao He, Longbin Lai, Xue Li, Yong Li, Zhao Li, Zhengping Qian, Chao Tian, Lei Wang, Jingbo Xu, et al. Graphscope: a unified engine for big graph processing. Proceedings of the VLDB Endowment, 14(12):2879–2892, 2021.
[39] Aftab Farooq, Gulraiz Javaid Joyia, Muhammad Uzair, and Usman Akram. Detection of influential nodes using social networks analysis based on network metrics. In 2018 international conference on computing, mathematics and engineering technologies (icomet), pages 1–6. IEEE, 2018.
[40] Chen Gao, Xiang Wang, Xiangnan He, and Yong Li. Graph neural networks for recommender system. In Proceedings of the fifteenth ACM international conference on web search and data mining, pages 1623–1625, 2022.
[41] Yang Gao, Hong Yang, Peng Zhang, Chuan Zhou, and Yue Hu. Graphnas: Graph neural architecture search with reinforcement learning. arXiv preprint arXiv:1904.09981, 2019.
[42] Loukas Georgiadis, Giuseppe F Italiano, and Nikos Parotsidis. Incremental strong connectivity and 2-connectivity in directed graphs. In Latin American Symposium on Theoretical Informatics, pages 529–543. Springer, 2018.
[43] Timo Gervens and Martin Grohe. Graph similarity based on matrix norms. arXiv preprint arXiv:2207.00090, 2022.
[44] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In International conference on machine learning, pages 1263–1272. PMLR, 2017.
[45] Orazio Giustolisi, Luca Ridolfi, and Antonietta Simone. b embedding vertex intrinsic relevance in network analysis: the case of betweenness. CoRR, 2019.
[46] Nirit Glazer. Challenges with graph interpretation: A review of the literature. Studies in science education, 47(2):183–210, 2011.
[47] Darian Horacio Grass-Boada, Airel Pérez-Suárez, Leticia Arco, Rafael Bello, and Alejandro Rosete. Overlapping community detection using multi-objective approach and rough clustering. In Rough Sets: International Joint Conference, IJCRS 2020, Havana, Cuba, June 29–July 3, 2020, Proceedings, pages 416–431. Springer, 2020.
[48] Tajuddeen Rabiu Gwadabe and Ying Liu. Improving graph neural network for session-based recommendation system via non-sequential interactions. Neurocomputing, 468:111–122, 2022.
[49] Shagufta Henna and Shyam Krishnan Kalliadan. Enterprise analytics using graph database and graph-based deep learning. arXiv preprint arXiv:2108.02867, 2021.
[50] Weiyu Huang, Antonio G Marques, and Alejandro Ribeiro. Collaborative filtering via graph signal processing. In 2017 25th European signal processing conference (EUSIPCO), pages 1094–1098. IEEE, 2017.
[51] Isa Inuwa-Dutse, Mark Liptrott, and Ioannis Korkontzelos. A multi-level clustering technique for community detection. Neurocomputing, 441:64–78, 2021.
[52] Lina Elsherif Ismail and Waldemar Karwowski. A graph theory-based modeling of functional brain connectivity based on eeg: a systematic review in the context of neuroergonomics. IEEE Access, 8:155103–155135, 2020.
[53] Ilinka Ivanoska, Martin Milenkoski, Slobodan Kalajdziski, and Kire Trivodaliev. Web tool for graph embeddings representation techniques evaluation. In 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pages 983–988. IEEE, 2019.
[54] Meenal Jabde. Learning graph databases: Neo4j an overview.
[55] Maryam Jafarpour, Mohammad Shekaramiz, Abolfazl Javan, and Ali Moeini. Building graphs with maximum connectivity. In 2020 Intermountain Engineering, Technology and Computing (IETC), pages 1–5. IEEE, 2020.
[56] Weiwei Jiang and Jiayun Luo. Graph neural network for traffic forecasting: A survey. Expert systems with applications, 207:117921, 2022.
[57] Zhihua Jiang and Dongning Rao. Scalable and optimal planning based on pregel. Concurrency and Computation: Practice and Experience, 31(7):e4966, 2019.
[58] Shuting Jin, Xiangxiang Zeng, Feng Xia, Wei Huang, and Xiangrong Liu. Application of deep learning methods in biological networks. Briefings in bioinformatics, 22(2):1902–1917, 2021.
[59] Wei Ju, Junwei Yang, Meng Qu, Weiping Song, Jianhao Shen, and Ming Zhang. Kgnn: Harnessing kernel-based networks for semi-supervised graph classification. In Proceedings of the fifteenth ACM international conference on web search and data mining, pages 421–429, 2022.
[60] Martin Junghanns, Max Kießling, Niklas Teichmann, Kevin Gómez, André Petermann, and Erhard Rahm. Declarative and distributed graph analytics with gradoop. Proceedings of the VLDB Endowment, 11(12):2006–2009, 2018.


[61] Seongyoung Kang and Sang-Woo Jun. Near-storage accelerator for [81] Lianjie Long, Yunfei Yin, and Faliang Huang. Graph-aware collabo-
bulk graph ingestion. In 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 101–104. IEEE, 2023.
[62] VN Kasyanov, AM Merculov, and TA Zolotuhin. A circular layout algorithm for attributed hierarchical graphs with ports. In Journal of Physics: Conference Series, volume 2099, page 012051. IOP Publishing, 2021.
[63] Ethan Kerzner, Alexander Lex, Crystal Lynn Sigulinsky, Timothy Urness, Bryan W Jones, Robert E Marc, and Miriah Meyer. Graffinity: Visualizing connectivity in large graphs. In Computer Graphics Forum, volume 36, pages 251–260. Wiley Online Library, 2017.
[64] Kin Lok Keung, Liqiao Xia, Carman KM Lee, and CY Leung. A shortest path graph attention network and non-traditional multi-deep layouts in robotic mobile fulfillment system. In 2022 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), pages 0655–0659. IEEE, 2022.
[65] Jongkwang Kim and Thomas Wilhelm. What is a complex graph? Physica A: Statistical Mechanics and its Applications, 387(11):2637–2652, 2008.
[66] Lothar Kolbeck, Simon Vilgertshofer, Jimmy Abualdenien, and André Borrmann. Graph rewriting techniques in engineering design. Frontiers in built environment, 7:815153, 2022.
[67] Mátyás Komáromi, BOZÓ István, and TÓTH Melinda. An efficient graph visualisation framework for refactorerl. Studia Universitatis Babeș-Bolyai Informatica, pages 21–36, 2018.
[68] Nicole Rachel Koshy, Anshuman Dixit, Siddhi Shrikant Jadhav, Arun V Penmatsa, Sagar V Samanthapudi, Mothi Gowtham Ashok Kumar, Sydney Oghenetega Anuyah, Gourav Vemula, Patricia Snell Herzog, and Davide Bolchini. Data-to-question generation using deep learning. In 2023 4th International Conference on Big Data Analytics and Practices (IBDAP), pages 1–6. IEEE, 2023.
[69] Mikaela Koutrouli, Evangelos Karatzas, David Paez-Espino, and Georgios A Pavlopoulos. A guide to conquer the biological network era using graph theory. Frontiers in bioengineering and biotechnology, 8:34, 2020.
[70] Suchi Kumari, Abhishek Saroha, and Anurag Singh. Efficient edge rewiring strategies for enhancement in network capacity. Physica A: Statistical Mechanics and its Applications, 545:123552, 2020.
[71] Kazuhiro Kurita, Kunihiro Wasa, A. Conte, Hiroki Arimura, and T. Uno. Efficient enumeration of subgraphs and induced subgraphs with bounded girth. In European Symposium on Algorithms, 2018.
[72] Asi Kuushalie, Gayatri Yerukola, et al. Efficiency analysis of conventional and weighted grid-based pathfinding algorithms: A performance comparative study. In 2024 Third International Conference on Electrical, Electronics, Information and Communication Technologies (ICEEICT), pages 1–6. IEEE, 2024.
[73] Ricky Laishram, Ahmet Erdem Sariyüce, Tina Eliassi-Rad, Ali Pinar, and Sucheta Soundarajan. Measuring and improving the core resilience of networks. In Proceedings of the 2018 World Wide Web Conference, pages 609–618, 2018.
[74] Gang Li, Yonghua Jiang, Weidong Jiao, Wanxiu Xu, Shan Huang, Zhao Gao, Jianhua Zhang, and Chengwu Wang. The maximum eigenvalue of the brain functional network adjacency matrix: Meaning and application in mental fatigue evaluation. Brain sciences, 10(2):92, 2020.
[75] Min Li, Mengjie Sun, Qianlong Liu, and Yumeng Zhang. Fraud detection based on graph neural networks with self-attention. In 2021 2nd International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT), pages 349–353. IEEE, 2021.
[76] Panagiotis Liakos, Katia Papakonstantinopoulou, and Alex Delis. Realizing memory-optimized distributed graph processing. IEEE Transactions on Knowledge and Data Engineering, 30(4):743–756, 2017.
[77] Chun-Cheng Lin, Weidong Huang, Wan-Yu Liu, and Chang-Yu Chen. On aesthetics for user-sketched layouts of vertex-weighted graphs. Journal of Visualization, 24:157–171, 2021.
[78] Dan Liu and Ming Ming Li. A performance optimization scheme for migrating hive data to neo4j database. In 2018 International Symposium on Computer, Consumer and Control (IS3C), pages 1–5. IEEE, 2018.
[79] Weijie Liu, Hui Qian, Chao Zhang, Jiahao Xie, Zebang Shen, and Nenggan Zheng. From one to all: Learning to match heterogeneous and partially overlapped graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 4109–4119, 2022.
[80] Xingyu Liu, Juan Chen, and Quan Wen. A survey on graph classification and link prediction based on gnn. arXiv preprint arXiv:2307.00865, 2023.
rative filtering for top-n recommendation. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2021.
[82] Sileshi Lulseged, Sanni Ali, and Girmay Medhin. Emj series on statistics and methods part iv: Presenting and summarizing data using graphical tools. Ethiopian Medical Journal, 56(4), 2018.
[83] Haoran Luo, Yuhao Yang, Tianyu Yao, Yikai Guo, Zichen Tang, Wentai Zhang, Kaiyang Wan, Shiyao Peng, Meina Song, Wei Lin, et al. Text2nkg: Fine-grained n-ary relation extraction for n-ary relational knowledge graph construction. arXiv preprint arXiv:2310.05185, 2023.
[84] Yao Ma, Ziyi Guo, Zhaocun Ren, Jiliang Tang, and Dawei Yin. Streaming graph neural networks. In Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval, pages 719–728, 2020.
[85] Ranjan Maity and Samit Bhattacharya. Relating aesthetics of the gui text elements with readability using font family. In Proceedings of the 2018 ACM Companion International Conference on Interactive Surfaces and Spaces, pages 63–68, 2018.
[86] Abdul Majeed and Ibtisam Rauf. Graph theory: A comprehensive survey about graph theory applications in computer science and social networks. Inventions, 5(1):10, 2020.
[87] Tabitha Agnes Mangam and Joseph Varghese Kureethara. Diametral paths in total graphs of complete graphs, complete bipartite graphs and wheels. Int. J. Civil Eng. Tech, 8(5):1212–1219, 2017.
[88] Antonio Maria Fiscarelli, Matthias R Brust, Grégoire Danoy, and Pascal Bouvry. A vertex-similarity clustering algorithm for community detection. Journal of Information and Telecommunication, 4(1):36–50, 2020.
[89] Tim Mattson, Timothy A Davis, Manoj Kumar, Aydin Buluc, Scott McMillan, José Moreira, and Carl Yang. Lagraph: A community effort to collect graph algorithms built on top of the graphblas. In 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 276–284. IEEE, 2019.
[90] Konstantinos Mavrogiorgos, Athanasios Kiourtis, Argyro Mavrogiorgou, and Dimosthenis Kyriazis. A comparative study of mongodb, arangodb and couchdb for big data storage. In Proceedings of the 2021 5th International Conference on Cloud and Big Data Computing, pages 8–14, 2021.
[91] Reza Mirjalili, Hojjat Barati, and Anil Yazici. Resilience analysis of new york city transportation network after snow storms. Transportation research record, 2677(1):694–707, 2023.
[92] Ruchi Mittal and MPS Bhatia. Classifying the influential individuals in multi-layer social networks. International Journal of Electronics, Communications, and Measurement Engineering (IJECME), 8(1):21–32, 2019.
[93] Nenad Mladenović and Pierre Hansen. Variable neighborhood search. Computers & operations research, 24(11):1097–1100, 1997.
[94] Andrey Mokhov, Alessandro De Gennaro, Ghaith Tarawneh, Jonny Wray, Georgy Lukyanov, Sergey Mileiko, Joe Scott, Alex Yakovlev, and Andrew Brown. Language and hardware acceleration backend for graph processing. In 2017 Forum on Specification and Design Languages (FDL), pages 1–7. IEEE, 2017.
[95] Soroor Motie and Bijan Raahemi. Financial fraud detection using graph neural networks: A systematic review. Expert Systems with Applications, 240:122156, 2024.
[96] Ruihui Mu, Xiaoqin Zeng, and Jiying Zhang. Heterogeneous information fusion based graph collaborative filtering recommendation. Intelligent Data Analysis, 27(6):1595–1613, 2023.
[97] Raghad Mustafa, Ahmed M Ali, and AbdulSattar M Khidhir. M n–polynomials of some special graphs. Iraqi Journal of Science, pages 1986–1993, 2021.
[98] Giulia Muzio, Leslie O’Bray, and Karsten Borgwardt. Biological network analysis with deep learning. Briefings in bioinformatics, 22(2):1515–1530, 2021.
[99] Thanh Tam Nguyen, Thanh Cong Phan, Hien Thu Pham, Thanh Thi Nguyen, Jun Jo, and Quoc Viet Hung Nguyen. Example-based explanations for streaming fraud detection on graphs. Information Sciences, 621:319–340, 2023.
[100] Silvia Noschese and Lothar Reichel. Network analysis with the aid of the path length matrix. Numerical Algorithms, 95(1):451–470, 2024.
[101] YoungJoon Park, DongKyu Lee, and Tien-Cuong Bui. Machine learning-based selection of graph partitioning strategy using the characteristics of graph data and algorithm. arXiv preprint arXiv:2209.04137, 2022.
[102] Subrata Paul, Anirban Mitra, and Chandan Koner. A review on graph database and its representation. In 2019 International Conference on
Recent Advances in Energy-efficient Computing and Communication (ICRAECC), pages 1–5. IEEE, 2019.
[103] Nela Petrželková, Blaž Škrlj, and Nada Lavrač. Knowledge graph aware text classification. 2020.
[104] Samuel Pfrommer, Alejandro Ribeiro, and Fernando Gama. Discriminability of single-layer graph neural networks. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8508–8512. IEEE, 2021.
[105] Kornelije Rabuzin, Maja Cerjan, and Snježana Križanić. Supporting data types in neo4j. In European Conference on Advances in Databases and Information Systems, pages 459–466. Springer, 2022.
[106] Antonia Radoš. Fruchterman-Reingold Algorithm for Force-Directed Graph Drawing. PhD thesis, University of Zagreb, Faculty of Electrical Engineering and Computing, 2019.
[107] AB Sadavare and RV Kulkarni. A review of application of graph theory for network. International Journal of Computer science and Information technologies, 3(6):5296–5300, 2012.
[108] Alieh Saeedi, Eric Peukert, and Erhard Rahm. Using link features for entity clustering in knowledge graphs. In European Semantic Web Conference, pages 576–592. Springer, 2018.
[109] Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, and M Tamer Özsu. The ubiquity of large graphs and surprising challenges of graph processing: extended survey. The VLDB journal, 29:595–618, 2020.
[110] Benjamin Sanchez-Lengeling, Emily Reif, Adam Pearce, and Alexander B Wiltschko. A gentle introduction to graph neural networks. Distill, 6(9):e33, 2021.
[111] Mittu Satheesh and G Remya. Collaborative filtering using graph kernel and boosting. In 2017 International Conference on Computational Intelligence in Data Science (ICCIDS), pages 1–4. IEEE, 2017.
[112] Jens Schrodt, Aleksei Dudchenko, Petra Knaup-Gregori, and Matthias Ganzinger. Graph-representation of patient data: a systematic literature review. Journal of medical systems, 44(4):86, 2020.
[113] V Thamarai Selvi and Pandian Vaidhyanathan. Square distance in graphs. International Journal of Information Technology, Research and Applications, 2(3):5–11, 2023.
[114] Lixu Shao, Yucong Duan, Xiaobing Sun, Quan Zou, Rongqi Jing, and Jiami Lin. Bidirectional value driven design between economical planning and technical implementation based on data graph, information graph and knowledge graph. In 2017 IEEE 15th International Conference on Software Engineering Research, Management and Applications (SERA), pages 339–344. IEEE, 2017.
[115] Wonseok Shin, Siwoo Song, Kunsoo Park, and Wook-Shin Han. Cardinality estimation of subgraph matching: A filtering-sampling approach. arXiv preprint arXiv:2309.15433, 2023.
[116] Karim Soliman, Marwa El Shenawy, and Ahmed Abou El Farag. Loop unrolling effect on parallel code optimization. In Proceedings of the 2nd International Conference on Future Networks and Distributed Systems, pages 1–6, 2018.
[117] Sonja Stüdli, Yamin Yan, Maria M Seron, and Richard H Middleton. Plug-and-play networks: Adding vertices and connections to preserve algebraic connectivity. In 2021 60th IEEE Conference on Decision and Control (CDC), pages 4823–4828. IEEE, 2021.
[118] Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, and Xiaokui Xiao. Graphh: High performance big graph analytics in small clusters. In 2017 IEEE International Conference on Cluster Computing (CLUSTER), pages 256–266. IEEE, 2017.
[119] Shixuan Sun and Qiong Luo. Scaling up subgraph query processing with efficient subgraph matching. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 220–231. IEEE, 2019.
[120] Weiqi Sun, Yuanlong Li, and Liangren Shi. The performance evaluation and resilience analysis of supply chain based on logistics network. In 2020 39th Chinese Control Conference (CCC), pages 5772–5777. IEEE, 2020.
[121] Nguyen Thai-Nghe, Thanh-Nghi Do, and Peter Haddawy. Intelligent Systems and Data Science: First International Conference, ISDS 2023, Can Tho, Vietnam, November 11–12, 2023, Proceedings, Part I. Springer Nature, 2023.
[122] Tuan Tran. On some studies of fraud detection pipeline and related issues from the scope of ensemble learning and graph-based learning. arXiv preprint arXiv:2205.04626, 2022.
[123] Dmitry Sergeevich Tseluyko and Arina Romanovna Kirpo. Study of layout structure characteristics using graph-analytical schemes and tools of spatial syntax theory. Bulletin of Tomsk State University of Architecture and Building, 25(3):39–53, 2023.
[124] Sunil Kumar Raghavan Unnithan and Kannan Balakrishnan. Betweenness centrality in convex amalgamation of graphs. Journal of Algebra Combinatorics Discrete Structures and Applications, 6(1):21–38, 2019.
[125] Anikó Vágner. Store and visualize eer in neo4j. In Proceedings of the 2nd International Symposium on Computer Science and Intelligent Control, pages 1–6, 2018.
[126] Diego Vallarino. Dynamic portfolio rebalancing: A hybrid new model using gnns and pathfinding for cost efficiency. arXiv preprint arXiv:2410.01864, 2024.
[127] Ana Lucia Varbanescu and Andrea Bartolini. Graph-optimizer: Towards predictable large-scale graph processing workloads. In Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, pages 255–256, 2023.
[128] Petar Veličković. Everything is connected: Graph neural networks. Current Opinion in Structural Biology, 79:102538, 2023.
[129] Mathilde Vernet, Yoann Pigné, and Eric Sanlaville. A study of connectivity on dynamic graphs: computing persistent connected components. 4OR, 21(2):205–233, 2023.
[130] Tomáš Vetrík. General degree distance of graphs. Journal of Algebra Combinatorics Discrete Structures and Applications, 8(2):107–118, 2021.
[131] Guyue Wang, Koichi Wada, and Shinichi Yamagiwa. Optimization in the parallelism extraction algorithm with spanning tree on a multi-gpu environment. IEEJ Transactions on Electrical and Electronic Engineering, 14(6):862–869, 2019.
[132] Peter Wills and François G Meyer. Metrics for graph comparison: a practitioner’s guide. Plos one, 15(2):e0228728, 2020.
[133] Jian Wu, Venkatesh Srinivasan, and Alex Thomo. Graph-xll: a graph library for extra large graph analytics on a single machine. In 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA), pages 1–7. IEEE, 2019.
[134] Shiwen Wu, Fei Sun, Wentao Zhang, Xu Xie, and Bin Cui. Graph neural networks in recommender systems: a survey. ACM Computing Surveys, 55(5):1–37, 2022.
[135] Bo Xu, Nian Liu, Luyi Cheng, Shizhou Huang, Shouang Wei, Ming Du, Hui Song, and Hongya Wang. Knowledge graph enhanced sentential relation extraction via dual heterogeneous graph context selection. In 2023 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE, 2023.
[136] Jie Xue, Huiqiu Lin, and Jinlong Shu. The algebraic connectivity of graphs with given circumference. Theoretical Computer Science, 772:123–131, 2019.
[137] Kuo Yang, Zhengyang Zhou, Wei Sun, Pengkun Wang, Xu Wang, and Yang Wang. Extract and refine: Finding a support subgraph set for graph representation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2953–2964, 2023.
[138] Linyao Yang, Hongyang Chen, Zhao Li, Xiao Ding, and Xindong Wu. Give us the facts: Enhancing large language models with knowledge graphs for fact-aware language modeling. IEEE Transactions on Knowledge and Data Engineering, 2024.
[139] Tetsushi Yuge, Yasumasa Sagawa, and Natsumi Takahashi. Operational resilience of network considering common-cause failures. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 107(6):855–863, 2024.
[140] Xin Zheng, Shouzhi Liang, Bo Liu, Xiaoming Xiong, Xianghong Hu, and Yuan Liu. Subgraph feature extraction based on multi-view dictionary learning for graph classification. Knowledge-Based Systems, 214:106716, 2021.
[141] Xin Zheng, Yixin Liu, Zhifeng Bao, Meng Fang, Xia Hu, Alan Wee-Chung Liew, and Shirui Pan. Towards data-centric graph machine learning: Review and outlook. arXiv preprint arXiv:2309.10979, 2023.
[142] Chufeng Zhou, Xinxin Guan, Yeli Li, and Qingtao Zeng. Research on the discovery of opinion leaders in social networks. In IOP Conference Series: Materials Science and Engineering, volume 563, page 032009. IOP Publishing, 2019.
[143] Jun Zhou and Masaaki Kondo. An edge-cloud collaboration framework for graph processing in smart society. IEEE Transactions on Emerging Topics in Computing, 2023.
[144] Hao-dong ZHU and Bao-feng HE. Improved pagerank algorithm combined user behavior with topic similarity.