Getting Started with the Graph Query Language (GQL)

A complete guide to designing, querying, and managing graph databases with GQL

Product type: Paperback
Published: Aug 2025
Publisher: Packt
ISBN-13: 9781836204015
Length: 392 pages
Edition: 1st Edition

Authors (3): Ricky Sun, Jason Zhang, Yuri Simione
Table of Contents (17 chapters)

Preface
1. Evolution Towards Graph Databases
2. Key Concepts of GQL
3. Getting Started with GQL
4. GQL Basics
5. Exploring Expressions and Operators
6. Working With GQL Functions
7. Delve into Advanced Clauses
8. Configuring Sessions
9. Graph Transactions
10. Conformance to the GQL Standard
11. Beyond GQL
12. A Case Study – Anti-Fraud
13. The Evolving Landscape of GQL
14. Glossary and Resources
15. Other Books You May Enjoy
16. Index

Graphs and graph models

Graphs offer a natural way to model entities and relationships in data, making them an essential tool in various domains. From social networks to recommendation systems, graph-based approaches provide a flexible and efficient means of representing complex structures. Let’s explore the theoretical foundations of graph models and their role in modern data management.

Graph theory and graph data models

Graph database technology is fundamentally rooted in graph theory, which provides both the theoretical and practical foundations for graph computing. In this discussion, we will use the terms graph computing and graph database interchangeably, highlighting that computing plays a more pivotal role than storage in this context. This section explores the evolution of graph theory and its application to graph data modeling.

Graph theory can be traced back nearly 300 years to the groundbreaking work of the Swiss mathematician Leonhard Euler. Widely regarded as one of the greatest mathematicians, Euler laid the groundwork for this discipline with his solution to the Seven Bridges of Königsberg problem. In 1736, he abstracted the city’s physical layout—which included seven bridges and two islands connected to the mainland, forming four distinct land areas—into a graph composed of nodes and edges. His work led to the development of graph theory, which focuses on the study of graphs—structures made up of vertices (nodes) connected by edges (links or relationships).

Figure 1.5: Seven Bridges of Königsberg and graph theory


Euler’s exploration of the Seven Bridges problem involved determining whether it was possible to traverse each bridge exactly once in a single journey. He proved that such a path, known as an Eulerian path, was impossible in this specific configuration. Euler’s criterion for an Eulerian path is still fundamental to graph theory today: a connected graph has an Eulerian path if and only if it has exactly zero or two vertices of odd degree (the degree of a vertex being the number of edges connected to it). If all vertices have even degree, an Eulerian circuit, a special type of path that returns to its starting point, can be found. This early work established essential graph theory concepts that continue to influence modern graph computing.
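Writing deg(v) for the number of edges incident to vertex v, the criterion can be stated compactly:

```latex
% Euler's criterion for Eulerian paths and circuits
\text{A connected graph } G=(V,E) \text{ has an Eulerian path}
  \iff \bigl|\{\, v \in V : \deg(v) \text{ is odd} \,\}\bigr| \in \{0,\,2\},
```
```latex
\text{and an Eulerian circuit} \iff \text{that count is } 0.
```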

Graph theory found practical applications beyond Euler’s initial problem. One notable example is the map coloring problem, which arose during the Age of Discovery and the subsequent rise of nation-states. The problem of coloring maps such that no two adjacent regions share the same color was first addressed by mathematicians in the mid-19th century. This problem led to the formulation of the Four-Color Theorem, which states that any map can be colored with no more than four colors such that no two adjacent regions share the same color. The proof of this theorem, completed with the assistance of computer algorithms in 1976, marked a significant milestone in both graph theory and computational methods.

In parallel with these developments, Johann B. Listing’s introduction of topology in 1847, which included concepts such as connectivity and dimensionality, further advanced the field. Sylvester’s work in 1878 formalized the concept of a graph as a collection of vertices connected by edges, introducing terminology that remains central to graph theory.

The systematic study of random graphs by mathematicians Erdős and Rényi in the 1960s laid the groundwork for understanding complex networks. Random graph theory became a fundamental tool for analyzing various types of networks, from social interactions to biological systems.

The advent of the semantic web, proposed by Tim Berners-Lee in the late 1990s, marked a significant application of graph theory to the World Wide Web. The semantic web conceptualizes web resources as nodes in a vast, interconnected graph, promoting the development of standards such as the Resource Description Framework (RDF). While RDF did not achieve widespread industry adoption, it paved the way for the growth of knowledge graphs and social graphs, which became integral to major tech companies such as Yahoo!, Google, and Facebook.

Graph databases are now considered a subset of NoSQL databases, providing a contrast to traditional SQL-based relational databases. While SQL databases model data in tabular structures, graph databases use vertices and edges to encode relationships directly, offering a more intuitive and efficient way to handle highly interconnected data. This approach contrasts with the two-dimensional, tabular constraints of relational databases, which often struggle with complex, high-dimensional problems.

Graph theory has various applications, including navigation, recommendation engines, and resource scheduling. Despite the theoretical alignment with graph computing, many existing solutions use relational or columnar databases to tackle graph problems. This results in inefficient solutions that fail to leverage the full potential of graph-based methodologies. As knowledge graphs gain traction, the significance of graph databases and computing continues to grow, addressing challenges that traditional databases are ill-equipped to handle.

The evolution of graph theory and its integration into graph computing reflects a broader shift toward leveraging complex, interconnected data structures. Graph databases offer promising solutions to the limitations faced by previous data management approaches, making them a crucial component of modern data infrastructure.

Property graphs and semantic knowledge graphs

The evolution of technology often follows a trajectory marked by phases of innovation, adoption, peak excitement, disillusionment, and eventual maturity. This pattern is evident in the realm of graph database (or graph computing) development, where two primary types of graph models — Property Graphs (PGs) and Semantic Knowledge Graphs (SKGs) — have emerged, each contributing to the field in distinct ways.

Property Graphs

Property graphs, also known as Labeled Property Graphs (LPGs), represent one of the most influential models in graph computing. The concept of PGs revolves around nodes, edges, and properties. Nodes, also referred to as vertices or entities, and edges, the connections or relationships between nodes, can have associated attributes or properties. These attributes might include identifiers, weights, timestamps, and other metadata that provide additional context to the relationships and entities within the graph.

LPG is a term popularized by Neo4j, a graph database in which a label acts as a special kind of index, attached to nodes (with relationship types playing an analogous role for edges), for accelerated data access. Many people use LPG and PG interchangeably.

However, LPGs are really one of several ways to implement a property graph data model. For instance, Neo4j’s LPG implementation is schema-free, while GQL’s PG design is schema-based.
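As a quick sketch of how labels and properties appear in practice, the following GQL-style statements insert two labeled nodes and a property-bearing edge (the labels and property names here are hypothetical, and exact syntax varies between implementations):

```gql
// Two nodes, each with a label and properties
INSERT (:Person {name: 'Alice', city: 'Zurich'})
INSERT (:Person {name: 'Bob', city: 'Basel'})

// An edge carrying its own properties (a timestamped, weighted relationship)
MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
INSERT (a)-[:FOLLOWS {since: 2024, weight: 0.8}]->(b)
```

The properties on the edge are what distinguish a property graph from a bare topology: queries can filter and aggregate on `since` or `weight` just as they can on node attributes.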

It’s easy to see that, without properties (attribute fields), the expressive power of graphs would be significantly diminished. There is a historical reason why early systems neglected them, however. In the 1980s and 1990s, social behavior analytics gained traction, eventually leading to the rise of Social Network Services (SNSs). Data analysis in SNSs traditionally focused on the skeleton (or topology) of the data, and properties were not a priority. This topology-first focus carried over into most of the graph-processing frameworks that predate almost all PG databases.

The property graph model has seen a proliferation of implementations, including DGraph, TigerGraph, Memgraph, and Ultipa. These systems differ in architectural choices, service models, and APIs, reflecting the diverse needs and rapid evolution of the graph database market. The dynamic landscape of PG databases illustrates the flexibility and adaptability of this model in addressing a wide range of use cases.

Semantic Knowledge Graphs (SKGs)

In contrast to property graphs, SKGs are built upon principles derived from the Resource Description Framework (RDF) and related standards. SKGs focus on representing knowledge through semantic relationships, enabling more sophisticated querying and reasoning about the data.

The RDF standard, developed by the World Wide Web Consortium in 2004 (v1.0) and updated in 2014 (v1.1), provides a structured framework for describing metadata and relationships in a machine-readable format. RDF’s primary query language, SPARQL, allows for querying complex data structures, but it is often criticized for its verbosity and complexity. RDF’s emphasis on semantic relationships aligns with the goal of creating interoperable and extensible knowledge representations.

Despite its strong academic foundation, RDF and its associated technologies have faced challenges in gaining widespread industry adoption. The complexity of RDF and SPARQL has led to a preference for more user-friendly alternatives, such as JSON and simpler query languages. It is precisely for this reason that property graph databases, and ultimately GQL, were born; they are now used by many graph enthusiasts and innovative enterprises looking to digitally transform their businesses.

While property graphs excel at data traversal, with practical applications and ease of use, SKGs focus on the semantic side, offering a richer framework for reasoning and interoperability suited to Natural Language Processing (NLP) workloads. The interplay between these two models, often in the form of a PG database for link analysis (path-finding or deep traversals) alongside an RDF store for semantic processing, reflects a broader trend toward integrating the strengths of both approaches to address diverse data challenges.
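A typical piece of link analysis on a PG store, expressed in GQL-style syntax with a quantified path pattern, might look like the sketch below (the account graph, labels, and identifiers are hypothetical, and quantifier syntax may differ slightly between implementations):

```gql
// Deep traversal: which accounts are reachable from a suspect
// account within one to three transfer hops?
MATCH p = (s:Account {id: 'ACC-001'}) (-[:TRANSFERS_TO]->(:Account)){1,3} (t:Account)
RETURN DISTINCT t.id
```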

Current and future trends in graph database technology

As graph database technology continues to evolve, several key trends and advancements are shaping its development. These trends reflect the growing complexity of data environments and the increasing demand for powerful, efficient solutions.

This section explores three significant trends: Hybrid Transactional and Analytical Processing (HTAP), handling massive data volumes while maintaining performance, and compliance with the emerging GQL standard.

Hybrid Transactional and Analytical Processing (HTAP)

HTAP represents a transformative approach in the graph database arena. Traditionally, databases were categorized into transactional systems such as Online Transaction Processing (OLTP) and analytical systems such as Online Analytical Processing (OLAP), each optimized for different workloads. Transactional systems focus on managing and recording day-to-day operations, while analytical systems are designed for complex queries and large-scale data analysis.

Many, if not most, graph databases, and almost all graph-processing frameworks, were originally designed to handle analytics-centric (AP) traffic; the same is true of most NoSQL and big-data frameworks. These AP-centric graph systems tend to ingest data in bulk offline and then serve analytics over the static data online, which makes them slow at online ingestion. If graph database systems are to become the next mainstream database, HTAP capability is the most critical requirement.

HTAP bridges this divide by enabling a single system, usually in the form of a cluster of multiple instances, to handle both transactional and analytical workloads. This integration is crucial for modern applications that require real-time analytics on live or transactional data.

In the context of graph databases, HTAP offers several advantages:

  • Performance Optimization: Advances in HTAP technology include innovations in indexing, query optimization, and in-memory processing. These improvements help maintain high performance levels even as data volumes and query complexities increase.
  • Real-Time Insights: HTAP enables real-time analytics on graph data, allowing organizations to gain immediate insights from ongoing transactions. This capability is particularly valuable in scenarios such as online fraud detection, recommendation engines, operation support and decision-making, and dynamic network analysis.
  • Streamlined Architecture: By consolidating transactional and analytical processing into a single logical system, HTAP reduces the complexity of maintaining separate databases for different purposes. This integration simplifies architecture and improves data consistency across various use cases.
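In GQL-style terms, the appeal of HTAP is that a transactional write and an analytical read can target the same live graph. A hedged sketch follows; the user/product labels and the timestamp function are illustrative assumptions, not drawn from this book:

```gql
// Transactional side: record a purchase as it happens
START TRANSACTION;
MATCH (u:User {id: 'u42'}), (p:Product {sku: 'sku-9'})
INSERT (u)-[:BOUGHT {at: local_timestamp()}]->(p);
COMMIT;

// Analytical side: aggregate over the same, still-live graph
MATCH (:User)-[b:BOUGHT]->(p:Product)
RETURN p.sku, count(b) AS purchases
ORDER BY purchases DESC
LIMIT 10
```

On a pre-HTAP architecture, the second query would typically run against a separate analytical copy of the data, hours behind the first.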

Recent developments in HTAP for graph databases include the adoption of in-memory processing with large-scale parallelization and distributed computing. In-memory processing allows for faster data access and query execution, while distributed computing techniques enable the scaling of HTAP systems to handle large and complex graph data.

There are different approaches to in-memory computing, primarily distinguished by their ability to update datasets in real time. One school of design simply projects data into memory while the underlying data stays unchanged; another supports real-time synchronization of in-memory data with the persistence layer, which requires more sophisticated design and engineering.

Handling large-scale graph data

As data volumes grow exponentially, handling massive-scale data without significant performance degradation becomes a critical challenge. Traditional graph databases have often struggled with performance, particularly when executing deep traversal queries that require extensive computation.

Modern graph databases address these challenges through several key strategies:

  1. Distributed architecture
  2. Graph partitioning (sharding)
  3. Hardware-aided storage and computing
  4. Graph query/algorithm optimization

Let’s explore them in detail.

Distributed architecture

Distributed graph databases leverage clusters of machines to distribute data and computational workloads. This architecture supports both vertical and horizontal scaling, enabling the system to handle vast amounts of data by adding higher-end hardware components or more nodes to the cluster.

Systems typically evolve from a standalone instance to master-slave or high-availability architecture, then to a distributed-consensus architecture, and eventually to horizontally scalable architecture.

Readers interested in scalable graph database design may wish to consult books such as The Essential Criteria of Graph Databases by Ricky Sun (2024), which devotes dedicated chapters to the topic.

Graph partitioning (sharding)

Graph partitioning techniques divide a large graph into smaller, more manageable subgraphs (shards). Each shard resides on an individual server node and can be processed independently, reducing the computational load on each node. Efficient partitioning strategies minimize inter-node communication and improve overall performance.

The commonly used partitioning/sharding techniques cut the graph either by vertex or by edge. Note that both techniques require extra architectural components (e.g., a meta-server, name-server, and shard-servers) and introduce data duplication (often 2x to 3x more stored data points, to ensure that cross-shard linkages remain unbroken).

Hardware-aided storage and computing

Performance bottlenecks can be mitigated through hardware-accelerated storage and computing. In-memory databases reduce latency by keeping data in RAM, SSDs offer faster data access than traditional hard drives, and GPUs and FPGAs can offload work from CPUs. These storage and compute solutions are increasingly integrated into graph databases to enhance performance and scalability.

Optimized graph queries and algorithms

Advances at the hardware and software levels require the matching graph queries and algorithms to be reinvented. Many graph algorithms were originally designed to run sequentially (single-threaded implementations) and must be re-engineered to harness the parallelism of modern CPUs and distributed environments. The same holds true for many graph queries, such as path-finding, k-hop queries, and even online data ingestion, all of which can be greatly accelerated by large-scale, distributed data processing. Queries and algorithms that minimize redundant computation and optimize data-access patterns can significantly improve performance during deep traversals.
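Bounding traversal depth is one of the simplest query-level optimizations: an explicit hop limit lets the engine prune work and parallelize frontier expansion instead of exploring without bound. A hedged GQL-style sketch (the device graph and labels are hypothetical):

```gql
// A bounded k-hop query: count distinct devices within two hops
// of a seed, rather than traversing the graph without a limit
MATCH (seed:Device {id: 'd-17'}) (-[:CONNECTED_TO]-(:Device)){1,2} (n:Device)
RETURN count(DISTINCT n) AS two_hop_neighborhood
```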

GQL compliance

GQL is emerging as a major standard in graph database technology, providing a unified query language for graph data. With the first edition of GQL already published, compliance with this standard is becoming a key focus for graph database vendors, as well as traditional RDBMS and NoSQL providers. Compliance helps vendors retain existing customers and attract new ones by ensuring interoperability and standardization across graph data platforms.

Key aspects of GQL compliance include the following:

  • Standardized Query Syntax: GQL offers a standardized syntax for querying graph data, making it easier for developers to write and maintain queries across different graph database systems. This standardization promotes interoperability and reduces the learning curve associated with adopting new graph databases.
  • Advanced Query Capabilities: GQL supports advanced querying capabilities, including pattern matching, traversal, and aggregation. By defining a comprehensive set of features, GQL will enable more sophisticated queries and analyses, enhancing the flexibility and power of graph databases.
  • Interoperability: Compliance with GQL improves integration and interoperability between different graph databases and applications. This is particularly important for organizations that use multiple graph technologies or require data exchange between systems.
  • Industry Adoption: As GQL gains traction, industry adoption is likely to drive further innovation and refinement. Vendors that prioritize GQL compliance will position themselves as leaders in the graph database market, attracting customers seeking standardized and future-proof solutions.
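The first two points, standardized syntax plus pattern matching, traversal, and aggregation, come together in even a small query. The schema below is hypothetical, but the shape is the standard GQL MATCH ... WHERE ... RETURN form that should read the same across compliant systems:

```gql
// Pattern matching, filtering, and aggregation in one portable query:
// for each author, how many distinct readers reviewed their recent books?
MATCH (a:Author)-[:WROTE]->(b:Book)<-[:REVIEWED]-(r:Reader)
WHERE b.year >= 2020
RETURN a.name, count(DISTINCT r) AS reviewers
ORDER BY reviewers DESC
```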

The trends in graph database technology highlight a dynamic and rapidly evolving field. HTAP is revolutionizing how graph databases handle transactional and analytical workloads, enabling real-time insights and streamlined architectures. Addressing massive-scale data handling challenges through distributed architectures, graph partitioning, and optimized algorithms ensures that graph databases can scale efficiently. Finally, GQL compliance is set to standardize and enhance graph querying, fostering greater interoperability and innovation in the industry. As these trends continue to develop, they will shape the future of (graph) database technology, driving advancements and new applications.
