Graph Databases in Action: Examples in Gremlin
Ebook, 715 pages, 6 hours

About this ebook

Summary
Relationships in data often look far more like a web than an orderly set of rows and columns. Graph databases shine when it comes to revealing valuable insights within complex, interconnected data such as demographics, financial records, or computer networks. In Graph Databases in Action, experts Dave Bechberger and Josh Perryman illuminate the design and implementation of graph databases in real-world applications. You'll learn how to choose the right database solutions for your tasks, and how to use your new knowledge to build agile, flexible, and high-performing graph-powered applications!

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology
Isolated data is a thing of the past! Now, data is connected, and graph databases—like Amazon Neptune, Microsoft Cosmos DB, and Neo4j—are the essential tools of this new reality. Graph databases represent relationships naturally, speeding the discovery of insights and driving business value.

About the book
Graph Databases in Action introduces you to graph database concepts by comparing them with relational database constructs. You'll learn just enough theory to get started, then progress to hands-on development. Discover use cases involving social networking, recommendation engines, and personalization.

What's inside
    Graph databases vs. relational databases
    Systematic graph data modeling
    Querying and navigating a graph
    Graph patterns
    Pitfalls and antipatterns

About the reader
For software developers. No experience with graph databases required.

About the author
Dave Bechberger and Josh Perryman have decades of experience building complex data-driven systems and have worked with graph databases since 2014.

Table of Contents

PART 1 - GETTING STARTED WITH GRAPH DATABASES

1 Introduction to graphs

2 Graph data modeling

3 Running basic and recursive traversals

4 Pathfinding traversals and mutating graphs

5 Formatting results

6 Developing an application

PART 2 - BUILDING ON GRAPH DATABASES

7 Advanced data modeling techniques

8 Building traversals using known walks

9 Working with subgraphs

PART 3 - MOVING BEYOND THE BASICS

10 Performance, pitfalls, and anti-patterns

11 What's next: Graph analytics, machine learning, and resources

Language: English
Publisher: Manning
Release date: Oct 17, 2020
ISBN: 9781638350101
Author

Josh Perryman

Josh Perryman is a technologist with over two decades of diverse experience building and maintaining complex systems, including high-performance computing (HPC) environments. Since 2014 he has focused on graph databases, especially in distributed or big data environments, and he regularly blogs and speaks at conferences about graph databases.

    Book preview

    Graph Databases in Action

    Examples in Gremlin

    Dave Bechberger and Josh Perryman

    Foreword by Ted Wilmes

    Manning

    Shelter Island

    For more information on this and other Manning titles go to

    manning.com

    Copyright

    For online information and ordering of these and other Manning books, please visit manning.com. The publisher offers discounts on these books when ordered in quantity.

    For more information, please contact

    Special Sales Department

    Manning Publications Co.

    20 Baldwin Road

    PO Box 761

    Shelter Island, NY 11964

    Email: [email protected]

    ©2020 by Manning Publications Co. All rights reserved.

    No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

    Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

    ♾ Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

    ISBN: 9781617296376

    contents

    foreword

    preface

    acknowledgments

    about this book

    about the authors

    about the cover illustration

    Part 1. Getting started with graph databases

      1 Introduction to graphs

    1.1  What is a graph?

    What is a graph database?

    Comparison with other types of databases

    Why can’t I use SQL?

    1.2  Is my problem a graph problem?

    Explore the questions

    I’m still confused. . . . Is this a graph problem?

      2 Graph data modeling

    2.1  The data modeling process

    Data modeling terms

    Four-step process for data modeling

    2.2  Understand the problem

    Domain and scope questions

    Business entity questions

    Functionality questions

    2.3  Developing the whiteboard model

    Identifying and grouping entities

    Identifying relationships between entities

    2.4  Constructing the logical data model

    Translating entities to vertices

    Translating relationships to edges

    Finding and assigning properties

    2.5  Checking our model

      3 Running basic and recursive traversals

    3.1  Setting up your environment

    Starting the Gremlin Server

    Starting the Gremlin Console, connecting to the Gremlin Server, and loading the data

    3.2  Traversing a graph

    Using a logical data model (schema) to plan traversals

    Planning the steps through the graph data

    Fundamental concepts of traversing a graph

    Writing traversals in Gremlin

    Retrieving properties with values steps

    3.3  Recursive traversals

    Using recursive logic

    Writing recursive traversals in Gremlin

      4 Pathfinding traversals and mutating graphs

    4.1  Mutating a graph

    Creating vertices and edges

    Removing data from our graph

    Updating a graph

    Extending our graph

    4.2  Paths

    Cycles in graphs

    Finding the simple path

    4.3  Traversing and filtering edges

    Introducing the E and V steps for traversing edges

    Filtering with edge properties

    Include edges in path results

    Performant edge counts and denormalization

      5 Formatting results

    5.1  Review of values steps

    5.2  Constructing our result payload

    Applying aliases in Gremlin

    Projecting results instead of aliasing

    5.3  Organizing our results

    Ordering results returned from a graph traversal

    Grouping results returned from a graph traversal

    Limiting results

    5.4  Combining steps into complex traversals

      6 Developing an application

    6.1  Starting the project

    Selecting our tools

    Setting up the project

    Obtaining a driver

    Preparing the database server Instance

    6.2  Connecting to our database

    Building the cluster configuration

    Setting up the GraphTraversalSource

    6.3  Retrieving data

    Retrieving a vertex

    Using Gremlin language variants (GLVs)

    Adding terminal steps

    Creating the Java method in our application

    6.4  Adding, modifying, and deleting data

    Adding vertices

    Adding edges

    Updating properties

    Deleting elements

    6.5  Translating our list and path traversals

    Getting a list of results

    Implementing recursive traversals

    Implementing paths

    Part 2. Building on Graph Databases

      7 Advanced data modeling techniques

    7.1  Reviewing our current data models

    7.2  Extending our logical data model

    7.3  Translating entities to vertices

    Using generic labels

    Denormalizing graph data

    Translating relationships to edges

    Finding and assigning properties

    Moving properties to edges

    Checking our model

    7.4  Extending our data model for personalization

    7.5  Comparing the results

      8 Building traversals using known walks

    8.1  Preparing to develop our traversals

    Identifying the required elements

    Selecting a starting place

    Setting up test data

    8.2  Writing our first traversal

    Designing our traversal

    Developing the traversal code

    8.3  Pagination and graph databases

    8.4  Recommending the highest-rated restaurants

    Designing our traversal

    Developing the traversal code

    8.5  Writing the last recommendation engine traversal

    Designing our traversal

    Adding this traversal to our application

      9 Working with subgraphs

    9.1  Working with subgraphs

    Extracting a subgraph

    Traversing a subgraph

    9.2  Building a subgraph for personalization

    9.3  Building the traversal

    Reversing the traversing direction

    Evaluating the individualized results of the subgraph

    9.4  Implementing a subgraph with a remote connection

    Connecting with TinkerPop’s Client class

    Adding this traversal to our application

    Part 3. Moving Beyond the Basics

    10 Performance, pitfalls, and anti-patterns

    10.1  Slow-performing traversals

    Explaining our traversal

    Profiling our traversal

    Indexes

    10.2  Dealing with supernodes

    It’s about instance data

    It’s about the database

    What makes a supernode?

    Monitoring for supernodes

    What to do if you have a supernode

    10.3  Application anti-patterns

    Using graphs for non-graph use cases

    Dirty data

    Lack of adequate testing

    10.4  Traversal anti-patterns

    Not using parameterized traversals

    Using unlabeled filtering steps

    11 What’s next: Graph analytics, machine learning, and resources

    11.1  Graph analytics

    Pathfinding

    Centrality

    Community detection

    Graphs and machine learning

    Additional resources

    11.2  Final thoughts

    appendix. Apache TinkerPop installation and overview

    index

    front matter

    foreword

    At the dawn of a new decade, developers are confronted with a myriad of database options when beginning a new project. The stalwart relational database still rules the roost, maintaining popularity in both legacy and greenfield projects. This is for good reason; flexibility and forty-plus years of cumulative engineering history are hard to argue with. Despite the success of relational databases, the last decade saw an explosion of new commercial and open-source database systems that were designed around alternative models and query languages. Some tackle traditional RDBMS workloads with a new twist, perhaps focusing on horizontal scale-out or on high performance via in-memory optimizations that have become available as RAM prices have fallen. Many other systems diverged from the relational model altogether. Out of this set, we find a variety of focus areas and modeling paradigms. This book focuses on one of the more expressive and powerful developments, the graph model, and the property graph in particular.

    Graph databases aren’t a new thing. Hierarchical and navigational databases have existed since the 60s, but these have recently experienced an increase in developer popularity. I think this is largely due to the intuitiveness of the property graph data model. People are already wired to think in graphs. If you draw a graph on a whiteboard, technical and non-technical folks get it. Consequently, after you overlay the graph model onto your software tasks at hand, everything starts to look like a graph problem.

    With all that said, we’re still dealing with technology, and the available property graph databases are newer technology at that, so there isn’t any magic. This is where Dave and Josh come in. I can’t imagine a better pair to help lay out the signposts and guide you on the journey to graph understanding. Both are accomplished graph architects and developers who have been involved in this junior space since before its recent uptick in popularity. Having worked in graph-based product development and consulting, they’ve racked up years of real-world experience.

    This experience has influenced their pragmatic approach to the problems of graph application development, and though both are proponents of graphs, they’re proponents with a healthy dose of skepticism and are not overly fascinated with the technology. After all, as mentioned, one of the first and most important questions new developers have is, “Is this a graph problem?” As you make your way through this book, you’ll hone an intuition for translating real-world problems into graph data models and build up your chops in Gremlin, a popular and powerful property graph query language. The rubber meets the road in chapter 6, where you use this knowledge to build your first graph application. By the time you’ve finished, you’ll have the knowledge to evaluate whether a graph database is a good fit for your next project and, if so, to execute on that vision, having already built an example graph database application.

    Ted Wilmes

    Data Architect & JanusGraph Technical Steering Committee Member

    Expero Inc.

    preface

    Two complementary trends started in the mid to late 2000s. First, companies began using and collecting more data on their customers, competition, and users than ever before. Second, the information companies wanted from this data became more complex, often containing hidden connections. These two trends drove the need for an easier exploration of expansive, yet highly connected data. Graph databases met that need.

    Both authors have gotten an up-close and personal view of this market as the technology, usage, and adoption of graph databases have matured. We both started using graph databases in the mid-2010s while working for a niche software consulting company. Independently, we each worked on projects that used graph databases to solve specific types of complex data problems. At that time, graph databases were new and very rough. Despite the challenges of working with new technologies, we both recognized the power of this tool and were hooked.

    Since then, we have spent countless hours banging our heads against a proverbial wall to understand all the intricacies and nuances of building graph-backed applications. This book is the distillation of those countless hours of struggle. It is our hope that the hands-on nature of this book will provide a solid, foundational understanding of the skills needed to build graph-backed applications and, in the process, help you to avoid some of the pitfalls that we encountered.

    acknowledgments

    This book has been a labor of love, and sometimes frustration, so we first and foremost need to thank our wives (Melody and Meredith), and then acknowledge family and friends for their endless patience and for indulging us as we shared our latest esoteric discoveries while working with graph databases. Without their support we never could have made it through the countless hours it took to create this book.

    A big thank you goes out to Dr. Denise Gosnell, Kelly Mondor, Ted Wilmes, and Daniel Farrell for all the specific insights, interviews, and support you provided, which helped us immensely in creating this book.

    We would also like to thank the team at Manning Publications for allowing us the time and opportunity to publish this book. We would like to thank the entire Manning staff and specifically our publishers Marjan Bace and Michael Stephens, as well as our editors Frances Lefkowitz, Nick Watts, Alex Ott, Lori Weidert, and Frances Buran for all the amazing feedback and endless patience you have shown. Our appreciation also goes out to all the reviewers whose comments and reviews were invaluable in solidifying the organization and in clarifying the focus of this book: Scott Bartram, Andrew Blair, Alain Couniot, Douglas Duncan, Mike Erickson, John Guthrie, Mike Haller, Milorad Imbra, Ramaninder Singh Jhajj, Mike Jensen, Nicholas Robert Keers, Mladen Knežić, Miguel Montalvo, Luis Moux, Nick Rakochy, Ron Sher, Deshuang Tang, Richard Vaughan, and Matthew Welke.

    We would also like to thank the team at Expero Inc., without whom Josh and Dave would never have met, nor would have ever started their exploration of graph databases. Our many years of working side by side with the exceptionally talented Experonauts were a fruitful starting point that eventually led to writing this book.

    about this book

    This book is written for anyone building applications using graph databases. It is designed to provide a foundational understanding of graphs and graph databases, as well as to provide a framework for building applications using common graph database patterns. To teach this framework, this book follows the development lifecycle of a fictitious application called DiningByFriends. We use this application throughout the book to provide a realistic grounding of graph principles and examples of the concepts and content we teach. In many areas throughout this book, we compare and contrast the differences between building a graph-backed application and using the more traditional relational database model. By the end of this book, you will not only have the skills needed to build your own graph-backed application, but you will have built your first application, DiningByFriends.

    Who should read this book

    This book is for application developers, data engineers, and database developers who want to use graph databases as the backing data store for their applications. Throughout this book, we do not expect the reader to have any prior experience using graph databases, but you should be familiar with data modeling concepts, specifically with relational database development, as these are used heavily throughout as a common point of reference. Although all the application code is written in Java, any developer with object-oriented application development experience should be able to follow along with the concepts and content.

    How this book is organized: A roadmap

    This book is organized into three parts comprising 11 chapters. In part 1, Getting started with graph databases, we establish the foundation for our DiningByFriends application:

    Chapter 1 begins with an introduction to graphs and graph terminology. We discuss how graph databases differ from relational databases and how you can use graph databases to solve highly connected data problems. We finish this chapter by discussing what makes a problem a good candidate for using a graph database.

    Chapter 2 is where we hit the ground running by building an initial data model for our DiningByFriends application. We start with the types of information needed to begin the data modeling process. We then show how to turn this information into a conceptual data model. Finally, we walk through a framework for taking our business needs and our conceptual data model and turning them into our initial data model using the elements of a graph database: vertices, edges, and properties.

    Chapter 3 begins a set of three chapters focused on learning the process of querying a graph database, known as traversing. We begin by teaching you how to retrieve and filter data from our graph. We follow this with learning how to navigate the structure of our graph and how that differs from working with a relational database. Then we finish up this chapter by demonstrating the ease with which you can recursively traverse through a graph to retrieve complex, interconnected data.

    Chapter 4 continues our exploration of graph traversals with data mutation use cases. We then show how you can traverse the graph to find the entities and relationships that connect two items, known as the path. Finally, we look at how to leverage properties on relationships to filter the traversals and increase their performance.

    Chapter 5 finishes our initial focus on graph traversals with a discussion of ways to format the results of our traversal into a desired output. Additionally, you learn how to perform common operations such as sorting, filtering, and limiting the results returned.

    Chapter 6 begins the process of building our DiningByFriends application by taking the traversals we developed in chapters 3, 4, and 5 and walking through incorporating these into a Java application. Then we’ll process the results to complete this first part.

    In part 2, Building on graph databases, we extend the concepts introduced in part 1:

    Chapter 7 uses the foundations of data modeling from chapter 2, as well as what you learned about traversing a graph, to extend the data model for more complex use cases, such as recommendation engines and personalization.

    Chapter 8 leverages a recommendation engine use case to demonstrate the power of using a known-walk pattern to create a robust recommendation application pattern.

    Chapter 9 uses our personalization use case to demonstrate how to use a subgraph access pattern within a graph-backed application.

    In part 3, Moving beyond the basics, we move past the DiningByFriends application to discuss our next steps in the application development process:

    Chapter 10 discusses how to debug and troubleshoot common performance problems with traversals. We then investigate exactly what supernodes are and why they cause issues in graph-backed applications. We follow up these common performance problems with common application and traversal pitfalls and anti-patterns, as well as how to recognize and avoid them.

    Chapter 11 takes a forward-looking view and discusses some of the next steps you might want to take with your graph-backed application. We also discuss some of the most common graph analytics algorithms and how you can apply these to solve a specific problem. Finally, we wrap up this chapter with a brief overview of how to leverage graphs in machine learning (ML) applications.

    About the code

    This book contains many examples of source code, both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text.

    In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page size in the book. In rare cases, even this was not enough and code listings include line-continuation markers (➥). Additionally, code annotations accompany many of the listings, highlighting important concepts.

    The code for the examples in this book is available for download from the Manning website at https://www.manning.com/books/graph-databases-in-action, and from GitHub at https://github.com/bechbd/graph-databases-in-action.

    About the technologies

    Our goal throughout this book is to equip the reader with the conceptual knowledge needed to build graph-backed applications. However, in order to provide practical examples of these concepts, we had to make decisions regarding the technologies used for demonstration.

    Our first decision was to pick the type of database. We decided to use a labeled property graph database, instead of, for example, an RDF store or triplestore database. Labeled property graph databases are the most common type we have seen in production use and seem to be the ones with the most momentum behind them. Additionally, these are the closest to the familiar concepts of relational databases, so labeled property graph databases are quite effective for comparisons.
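
    To make the term concrete, here is a minimal sketch of the labeled property graph model in plain Java. The class names (PropertyGraphSketch, Vertex, Edge), the sample people, and the since property are assumptions made for illustration; this is not the TinkerPop API and not code from the book. The loose analogy to relational concepts is that a label plays a role similar to a table name and properties resemble column values, while an edge stores a relationship directly instead of through a foreign key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these classes are hypothetical, not the TinkerPop API.
class PropertyGraphSketch {

    // A vertex has one label (its type) and a map of key-value properties.
    static final class Vertex {
        final String label;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Vertex(String label) {
            this.label = label;
        }

        Vertex property(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    // An edge is a labeled, directed connection that can carry its own properties.
    static final class Edge {
        final String label;
        final Vertex out; // the vertex the edge leaves
        final Vertex in;  // the vertex the edge enters
        final Map<String, Object> properties = new LinkedHashMap<>();

        Edge(String label, Vertex out, Vertex in) {
            this.label = label;
            this.out = out;
            this.in = in;
        }

        Edge property(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    final List<Vertex> vertices = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Vertex addVertex(String label) {
        Vertex v = new Vertex(label);
        vertices.add(v);
        return v;
    }

    Edge addEdge(String label, Vertex out, Vertex in) {
        Edge e = new Edge(label, out, in);
        edges.add(e);
        return e;
    }

    // Builds a two-person graph; the names and the "since" value are made-up sample data.
    static PropertyGraphSketch sample() {
        PropertyGraphSketch g = new PropertyGraphSketch();
        Vertex ann = g.addVertex("person").property("first_name", "Ann");
        Vertex bob = g.addVertex("person").property("first_name", "Bob");
        g.addEdge("friends", ann, bob).property("since", 2019);
        return g;
    }

    public static void main(String[] args) {
        PropertyGraphSketch g = sample();
        for (Edge e : g.edges) {
            System.out.println(e.out.properties.get("first_name")
                    + " -[" + e.label + "]-> " + e.in.properties.get("first_name"));
        }
    }
}
```

    Running main prints Ann -[friends]-> Bob, which shows the three building blocks together: two labeled vertices, one labeled edge, and properties on each.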

    This led us to our next decision: the traversal language to use, openCypher or Gremlin.

    While there’s a strong case for using openCypher, the goal of this book is to remain as vendor-agnostic as possible. It is important to us that these concepts and techniques are easily transferable to many popular databases when you start to build your applications. In the end, we decided to use the Apache TinkerPop version 3.4.x framework because it currently has the most database vendors with compatible implementations.

    We have been questioned multiple times during the proposal and review processes as to why we chose this stack over a Neo4j/Cypher stack. Given the popularity of the Neo4j ecosystem this is a fair question which deserves fuller comment. There are three reasons we chose TinkerPop’s Gremlin for the illustrations throughout this book:

    Gremlin is a better tool for teaching how a traversal works.

    Gremlin is a common language of choice for enterprise applications.

    Gremlin is the most portable language between property graph databases.

    As for the first reason, we believe that the imperative design of Gremlin provides a better teaching tool for learning how a graph traversal works compared to the declarative approach of Cypher/openCypher. The syntax of Gremlin requires that we think about how we are moving through our graph in order to determine where we will move next. While we do appreciate the simplicity of Cypher/openCypher, it can also obfuscate critical technical matters, especially when dealing with issues of performance or scale. So while Cypher/openCypher is a great starting point for learning how to work with connected data, we feel that Gremlin is better suited for building high performing, scalable data applications.
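
    One way to picture the imperative style described above is to write each move through the graph as an explicit step. The sketch below does this in plain Java, with a hypothetical adjacency map standing in for the database; the class, method, and sample names are assumptions, and the code is neither Gremlin nor the TinkerPop API. It mirrors the shape of a friends-of-friends traversal: choose a starting point, step out along the friends edges, then step out again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: plain Java collections stand in for a graph database.
class ImperativeTraversalSketch {

    // Adjacency map: person name -> names reached by following outgoing "friends" edges.
    // All names are made-up sample data.
    static final Map<String, List<String>> FRIENDS = new LinkedHashMap<>();
    static {
        FRIENDS.put("Ann", Arrays.asList("Bob", "Cam"));
        FRIENDS.put("Bob", Arrays.asList("Dee"));
        FRIENDS.put("Cam", Arrays.asList("Dee", "Eve"));
        FRIENDS.put("Dee", Collections.<String>emptyList());
        FRIENDS.put("Eve", Collections.<String>emptyList());
    }

    // One traversal step: move every current position along its outgoing "friends" edges.
    static List<String> out(List<String> current) {
        List<String> next = new ArrayList<>();
        for (String name : current) {
            next.addAll(FRIENDS.getOrDefault(name, Collections.<String>emptyList()));
        }
        return next;
    }

    // Each line is one deliberate move; where we are now determines where we can go next.
    static List<String> friendsOfFriends(String start) {
        List<String> positions = Arrays.asList(start); // step 1: pick a starting vertex
        positions = out(positions);                    // step 2: move to that person's friends
        positions = out(positions);                    // step 3: move to the friends of those friends
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(friendsOfFriends("Ann"));
    }
}
```

    Running main prints [Dee, Dee, Eve]. Dee appears twice because she is reached by two different routes, and the sketch keeps every route instead of collapsing them, which makes the cost of each step visible in a way a declarative query can hide.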

    As for the second reason, many enterprise applications have been built on TinkerPop-enabled databases, which makes Gremlin their query language. Some organizations have both Cypher/openCypher and Gremlin applications. But in our experience, the bigger, more complex enterprise-level projects seem to have chosen one of the many TinkerPop-enabled databases or cloud services.

    As for the third reason, at this time, it is easy to say that Gremlin is the most widely available query language across graph database engines. Nearly all of the major cloud vendors (Amazon Web Services, Microsoft Azure, IBM, Huawei, and so forth) offer graph databases or services compatible with Gremlin. The lone exception is the Google Cloud Platform, which offers Neo4j as a service.

    Our goal is not to advocate for one database or language over another. We seek to provide you with a solid foundation for how to use a graph database when building applications with highly connected data and to illustrate how graph databases work under the covers. We think that Gremlin provides the best path to accomplish this.

    With the decision to use TinkerPop’s Gremlin made, we had to pick a specific TinkerPop-enabled database to use. In the spirit of remaining vendor agnostic, we’ve decided to use TinkerGraph for the examples. TinkerGraph is the graph implementation used in the Gremlin Server and Gremlin Console, the reference software provided as part of the Apache Software Foundation’s TinkerPop project.

    Finally, we had to decide on an application programming language to build our example application, DiningByFriends. As Java is the most common language we have used with graph databases, we chose that as our application language. We should note that it is possible to build the same application with other languages such as C#, JavaScript and Python. Not only is it possible, we have done so ourselves. But all the traversals provided in this book are written in Gremlin and any application code is written in Java.

    While almost all the concepts presented throughout this book are not specific to TinkerPop-enabled databases, there are a few we discuss that are unique to TinkerPop. When this is the case, we'll note where a TinkerPop-specific feature is used so that you’re aware that a particular feature might not be available in your graph database of choice. If no such note is given, it is safe to assume that the concept we discuss is applicable to other labeled property graph databases as well.

    liveBook discussion forum

    Purchase of Graph Databases in Action includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the authors and from other users. To access the forum, go to https://livebook.manning.com/#!/book/graph-databases-in-action/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/#!/discussion.

    Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the authors can take place. It is not a commitment to any specific amount of participation on the part of the authors, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the authors some challenging questions lest their interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

    about the authors

    Dave Bechberger is a data architect and developer with over two decades of experience. He uses his extensive knowledge of graph and other big data technologies to build highly performant and scalable data platforms in complex data domains such as bioinformatics, oil and gas, and supply chain management. Since the mid-2010s, Dave has worked with graph databases as a consultant, consumer, and vendor. He is an active member of the graph community and has presented on a wide range of graph-related topics at national and international conferences.

    Josh Perryman also has over two decades of experience building and maintaining complex systems. Since 2014, he has focused on graph databases, especially in distributed or big data environments, and he regularly blogs and speaks at conferences about graph databases. Josh has worked with a variety of industries, including enterprise software, financial services, consumer products, and government intelligence agencies. In addition to consulting and product work, he has designed Gremlin training courses that have been delivered all over the world.

    about the cover illustration

    The figure on the cover of Graph Databases in Action is captioned Femme de la Foret Noire, or a woman from the Black Forest, in Southwest Germany. The illustration is taken from a collection of dress costumes from various countries by Jacques Grasset de Saint-Sauveur (1757-1810), titled Costumes civils actuels de tous les peuples connus, published in France in 1788. Each illustration is finely drawn and colored by hand. The rich variety of Grasset de Saint-Sauveur’s collection reminds us vividly of how culturally apart the world’s towns and regions were just 200 years ago. Isolated from each other, people spoke different dialects and languages. In the streets or in the countryside, it was easy to identify where they lived and what their trade or station in life was just by their dress.

    The way we dress has changed since then, and the diversity by region, so rich at the time, has faded away. It is now hard to tell apart the inhabitants of different continents, let alone different towns, regions, or countries. Perhaps we have traded cultural diversity for a more varied personal life, and certainly for a more varied and fast-paced technological life.

    At a time when it is hard to tell one computer book from another, Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional life of two centuries ago, brought back to life by Grasset de Saint-Sauveur’s pictures.

    Part 1. Getting started with graph databases

    Journeys into new technologies take work, and in this book, our journey will extend your current knowledge of building relational database applications to demonstrate how you can solve complex data problems by building graph databases and graph-backed applications. In this first part, we ease into your journey by establishing concepts, terms, and processes, while highlighting the critical differences required when approaching a problem with a graph mindset.

    Chapter 1 introduces the core concepts of graphs and discusses the types of problems that are well suited for these models. In chapter 2, we establish a data modeling methodology and build a simple data model for a social network that we’ll use in our example application, DiningByFriends. The next three chapters introduce the most common operations that you’ll use to find and manipulate data in graph databases. We approach these operations in three stages, starting with the basics of moving around a graph in chapter 3. Chapter 4 then covers how to perform basic CRUD (Create/Read/Update/Delete) operations before extending the work we did in chapter 3 to perform more complex recursive and pathfinding traversals. In chapter 5, we close our introduction by using simple graph operations to examine ways to organize your results. Chapter 6 completes this part by synthesizing the work from chapters 2 through 5 into our working Java application, DiningByFriends.

    1 Introduction to graphs

    This chapter covers

    An introduction to graphs and graph terminology

    How graph databases help solve highly connected data problems

    The advantages of graph databases over relational databases

    Identifying problems that make good candidates for using a graph database

    Modern applications are built on data, and that data is ever increasing in both size and complexity. As the complexity of our data grows, so do our expectations of what insight our applications can derive from that data. If you are old enough, you likely remember when applications took a long time to load data and had limited features. Today’s reality is different; applications provide powerful, flexible, and immediate insight into data. But for every 100 questions modern applications answer, the most common data tool they use (namely, a relational database) handles only about 88 of those questions well. That leaves about 12 questions where relational databases struggle. These remaining questions deal with the links and connections within the data, those aspects of the data that can generate powerful and unique insights. This puts us at a crossroads: we can use the relational database hammer to pound away at those questions and make it work well enough, or we can take a step back and look at what other tools can answer these questions better, faster, and with less effort.

    By reading this book, you have decided to take a step back from your relational database hammer and investigate a road less traveled: graph databases. This book is written for developers, engineers, and architects who are interested in other ways to solve problems specific to working with highly connected data. We assume you are already familiar with relational databases but are interested in learning when, where, and how graph databases are a better tool.

    Our goal with this book is to equip you with the techniques needed to add graph databases as another tool in your toolbelt. We like to think of this book as the guide that we wish we had when we started building graph-backed applications. Throughout this book, we’ll demonstrate common graph patterns that highlight how graph databases enable navigation and exploration of data in ways not easily accomplished with a traditional relational database.

    Our primary approach is through an example of building a fictitious restaurant review and recommendation application we call DiningByFriends. As we move through the software development life cycle from planning, to analysis, to design, and on to implementation, this application demonstrates how to think about and work with graph data. Each chapter builds on the previous chapter, and by the end of this book, we’ll have created a functioning application on a graph database. We believe that putting the concepts immediately to work by solving a realistic set of problems, even if they are somewhat simplistic, is the best way to get comfortable using a new technology. Let’s begin our journey with an introduction to what graphs and graph databases are and how they compare with traditional tools such as relational databases.

    1.1 What is a graph?

    When you look at a road map, examine an organizational chart, or use social networks such as Facebook, LinkedIn, or Twitter, you use a graph. Graphs are a nearly ubiquitous way to think about real-world scenarios because they abstract out the items and the relationships being represented, and this abstraction allows for quick and efficient processing of the connections within the data.

    Let’s demonstrate with a common task: going to the supermarket. Take out a piece of paper and draw out a plan for getting from your house to your supermarket. Chances are it looks something like figure 1.1.

    Figure 1.1 A graph representing directions to the supermarket

    Figure 1.1 shows a graph where the key items and relationships are represented by abstractions. First, we abstracted key locations, like intersections, and represented these as circles. We then designated the connections between these key intersections as lines, showing how the key intersections are related. This is just one example of how we naturally represent real-world problems as graphs.

    It is human nature to abstract real-world entities and their relationships, and the mathematical name for this abstract construct is a graph. When thinking about a set of data that contains a vast array of highly interconnected items, we might also describe this data set as a web of interconnected things, which is just another way of saying a graph.
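
    The circles-and-lines abstraction described above can be sketched in plain Java as an adjacency list: each circle is a key, and each line is an entry in the lists of the two circles it joins. This is only an illustrative sketch and not code from the book; the class name SupermarketGraph, its methods, and the location names are all hypothetical, and the roads are assumed to be two-way.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the class, method, and location names are
// illustrative assumptions, not taken from the book.
class SupermarketGraph {
    // Each key is a vertex (a circle in the drawing); each value lists the
    // vertices directly connected to it (the lines in the drawing).
    private final Map<String, List<String>> adjacency = new LinkedHashMap<>();

    void addVertex(String name) {
        adjacency.putIfAbsent(name, new ArrayList<>());
    }

    // Assumes two-way roads, so the edge is recorded in both directions.
    void addEdge(String from, String to) {
        addVertex(from);
        addVertex(to);
        adjacency.get(from).add(to);
        adjacency.get(to).add(from);
    }

    // Returns the vertices one line away from the given vertex,
    // or an empty list if the vertex is unknown.
    List<String> neighbors(String name) {
        return adjacency.getOrDefault(name, List.of());
    }

    // Builds a small route from home to the supermarket.
    static SupermarketGraph example() {
        SupermarketGraph g = new SupermarketGraph();
        g.addEdge("Home", "First Intersection");
        g.addEdge("First Intersection", "Second Intersection");
        g.addEdge("Second Intersection", "Supermarket");
        return g;
    }

    public static void main(String[] args) {
        SupermarketGraph g = example();
        // prints [Home, Second Intersection]
        System.out.println(g.neighbors("First Intersection"));
    }
}
```

    Storing each connection alongside the items it joins means that finding what is connected to a given location is a direct lookup rather than a search through every connection. A graph database offers far more than this, but the underlying shape of the data is the same.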

    On maps, cities are frequently represented by circles, and the roads that connect these are represented by lines. On an organizational chart (org chart), a circle usually represents a person, normally with an associated title, and lines that connect these
