TigerGraph Buyers Guide Part 2
Graph Databases
Key Considerations in Buying a Graph Database
PART TWO
PART 2 - COMPARING AMAZON NEPTUNE AND TIGERGRAPH CLOUD
Selecting a graph database for cloud deployment
Graph databases are the fastest growing category in all of data management. Since early adoption
by companies including Facebook, Google and LinkedIn, graph has evolved into a mainstream technology
used today by enterprises in every industry across a wide variety of use cases. By organizing data in a
graph format, graph databases overcome big and complex data challenges that other databases, such as
relational and NoSQL databases, cannot.
Selecting graph software is an important decision that can shape the success of your organization.
Unfortunately, buyers often struggle to reconcile the conflicting claims made by different graph software
vendors, and these claims are often characterized by misinformation.
Part one of the buyer’s guide included a side-by-side comparison of two leading graph databases,
TigerGraph and Neo4j. Part two of the buyer’s guide is intended to assist you in your buying decision by
providing a side-by-side comparison of two leading graph databases with cloud offerings, TigerGraph and
Amazon Neptune. It addresses the following questions:
1. Will my graph database as a service continue to serve my needs now and into the future as the
volume and complexity of my data grow?
2. Can my data scale across multiple machines to enable me to analyze growing datasets?
3. What is the performance-to-price ratio of my graph database as a service?
Graph is Fundamental to Machine Learning, AI and Analytics
Graph is a common foundation and enabler in the analytics world. Business people are asking
increasingly complex questions across structured and unstructured data, and answering them often requires
blending data from multiple sources, multiple business units and, increasingly, external data.
Analyzing such blended data at scale is impractical, and in some cases impossible, with traditional database
systems. Graph analysis exposes and analyzes the relationships in the data. Processing and computing over this
data at scale requires a distributed, scalable system that can run in the cloud.
TigerGraph vs. Amazon Neptune

Scale-out
  TigerGraph: A true distributed database, with automatic partitioning that is seamless to users.
  Amazon Neptune: Computation does not scale horizontally. Not a distributed database.

Deep-Link Analytics
  TigerGraph: Complex 5 to 10+ hop queries on datasets of all sizes, from small to ultra-large, distributed graphs. Runs in-database graph analytics, including complex OLAP.
  Amazon Neptune: Tops out at 3 to 6 hops on medium to large graphs. Not designed for, and has no capability for, graph OLAP.

Graph Query Language
  TigerGraph: GSQL. Turing-complete; can express complex graph computations and analytics natively, for ad hoc queries and complex, parameterized procedures. TigerGraph is an active contributor to the upcoming GQL standard.
  Amazon Neptune: SPARQL or Gremlin, but not both at the same time. SPARQL is for RDF data; Gremlin is for property graphs. Gremlin is Turing-complete, but advanced programming skills are needed to ask complex questions and solve real-life business problems. Less intuitive than GSQL or other language alternatives.

Graph Algorithm Library
  TigerGraph: Open source, user extensible and customizable. Runs within the database.
  Amazon Neptune: None (lacks even Gremlin’s modest algorithm library).

Visual Interface
  TigerGraph: GraphStudio for the full workflow: visual modeling, ETL, exploration, and query development. AdminPortal for monitoring and management.
  Amazon Neptune: Visualization is offered through partners, at additional cost.

Standard APIs
  TigerGraph: Industry standards: REST APIs, JSON output, JDBC, Python, Spark.
  Amazon Neptune: Wide range of options in API support.

Cloud Offering (Graph Database as a Service)
  TigerGraph: Fully managed, cloud-based graph database. No cloud vendor lock-in. Lifetime free tier for non-commercial use. Includes 18+ starter kits for popular use cases.
  Amazon Neptune: Fully managed, cloud-based graph database, available on AWS. Vendor lock-in: users cannot move on-premises or switch cloud providers.

Design
  TigerGraph:
  - C++ core engine
  - Native distributed graph storage
  - Massively parallel processing
  - Compressed data
  - Schema-first design that optimizes query performance
  Amazon Neptune: Does not distribute data, so "horizontal scaling" is just making a replica. Vertically scalable system that relies on expensive machines with lots of RAM, plus data replication for higher throughput.

Developer Community
  TigerGraph: Rapidly growing developer community.
  Amazon Neptune: Small developer community, with limited resources and tutorials.

Performance-to-Price Ratio
  TigerGraph: A cost-effective graph solution, as demonstrated by a benchmarking test.
  Amazon Neptune: In a benchmarking test, the cost per query was at least 2.6 times higher than TigerGraph's (for a three-hop path query, the best-case scenario for Amazon Neptune) and as much as 9.7 times higher.
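The "hop" counts in the comparison refer to how many edges a query follows outward from its starting vertices. The following is a minimal, illustrative sketch in plain Python over a toy in-memory adjacency list. It is not TigerGraph or Neptune code, and the graph and the function name `k_hop_neighbors` are hypothetical. A k-hop query can be expressed as a breadth-first search that stops expanding after k levels. In graphs where vertices have several neighbors, the frontier can grow multiplicatively with each hop, which is why deep (5 to 10+ hop) queries on large graphs are demanding.

```python
from collections import deque


def k_hop_neighbors(graph: dict[str, list[str]], start: str, k: int) -> dict[str, int]:
    """Return every vertex reachable from `start` within k hops, mapped to its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue  # do not expand beyond k hops
        for w in graph.get(v, []):
            if w not in dist:  # in BFS, the first visit gives the shortest hop count
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


# Hypothetical toy graph: accounts linked by transfers (directed edges).
graph = {
    "acct_A": ["acct_B", "acct_C"],
    "acct_B": ["acct_D"],
    "acct_C": ["acct_D", "acct_E"],
    "acct_D": ["acct_F"],
    "acct_E": [],
    "acct_F": [],
}

for k in (1, 2, 3):
    reached = k_hop_neighbors(graph, "acct_A", k)
    print(k, sorted(v for v, d in reached.items() if d > 0))
```

Each additional hop widens the set of vertices the query must touch. A distributed engine can spread that frontier across machines, while a single-machine engine must hold and traverse all of it in one place.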
Here are a few examples of customers who have upgraded to TigerGraph due to higher performance and
scalability, more functionality and lower total cost of ownership (TCO). TigerGraph is happy to connect
graph database buyers with these and other customers who can share additional details.
CUSTOMER: Large US Financial Services Payment Processor
USE CASE: Fraud Detection
- Despite being a loyal AWS customer, the company evaluated Neptune and found it lacking on key performance
  requirements: data ingestion, speed, and graph analytics query response time.
- Additionally, TigerGraph was determined to be architecturally superior, as the only native parallel graph database.
“We are impressed by TigerGraph’s built-in massive parallel processing
architecture, unique vertices optimization for storage and indexing,
ability to support ACID for both OLTP and OLAP queries, and superb
performance/scaling for complete deep traversal queries, and its
developer-focus and hunger for growth.” - Senior Architect

CUSTOMER: Cybersecurity Company
USE CASE: Knowledge Graph with Machine Learning
- Unable to scale its cybersecurity services with its existing SQL Server.
- Tested Neptune as an alternative, but Neptune was unable to meet the company's performance requirements.
- Harnessing graph technology to continuously update and expand its knowledge of URL classifications and risk
  scores in the face of rapid URL expansion, and to identify new cyber threats at scale with real-time analytics.

CUSTOMER: Cloud-Based Supply Management Software Company
USE CASE: Pattern Matching, Supply Chain Management
- Needed a way to identify specific patterns across purchase orders to accelerate order fulfillment and
  improve efficiency.
- Attempted to solve this business challenge with Neptune, but ran into significant performance problems and
  queries that simply wouldn’t return, so the company turned to TigerGraph.
“We’ve been misled by a number of graph database companies but
TigerGraph is as advertised” - Data Sciences Engineer

CUSTOMER: Innovative Media Company Based in Germany
USE CASE: Recommendation Engine, Customer 360
- Prior to selecting TigerGraph, the customer conducted its own in-house benchmarks based on its requirements
  and thoroughly compared all available systems.
- With the shortlist decided, the customer then built prototypes and performed more detailed performance tests.
  Despite the ubiquity of AWS in its stack, the company chose TigerGraph for its powerful performance.
“TigerGraph provides a scalable and high-performance graph
database platform,” says the customer. “The integration has proven
straightforward and the flexibility of the GSQL environment makes
it much easier for developers who are not yet Graph specialists to
quickly get involved in our production processes.” - CEO
TigerGraph vs. Amazon Neptune

Distributed Database
  TigerGraph: Yes
  Amazon Neptune: No

Storage Efficiency
  TigerGraph: Typically compresses raw data down to 50% of its original size.
  Amazon Neptune: Typically expands raw data to ~400% of its original size.

Scalable Compute
  TigerGraph: Scale-up or scale-out. Users can both use more powerful machines AND increase the number of machines.
  Amazon Neptune: Scale-up only: cannot distribute a query across multiple machines.

Summary
  TigerGraph: Distributed, replicated complete database supports both
  - high transaction throughput (OLTP) AND
  - analyzing massive, growing datasets (OLAP).
  Amazon Neptune: Scalable, replicated storage with read replicas supports
  - high transaction throughput (OLTP) only.
  - Not suited for analytics of large datasets.

OLAP: Deep-Link Analytics
  TigerGraph: Handles deep-link queries (3 to 10+ hops) on ultra-large, distributed graphs. Runs in-database analytics on large graphs.
  Amazon Neptune: Tops out at 3 to 6 hops on medium to large graphs. Not designed for, and has no capability for, OLAP.

Graph Query Language
  TigerGraph: GSQL. Turing-complete; can express complex graph computations and analytics natively, for ad hoc queries and complex, parameterized procedures. Excels at analytics due to built-in parallelism and innovative accumulators. TigerGraph is an active contributor to the upcoming GQL standard.
  Amazon Neptune: Gremlin. Turing-complete but does not offer the same ease of use as GSQL: advanced programming skills are needed to ask complex questions and solve real-life business problems. Less intuitive to learn than GSQL or other language alternatives.

Transactions and Cluster Consistency
  TigerGraph: ACID across an entire cluster. Strong consistency.
  Amazon Neptune: ACID-compliant.

Graph Algorithm Library
  TigerGraph: Open source, user extensible and customizable. Runs within the database.
  Amazon Neptune: None (lacks even Gremlin’s modest algorithm library).

Visual Interface
  TigerGraph: GraphStudio for the full workflow: visual modeling, ETL, exploration, and query development.
  Amazon Neptune: Visualization is offered through partners, at additional cost.
1) Computing costs:
TigerGraph stores data more efficiently than any other graph database on the market: Neptune typically
needs 8 times more disk storage for the same input graph data. Unlike TigerGraph, which compresses raw
data when loading it into a graph, Neptune typically expands it. The following table compares how TigerGraph
and Amazon Neptune store 1 GB of input data:
Source: Benchmarking Graph Analytic Systems: TigerGraph, Neo4j, Neptune, JanusGraph, and ArangoDB
Both TigerGraph Cloud and Amazon Neptune are tuned to run well when the graph can be loaded into
memory. TigerGraph Cloud, however, is a more economical option - a TigerGraph instance with X CPUs
and Y RAM costs approximately the same as an Amazon Neptune instance with X CPUs and 8Y RAM.
Additionally, TigerGraph Cloud does not charge for I/O whereas Neptune does.
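The 8x storage figure follows from the typical factors in the storage-efficiency comparison: roughly 50% of raw size for TigerGraph versus roughly 400% for Neptune. The back-of-the-envelope sketch below assumes those typical factors hold for your data. In practice the ratios vary with schema and data, and the constant and function names are illustrative. Because both systems run best with the graph in memory, the same estimate is a rough guide to the RAM each needs.

```python
# Typical storage factors quoted in this guide (assumption: real ratios vary by schema and data).
TIGERGRAPH_FACTOR = 0.5  # compresses raw data to ~50% of its original size
NEPTUNE_FACTOR = 4.0     # expands raw data to ~400% of its original size


def stored_size_gb(raw_gb: float, factor: float) -> float:
    """Estimate the stored (and roughly in-memory) graph size from the raw input size."""
    return raw_gb * factor


raw_input_gb = 1.0
tg = stored_size_gb(raw_input_gb, TIGERGRAPH_FACTOR)
nep = stored_size_gb(raw_input_gb, NEPTUNE_FACTOR)
print(f"TigerGraph: {tg:.1f} GB, Neptune: {nep:.1f} GB, ratio: {nep / tg:.0f}x")
```

Scaling `raw_input_gb` to your own dataset gives a first-order sizing estimate. Measured sizes from a trial load are more reliable.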
2) Computing efficiency:
Amazon Neptune’s documentation emphasizes that its Gremlin implementation is for graph traversal,
without mention of computation or analytics. Nevertheless, on benchmark tests that ran a stream of
graph traversals (a strong point for Neptune), Neptune was at least 5.5 times slower than TigerGraph and,
in some instances, did not complete queries at all. For example, Neptune required 2.27 seconds
to complete a three-hop path query, while TigerGraph required only 0.41 seconds. TigerGraph’s faster
execution helps it sustain a higher number of queries per second (QPS).
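The 5.5x slowdown is simply the ratio of the two three-hop latencies: 2.27 s / 0.41 s ≈ 5.5. Little's law links latency to throughput. A system that keeps C queries in flight with average latency L sustains roughly C / L queries per second. The sketch below assumes a fixed, hypothetical concurrency level and ignores queueing and contention, so it offers intuition rather than a benchmark result.

```python
# Three-hop path query latencies reported in the benchmark cited in this guide (seconds).
NEPTUNE_LATENCY_S = 2.27
TIGERGRAPH_LATENCY_S = 0.41


def sustained_qps(concurrency: int, latency_s: float) -> float:
    """Little's law: throughput = queries in flight / average latency."""
    return concurrency / latency_s


slowdown = NEPTUNE_LATENCY_S / TIGERGRAPH_LATENCY_S
print(f"Neptune slowdown: {slowdown:.1f}x")

CONCURRENCY = 8  # hypothetical number of queries in flight
for name, latency in [("TigerGraph", TIGERGRAPH_LATENCY_S), ("Neptune", NEPTUNE_LATENCY_S)]:
    print(f"{name}: ~{sustained_qps(CONCURRENCY, latency):.1f} QPS")
```

At equal concurrency, the QPS ratio equals the latency ratio, so lower per-query latency translates directly into higher sustained throughput.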
The combined effect of the differences in computing cost and computing efficiency is shown in the
following figure:
The figure shows that the cost per query for Amazon Neptune is at least 2.6 times (roughly three times)
higher than that of TigerGraph, and can be as much as 9.7 times (roughly 10 times) higher. This demonstrates
that the performance-to-price ratio of TigerGraph Cloud is dramatically better than that of Amazon Neptune,
even for the smallest performance difference (the three-hop path query).
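Cost per query combines instance price and query latency. Assume, for simplicity, that each query occupies the whole instance for its duration. Then cost per query = hourly price × latency / 3600, and the cost ratio between two systems is the price ratio times the latency ratio. The hourly prices below are hypothetical placeholders, not published AWS or TigerGraph Cloud rates. For a real comparison, substitute current list prices and add Neptune's I/O charges.

```python
def cost_per_query(hourly_price_usd: float, latency_s: float) -> float:
    """Cost of one query, assuming it occupies the whole instance for its latency."""
    return hourly_price_usd * latency_s / 3600.0


# Hypothetical hourly prices (placeholders, not real rates).
tg_price, nep_price = 1.00, 1.00
# Three-hop path query latencies from the benchmark cited in this guide (seconds).
tg_latency, nep_latency = 0.41, 2.27

tg_cost = cost_per_query(tg_price, tg_latency)
nep_cost = cost_per_query(nep_price, nep_latency)
ratio = nep_cost / tg_cost

# The cost ratio factors into (price ratio) x (latency ratio).
assert abs(ratio - (nep_price / tg_price) * (nep_latency / tg_latency)) < 1e-9
print(f"Neptune / TigerGraph cost per query: {ratio:.1f}x")
```

With equal placeholder prices, the cost ratio reduces to the latency ratio. The benchmark's reported 2.6x to 9.7x range reflects the actual instance prices and query mix used there, which this sketch does not attempt to reproduce.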
CUSTOMER FEEDBACK
- Customer News: Innovative media company based in Germany upgrades to TigerGraph