Graph DB

This document summarizes the types of products that support graph processing and their positioning in the market. There are five main types: 1) relational databases that support graphs, 2) relational vendors adopting graph languages, 3) multi-model databases, 4) property graphs, and 5) RDF graphs. It discusses trends in the graph database market and changes among vendors, including growth, adoption of standards, migration tools, and a focus on performance and scalability.

Market segmentation

MarketUpdate
There are five types of products that are, or might be, directed at the graph space. Firstly, there are relational databases that support graph processing. A good example is Pivotal Greenplum, which while still strictly relational, supports a variety of parallelised graph algorithms. However, there are many graph algorithms that do not benefit from parallelisation. Secondly, there are relational vendors that have adopted open source graph languages. Examples here include SAP HANA with support for OpenCypher and, in its latest release, IBM Db2 with support for Gremlin. The same graph algorithms that cannot be parallelised – typically those involving iterative self-joins – will not perform well when using these relational databases as they are not true graph products but only limited subsets thereof. For this reason, these offerings are not discussed in this Market Update.

The third class of product that offers graph processing are multi-model databases, as typified by MarkLogic, Redis, DataStax, and so on. These are all discussed here. What is not considered is Microsoft Cosmos DB. Despite Microsoft’s claims to the contrary we do not consider this to constitute a true multi-model offering. Specifically, Cosmos DB differs from other products in this category because it requires a different API for each model, whereas other vendors support the use of a single API across all supported models. We believe that the use of a single API is fundamental to the definition of a multi-model database as well as to those RDF graph vendors, such as Franz and Ontotext, that support a document model as well as graphs.

Finally, there are property graphs and RDF graphs. With the advent of RDF* and SPARQL*, which are proposed standards that allow RDF graphs to add labels to relationships, as opposed to reification and other techniques that are either complex or result in node proliferation, there are indications that these spaces are moving closer together, with RDF suppliers adding support for OpenCypher or Gremlin to support graph traversal. This will suit developers, while leaving the underlying model to support the semantics that are often favoured by information architects.

With respect to segmentation, we identified in our last report that there was a distinct difference between vendors focusing on analytics as opposed to those who are more targeted at operational environments. Needless to say, there is significant overlap here. The growth in support for knowledge graphs has led to even further differentiation, with some suppliers focusing on this for its own sake and others that see it primarily as an enabler for data virtualisation (or vice versa: data virtualisation and knowledge graphs are symbiotic). And finally, there are Sparsity Technologies and Grakn, where the former focuses on using the Sparksee graph database as an embedded database, for example, in mobile devices, automobiles and edge devices; and the latter offers a hypergraph.

In the Bullseye diagram we have differentiated – via colour coding – between RDF and property graphs, and between those vendors that are focused particularly on graphs and those that are multi-model databases with graph functionality. The positioning of the latter on the Bullseye diagram relates specifically to their graph capabilities, rather than their overall capabilities. Sparsity (Sparksee is a property graph) has its own colour because it focuses on a different market segment from other vendors. We have colour coded Grakn as an RDF graph because it is commonly used instead of, or to replace, RDF graphs.

Figure 1: The highest scoring companies are nearest the centre. The analyst then defines a benchmark score for a domain leading company from their overall ratings and all those above that are in the champions segment. Those that remain are placed in the Innovator or Challenger segments, depending on their innovation score. The exact position in each segment is calculated based on their combined innovation and overall score. It is important to note that colour coded products have been scored relative to other products with the same colour coding.

[Bullseye diagram. Segments: Champion, Challenger, Innovator. Vendors shown: Ontotext, Neo4j, MarkLogic, AllegroGraph, DataStax, Grakn, Stardog, Amazon Neptune, Cambridge Semantics, RedisGraph, Objectivity, TigerGraph, Sparsity, Memgraph. Colour key: Property graphs, RDF graphs, Embedded focus, Multi-model.]

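The contrast drawn in this section between reification and RDF* can be made concrete with a small sketch. It is illustrative only: triples are modelled as plain Python tuples rather than through any real RDF library, and the `example.org` namespace, the `since` annotation and the statement identifier are hypothetical. The `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms are the standard RDF reification vocabulary.

```python
# Illustrative sketch only: tuples stand in for RDF triples; no RDF library is used.
EX = "http://example.org/"  # hypothetical namespace
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reified(s, p, o, ann_p, ann_o, stmt_id):
    """Annotate a relationship using standard RDF reification.

    A new statement node is introduced and described by four triples,
    and the annotation is attached to that node.
    """
    stmt = EX + stmt_id
    return [
        (s, p, o),                                # the asserted relationship
        (stmt, RDF + "type", RDF + "Statement"),  # extra node for the statement
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
        (stmt, ann_p, ann_o),                     # the annotation itself
    ]


def rdf_star(s, p, o, ann_p, ann_o):
    """Annotate a relationship in the RDF* style.

    The triple itself is used as the subject of the annotation,
    so no additional statement node is needed.
    """
    edge = (s, p, o)
    return [edge, (edge, ann_p, ann_o)]


# Hypothetical data: "alice worksFor acme", annotated with a start year.
args = (EX + "alice", EX + "worksFor", EX + "acme", EX + "since", "2019")
reified_triples = reified(*args, stmt_id="stmt1")
star_triples = rdf_star(*args)
```

In this sketch the reified form needs six triples and one extra node per annotated relationship, while the RDF*-style form needs two triples and no extra node, which is the "node proliferation" the text refers to.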
© 2020 Bloor 1
Market trends

In 2019 Gartner predicted that the graph database market will be growing at 100% CAGR by 2022. Also last year, Markets and Markets reported an estimate for the overall size of the market to be $2.8bn by 2024. So we are pleased to see that other organisations are endorsing our continued research into the graph database market, this being the 4th Edition of this report.

On the vendor front, there have been two significant changes since our last Market Update. The first is that, predictably, SAP is focusing on HANA as its graph offering rather than OrientDB. The latter has therefore been omitted from this report. The second change is that Objectivity, while still marketing ThingSpan, is focusing more on its underlying object-oriented database with ThingSpan being simply an implementation option for relevant use cases. DataStax is taking a similar approach with DSE. It should be noted that this in no way detracts from their respective offerings: it is simply that they see more use cases where graph is part of the answer but not all of it.

More generally, there is the wholesale adoption of RDF* and SPARQL* by RDF vendors such as Ontotext, Cambridge Semantics and Stardog, though not by Amazon, even though it does support Gremlin as a graph traversal language. The advantage of this approach, whereby you can have both SPARQL and either Gremlin or openCypher running against the same database, is that you don’t have to choose which underlying storage engine to use. Of course, Neo4j supports SPARQL also (and Gremlin) but if you haven’t got a semantic model underneath that is going to be of limited value. You also won’t get the inferencing capabilities that an RDF graph provides.

Another significant trend is in making graph application and query development simpler. For example, multiple vendors are now supporting GraphQL as an API so that you don’t need to know SPARQL (for RDF graphs) to build your queries. In a similar fashion, TigerGraph has introduced a no-code graphical development environment which hides the complexities of GSQL from business analysts and other users that want self-service query capabilities. We expect this trend towards ease of use and self-service to grow and expand.

There are some other trends that are starting to emerge. There is obviously the shift to cloud-based provisioning and increased support for managed graph databases as a service. There is also increasing support for Zeppelin and Jupyter notebooks. Migration tools, both from rival graph vendors and from relational sources, are becoming more common. And geospatial support is starting to be implemented by several vendors.

Then there is the question of languages. Both openCypher and Gremlin have significant support while the development of GQL as an ANSI standard language for property graphs continues and is supported by a variety of vendors. However, we suspect that these efforts, though worthy, may be made (largely) irrelevant by companies introducing support for graphical user interfaces and GraphQL that takes away the pain of learning new languages, as discussed above.

Finally, as always, there is an ongoing focus on performance and scalability. The latter seems to have been a major area of development by a number of companies, particularly Neo4j and Franz (AllegroGraph), since our last report, while Cambridge Semantics, with its analytics and “graph OLAP” capabilities, is even positioning AnzoGraph as a graph data warehouse.

We spent a significant part of our last Market Update discussing benchmarks, something to be treated with a large pinch of salt. We do not intend to repeat this so interested readers are referred to the 3rd Edition (this is the 4th) of this report. We should add, as an example of how witless some marketing people are, that in our research for this Market Update we had one company extolling to us how good their product was at one-hop queries!

Knowledge Graphs

Knowledge graphs deserve special mention, as they are becoming an increasing area of focus. Unfortunately, there is no agreed definition of a knowledge graph. What they essentially allow you to do is to visualise and explore networks of related things. But this is precisely what graphs are, so some authorities qualify this by saying “things of interest”. The problem with this suggestion is that often the whole point is that you don’t know what is of interest until you start your exploration. In any case, even if you can sensibly filter out entities and relationships that are not of interest, what you end up with is a (sub-)graph. Perhaps it would be better to describe a knowledge graph as an interactive “view” of your broader graph, in the same sense that you have views into your relational database.

Whatever they are actually, there is no doubt that knowledge graphs are increasingly popular and a number of graph database providers (particularly those

with RDF graphs) are targeting their construction. This has resulted in various additional terminologies. For example, so-called identity graphs, which support functions such as recommendations; and entity-event knowledge graphs, which are structured to emphasise temporally contextualised events and the entities they relate to (for example, this person took this medication at this time).

More generally, knowledge graphs are not just being used for their own sake, but also to support the creation and reuse of training data for machine learning purposes, and for data virtualisation. In the case of the former, there are two major reasons to leverage knowledge graphs. Firstly, the actual creation of training data is facilitated when you understand the relationships that exist between the data elements you are exploring. Secondly, one of the drawbacks of the usual data science process is that the data, and its relationships, are collected in order to support the task at hand and then, once the development process is complete, get thrown away for lack of anywhere to support its storage. Graph databases enable this and so support reuse.

As far as data virtualisation is concerned, it is worth commenting that in the general-purpose data warehousing market support for data virtualisation, or at least some form of query federation, is now more or less table stakes. And then there are independent offerings in the space from Denodo and TIBCO (Composite) as well as (Starburst) Presto. So this is a crowded market. That said, graph databases have an intrinsic benefit when it comes to data virtualisation in that they can map the relationships that exist between data in different sources.

Metrics

We used eight different scoring dimensions. In alphabetic order these are:

Analytics – the extent to which the product supports analytic capabilities, especially complex analytics. For RDF databases the support for inferencing (both forward and backward chaining) is relevant. The provision of pre-built graph algorithms will be an advantage as well as support for third party graph-based analytic libraries. Also relevant is the ability to assign probabilities to relationships. Note that all products have some degree of analytic capability.

Ease of Use – should be self-explanatory: includes administrative tools, graphical visualisation capabilities and so forth. It will be useful if the product supports both schema and schema-free environments. Availability of the product as a managed service will also be a factor here as will facilities for supporting the creation of knowledge graphs.

Features – measures additional capabilities such as whether a property graph includes labels or whether an RDF database has been extended to support properties and RDF*. Also includes facilities such as specialised importing capabilities from relational or other environments. The ability to track how a graph has changed over time will also be useful in some instances. More general features include high availability, security and so forth.

Integration – how well the product extends beyond graphs per se. For example, support for text (JSON, XML) processing, integration with search engines, and semantics. Also, the ability to leverage (geo-)spatial data. Support for data virtualisation is relevant in this category as is integration with Jupyter and Zeppelin notebooks. The ability to integrate with third party visualisation tools is also relevant or, in some cases, vendors provide their own tools. Integration with traditional BI tools such as Tableau is a bonus.

Language – what is the extent of language support? SPARQL and OWL in the case of RDF, and openCypher, Gremlin or other in the case of property graphs. Also including extensions to support RDF* and SPARQL*. Support for other language bindings is important as well as is GraphQL capability. Further, there is an overlap with ease of use with respect to the provision of IDEs that hide the complexities of the underlying language.

Operations – the extent to which the product supports operational capabilities, including ACID compliance and immediate consistency. It is worth commenting that almost all products have some sort of operational capabilities (just as they do analytics) but that does not necessarily mean that they are optimised for that purpose.

Performance – this covers not just run-time performance for both operations and analytics but also ingestion rates. While having a “native” graph database has theoretical advantages in performance terms, everything depends on the implementation. The capabilities of the database optimiser (where appropriate) are relevant here.

Scalability – not just scale up/out but also scale down/in. Some products may be fine at the top end but would not be cost effective for small scale projects, especially if embedded. We should further comment that there is a difference between scaling up the number of user queries (read) that you can support simultaneously, scaling for high availability purposes, scaling ingestion (write) and the scale of the graph (number of nodes and edges) that you can support.
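The Analytics dimension above refers to forward-chaining inferencing in RDF databases. As a rough illustration of what that means, the sketch below applies two RDFS-style rules repeatedly until no new facts appear. It assumes a toy in-memory set of tuples with simplified `type` and `subClassOf` predicates and hypothetical class names. It does not represent any vendor's reasoner, and it does not cover backward chaining, which derives facts on demand at query time instead of materialising them.

```python
def forward_chain(triples):
    """Materialise inferred triples by applying rules until a fixpoint is reached.

    Rules (simplified RDFS-style, assumed for this sketch):
      1. (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
      2. (x type B)       and (B subClassOf C)  =>  (x type C)
    """
    facts = set(triples)
    while True:
        new = set()
        for (a, p, b) in facts:
            if p not in ("type", "subClassOf"):
                continue
            for (c, q, d) in facts:
                # Both rules fire when the object of the first triple
                # is the subclass in a subClassOf triple.
                if q == "subClassOf" and c == b:
                    new.add((a, p, d))
        if new <= facts:  # nothing new was derived: fixpoint
            return facts
        facts |= new


# Hypothetical example data.
asserted = {
    ("Aspirin", "type", "NSAID"),
    ("NSAID", "subClassOf", "Analgesic"),
    ("Analgesic", "subClassOf", "Medication"),
}
inferred = forward_chain(asserted)
```

Here a query for everything of type `Medication` would return `Aspirin` even though that fact was never asserted directly, which is the kind of result a store without a semantic model underneath would not produce.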

Conclusion

Some parts of the market are converging: RDF graphs adding Gremlin or Cypher; everybody supporting knowledge graphs and many advocating data virtualisation; and vendors that previously offered limited scalability introducing new architectures that support massive distributed environments, thereby allowing them to compete more effectively with companies that have historically tended to focus on use cases with high-end scale and performance requirements. As far as this last point is concerned, this suggests that at least some suppliers have reached a point at which their products could be described as mature.

On the other hand, some smaller vendors are focusing on particular market segments that are not well addressed by the major players. In this last category we would put Sparsity Technologies (embedded graph databases in, for example, smart cars), Memgraph (which is focusing on extremely complex environments, often where multiple graph algorithms have to be used in conjunction, for instance in managing chemical plants or gas distribution networks), and possibly Grakn as a platform for building cognitive applications.

There is no doubt in our minds that graph databases are becoming more mainstream and that there are a broader range of use cases for which graph databases are being used. We expect this to continue. While it is encouraging to see vendors such as IBM add graph support in Db2, it only goes to validate the market. And while there are a number of graph algorithms that can be parallelised there are many for which relational databases cannot easily achieve adequate performance, largely thanks to the iterative (self-joining) nature of many graph queries. We therefore think that the support of graphs by the likes of Oracle, IBM and SAP is only nibbling at the problem around the edges and that true graph databases are much to be preferred in all but limited instances.
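The "iterative (self-joining) nature" of graph queries mentioned in this conclusion can be seen in a small sketch. It uses Python's standard `sqlite3` module as a stand-in for a relational database and a plain adjacency dictionary as a stand-in for a graph store. The edge data, table name and function names are hypothetical. The sketch shows only the shape of the two queries and makes no performance measurement.

```python
import sqlite3
from collections import defaultdict

# Hypothetical edge list: (source, destination).
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("e", "d")]


def three_hops_sql(start):
    """Nodes exactly three hops from `start`, using relational self-joins.

    Each additional hop requires joining the edge table to itself again,
    so the query text depends on the traversal depth.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    con.executemany("INSERT INTO edge VALUES (?, ?)", EDGES)
    rows = con.execute(
        """
        SELECT DISTINCT e3.dst
        FROM edge AS e1
        JOIN edge AS e2 ON e2.src = e1.dst
        JOIN edge AS e3 ON e3.src = e2.dst
        WHERE e1.src = ?
        """,
        (start,),
    ).fetchall()
    con.close()
    return {row[0] for row in rows}


def k_hops_graph(start, k):
    """Nodes exactly `k` hops from `start`, by following adjacency lists.

    The depth is just a loop bound; each step looks up the neighbours
    of the current frontier directly.
    """
    adjacency = defaultdict(set)
    for src, dst in EDGES:
        adjacency[src].add(dst)
    frontier = {start}
    for _ in range(k):
        frontier = {nxt for node in frontier for nxt in adjacency[node]}
    return frontier
```

Both functions return the same answer for a three-hop query on this data, but the relational version needs one more self-join for every extra hop (or a recursive query), whereas the traversal version only changes the loop count.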

About the authors
PHILIP HOWARD
Research Director: Information Management

Philip started in the computer industry way back in 1973 and has variously worked as a systems analyst, programmer and salesperson, as well as in marketing and product management, for a variety of companies including GEC Marconi, GPT, Philips Data Systems, Raytheon and NCR.

After a quarter of a century of not being his own boss Philip set up his own company in 1992 and his first client was Bloor Research (then ButlerBloor), with Philip working for the company as an associate analyst. His relationship with Bloor Research has continued since that time and he is now Research Director, focused on Information Management.

Information management includes anything that refers to the management, movement, governance and storage of data, as well as access to and analysis of that data. It involves diverse technologies that include (but are not limited to) databases and data warehousing, data integration, data quality, master data management, data governance, data migration, metadata management, and data preparation and analytics.

In addition to the numerous reports Philip has written on behalf of Bloor Research, Philip was previously editor of both Application Development News and Operating System News on behalf of Cambridge Market Intelligence (CMI). He has also contributed to various magazines and written a number of reports published by companies such as CMI and The Financial Times. Philip speaks regularly at conferences and other events throughout Europe and North America.

Away from work, Philip’s primary leisure activities are canal boats, skiing, playing Bridge (at which he is a Life Master), and dining out.

DANIEL HOWARD
Senior Analyst: Information Management and DevOps

Daniel started in the IT industry relatively recently, in only 2014. Following the completion of his Masters in Mathematics at the University of Bath, he started working as a developer and tester at IPL (now part of Civica Group). His work there included all manner of software and web development and testing, usually in an Agile environment and usually to a high standard, including a stint working at an 'innovation lab' at Nationwide.

In the summer of 2016, Daniel's father, Philip Howard, approached him with a piece of work that he thought would be enriched by the development and testing experience that Daniel could bring to the table. Shortly afterward, Daniel left IPL to work for Bloor Research as a researcher and the rest (so far, at least) is history.

Daniel primarily (although by no means exclusively) works alongside his father, providing technical expertise, insight and the 'on-the-ground' perspective of a (former) developer, in the form of both verbal explanation and written articles. His area of research is principally DevOps, where his previous experience can be put to the most use, but he is increasingly branching into related areas.

Outside of work, Daniel enjoys latin and ballroom dancing, skiing, cooking and playing the guitar.

Bloor overview
Technology is enabling rapid business evolution. The opportunities are immense
but if you do not adapt then you will not survive. So in the age of Mutable business
Evolution is Essential to your success.
We’ll show you the future and help you deliver it.
Bloor brings fresh technological thinking to help you navigate complex business situations,
converting challenges into new opportunities for real growth, profitability and impact.
We provide actionable strategic insight through our innovative independent
technology research, advisory and consulting services. We assist companies
throughout their transformation journeys to stay relevant, bringing fresh thinking to
complex business situations and turning challenges into new opportunities for real
growth and profitability.
For over 25 years, Bloor has assisted companies to intelligently evolve: by embracing
technology to adjust their strategies and achieve the best possible outcomes. At Bloor,
we will help you challenge assumptions to consistently improve and succeed.

Copyright and disclaimer


This document is copyright ©2020 Bloor. No part of this publication may be
reproduced by any method whatsoever without the prior consent of Bloor Research.
Due to the nature of this material, numerous hardware and software products have been
mentioned by name. In the majority, if not all, of the cases, these product names are
claimed as trademarks by the companies that manufacture the products. It is not Bloor
Research’s intent to claim these names or trademarks as our own. Likewise, company
logos, graphics or screen shots have been reproduced with the consent of the owner and
are subject to that owner’s copyright.
Whilst every care has been taken in the preparation of this document to ensure that
the information is correct, the publishers cannot accept responsibility for any errors or
omissions.

Bloor Research International Ltd
20–22 Wenlock Road
LONDON N1 7GU
United Kingdom

tel: +44 (0)1494 291 992


web: www.Bloorresearch.com
email: [email protected]
