Dgraph: Synchronously Replicated, Transactional and Distributed Graph Database
Manish Jain
[email protected]
Abstract

Dgraph is a distributed graph database which provides horizontal scalability, distributed cluster-wide ACID transactions, low-latency arbitrary-depth joins, synchronous replication, high availability and crash resilience. Aimed at real-time transactional workloads, Dgraph shards and stores data in a way that optimizes joins and traversals, while still providing data retrieval and aggregation. Dgraph's unique take is to provide low-latency arbitrary-depth joins in a constant number of network calls (typically, just one network call) required to execute a single join, irrespective of the size of the cluster or the size of the result set.

1 Introduction

Distributed systems or databases tend to suffer from the join depth problem. That is, as the number of traversals of relationships within a query increases, the number of network calls required (in a sufficiently sharded dataset) increases. This is typically due to entity-based data sharding, where entities are randomly (sometimes with a heuristic) distributed across servers, containing all the relationships and attributes along with them. This approach suffers from high-fanout result sets in intermediate steps of a graph query, forcing a broadcast across the cluster to perform joins on the entities. Thus, a single graph query results in network broadcasts, causing a jump in query latency as the cluster grows.

Dgraph is a distributed database with a native graph backend. It is the only native graph database to be horizontally scalable and support full ACID-compliant cluster-wide distributed transactions. In fact, Dgraph is the first graph database to have been Jepsen [?] tested for transactional consistency.

Dgraph automatically shards data into machines as the amount of data or the number of servers changes, and automatically reshards data to move it across servers to balance the load. It also supports synchronous replication backed by the Raft [?] protocol, which allows queries to seamlessly fail over to provide high availability.

Dgraph solves the join depth problem with a unique sharding mechanism. Instead of sharding by entities, as most systems do, Dgraph shards by relationships. Dgraph's unique way of sharding data is inspired by research at Google [?], which shows that the overall latency of a query is greater than the latency of the slowest component. The more servers a query touches to execute, the slower the query latency would be. By doing relationship-based sharding, Dgraph can execute a join or traversal in a single network call (with a backup network call to a replica if the first is slow), irrespective of the size of the cluster or the input set of entities. Dgraph executes arbitrary-depth joins without network broadcasts or collecting data in a central place. This allows the queries to be fast and latencies to be low and predictable.

2 Dgraph Architecture

Dgraph consists of Zeros and Alphas, each representing a group that they are serving. Zeros serve group zero and Alphas serve group one, group two and onwards. Each group forms a Raft cluster of 1, 3 or 5 members, configurable by a human operator (henceforth referred to as the operator). All updates made to the group are serialized via the Raft consensus algorithm and applied in that order to the leader and followers.

Zeros store and propagate metadata about the cluster while Alphas store user data. In particular, Zeros are responsible for membership information, which keeps track of the group each Alpha server is serving, its internal IP address for communication within the cluster, the shards it is serving, etc. Zeros do not keep track of the health of the Alphas and take actions on them – that is considered the job of the operator. Using this information, Zero can tell a new Alpha to either join and serve an existing group, or form a new group.

The membership information is streamed out from Zero to all the Alphas. Alphas can use this membership information to route queries (or mutations) which hit the cluster. Every instance in the cluster forms a connection with every other instance (thus forming 2 × (N choose 2), i.e. N × (N − 1), open connections, where N = number of Dgraph instances in the cluster), however, the
protocol buffer [?] data format and not interchanged among
the two.
{
"uid" : "0xab",
"type" : "Astronaut",
"name" : "Mark Watney",
"birth" : "2005/01/02",
"follower": { "uid": "0xbc", ... },
}
show Badger to provide equivalent or faster writes than other
LSM based DBs, while providing equivalent read latencies
compared to B+-tree based DBs (which tend to provide much
faster reads than LSM trees).
As mentioned above, all records with the same predicate
form one shard. Within a shard, records sharing the same
subject-predicate are grouped and condensed into one single
key-value pair in Badger. This value is referred to as a posting
list, a terminology commonly used in search engines to refer
to a sorted list of doc ids containing a search term. A posting
list is stored as a value in Badger, with the key being derived
from subject and predicate.
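As a rough sketch of this layout (illustrative only; the exact byte encoding, separator and names below are assumptions, not Dgraph's actual format), a data key can be thought of as the predicate plus the subject uid, with the sorted object uids stored as the posting list value. An index key (Section 3) follows the same idea, with a token in place of the uid.

package main

import "encoding/binary"

// Illustrative only: not Dgraph's real key encoding.
// A data key groups every posting for one (subject, predicate) pair.
func dataKey(predicate string, subjectUid uint64) []byte {
	key := make([]byte, 0, len(predicate)+1+8)
	key = append(key, predicate...)
	key = append(key, 0x00) // assumed separator byte
	var uid [8]byte
	binary.BigEndian.PutUint64(uid[:], subjectUid)
	return append(key, uid[:]...)
}

// postingList is the value stored against such a key in Badger:
// the sorted object uids for that subject-predicate pair.
type postingList struct {
	Uids []uint64 // sorted ascending
}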
2.3 Data Sharding
While Dgraph shares a lot of features of NoSQL and dis-
tributed SQL databases, it is quite different in how it handles
its records. In other databases, a row or document would be
the smallest unit of storage (guaranteed to be located together),
while sharding could be as simple as generating equal sized
chunks consisting of many of these records.
Dgraph’s smallest unit of record is a triple (subject-
predicate-object, described below), with each predicate in
its entirety forming a shard. In other words, Dgraph logically
groups all the triples with the same predicate and considers
them one shard. Each shard is then assigned a group (1..N)
which can then be served by all the Alphas serving that group,
as explained in section ??.
This data sharding model allows Dgraph to execute a com-
plete join in a single network call and without any data fetch-
ing across servers by the caller. This, combined with grouping records on disk in a way that turns operations which would typically require expensive disk iterations into fewer, cheaper disk seeks, makes Dgraph's internal working quite efficient.
To elaborate this further, consider a dataset which contains information about where people live (predicate: "lives-in") and what they eat (predicate: "eats"). Data might look something like this:

<person-a> <lives-in> <sf> .
<person-a> <eats> <sushi> .
<person-a> <eats> <indian> .
...
<person-b> <lives-in> <nyc> .
<person-b> <eats> <thai> .

Figure 3: Data sharding

In this case, we'll have two shards: lives-in and eats. Assume the worst case scenario where the cluster is so big that each shard lives on a separate server. For a query which asks for [people who live in SF and eat Sushi], Dgraph would execute one network call to the server containing lives-in and do a single lookup for all the people who live in SF (* <lives-in> <sf>). In the second step, it would take those results and send them over to the server containing eats, do a single lookup to get all the people who eat Sushi (* <eats> <sushi>), and intersect with the previous step's result set to generate the final list of people from SF who eat Sushi. In a similar fashion, this result set can then be further filtered/joined, each join executing in one network call.

As we learnt in section ??, the result set is a list of sorted 64-bit unsigned integers, which makes the retrieval and intersection operations very efficient.
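Because each step produces a sorted list of uids, the intersection in the example above can be done with a single linear merge. A minimal sketch (not Dgraph's actual implementation):

// intersectSorted walks both sorted uid lists once and keeps the
// uids present in both, e.g. people who live in SF and eat sushi.
func intersectSorted(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}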
2.4 Data Rebalancing

As explained above, each shard contains a whole predicate in its entirety, which means Dgraph shards can be of uneven size. The shards not only contain the original data, but also all of their indices. Dgraph groups contain many shards, so the groups can also be of uneven size. The group and shard sizes are periodically communicated to Zero. Zero uses this information to try to achieve a balance among groups, using heuristics. The current one being used is just data size, with the idea that equal sized groups would allow similar resource usage across servers serving those groups. Other heuristics, particularly around query traffic, could be added later.

To achieve balance, Zero would move shards from one group to another. It does so by marking the shard read-only, then asking the source group to iterate over the underlying key-values concurrently and stream them over to the leader of the destination group. The destination group leader proposes these key-values via Raft, gaining all the correctness that comes with it. Once all the proposals have been successfully applied by the destination group, Zero would mark the shard as being served by the destination group. Zero would then tell the source group to delete the shard from its storage, thus finalizing the process. A sketch of these steps follows this list of violations.

While this process sounds pretty straightforward, there are many race and edge conditions here which can cause transactional correctness to be violated, as shown by Jepsen tests [?]. We'll showcase some of these violations here:

1. A violation can occur when a slightly behind Alpha server would think that it is still serving the shard (despite the shard having moved to another group) and allow mutations to be run on itself. To avoid this, all transaction states keep the shard and the group info for the writes (along with their conflict keys, as we'll see in section ??). The shard-group information is then checked by Zero to ensure that what the transaction observes (via the Alpha it talked to) and what Zero has is the same – a mismatch would cause a transaction abort.

2. Another violation happens when a transaction commits after the shard was put into read-only mode – this would cause that commit to be ignored during the shard transfer. Zero catches this by assigning a timestamp to the move operation. Any commits (on this shard) at a higher timestamp would be aborted, until the shard move has completed and the shard is brought back to the read-write mode.

3. Yet another violation can occur when the destination group receives a read below the move timestamp, or a source group receives a read after it has deleted the shard. In both cases, no data exists, which can cause the reads to incorrectly return nil values. Dgraph avoids this by informing the destination group of the move timestamp, which it can use to reject any reads for that shard below it. Similarly, Zero includes a membership mark which the source Alpha must reach before the group can delete the shard; thus, every Alpha member of the group would know that it is no longer serving the data before deleting it.

Overall, the mechanism of membership information synchronization during a shard move proved the hardest to get right with respect to transactional correctness.
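The move itself can be summarized in a short sketch. This is only an outline of the steps described above; all type and method names here are hypothetical, not Dgraph's actual API, and error handling and retries are omitted.

// Hypothetical interfaces standing in for a Dgraph group and Zero.
type group interface {
	StreamKeyValues(shard string) [][]byte
	ProposeViaRaft(kvs [][]byte) error
	DeleteShard(shard string)
}

type zeroServer interface {
	LeaseTimestamp() uint64
	MarkReadOnly(shard string, moveTs uint64)
	SetServingGroup(shard string, dst group, moveTs uint64)
}

// moveShard outlines how Zero rebalances a shard from src to dst.
func moveShard(z zeroServer, src, dst group, shard string) error {
	moveTs := z.LeaseTimestamp()      // commits above this ts abort until the move completes
	z.MarkReadOnly(shard, moveTs)     // step 1: freeze writes on the shard
	kvs := src.StreamKeyValues(shard) // step 2: source streams its key-values
	if err := dst.ProposeViaRaft(kvs); err != nil { // step 3: destination replicates them via Raft
		return err
	}
	z.SetServingGroup(shard, dst, moveTs) // step 4: membership now points at the destination
	src.DeleteShard(shard)                // step 5: source drops its copy
	return nil
}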
3 Indexing

Dgraph is designed to be a primary database for applications. As such, it supports most of the commonly needed indices. In particular, for strings, it supports regular expression, full-text search, term matching, exact and hash matching indices. For datetime, it supports year, month, day and hour level indices. For geo, it supports nearby, within, etc. operations, and so on.

All these indices are stored by Dgraph using the same posting list format described above. The difference between an index and data is the key. A data key is typically <predicate, uid>, while an index key is <predicate, token>. A token is derived from the value of the data, using an index tokenizer. Each index tokenizer supports this interface:

type Tokenizer interface {
	Name() string

	// Type returns the string representation of
	// the typeID that we care about.
	Type() string

	// Tokens return tokens for a given value. The
	// tokens shouldn't be encoded with the byte
	// identifier.
	Tokens(interface{}) ([]string, error)

	// Identifier returns the prefix byte for this
	// token type. This should be unique. The range
	// 0x80 to 0xff (inclusive) is reserved for
	// user-provided custom tokenizers.
	Identifier() byte

	// IsSortable returns true if the tokenizer can
	// be used for sorting/ordering.
	IsSortable() bool

	// IsLossy() returns true if we don't store the
	// values directly as index keys during
	// tokenization. If a predicate is tokenized
	// using a lossy tokenizer, we need to fetch
	// the actual value and compare.
	IsLossy() bool
}

Every tokenizer has a globally unique identifier (Identifier() byte), including custom tokenizers provided by operators. The tokens generated are prefixed with the tokenizer identifier to be able to traverse through all tokens belonging to only that tokenizer. This is useful when doing iteration for inequality queries (greater than, less than, etc.). Note that inequality queries can only be done if a tokenizer is sortable (IsSortable() bool). For example, in strings, an exact index is sortable, but a hash index is not.

Depending upon which index a predicate has set in the schema, every mutation on that predicate would invoke one or more of these tokenizers to generate the tokens. Note that indices only operate on values, not objects. A set of tokens would be generated with the before-mutation value and another set with the after-mutation value. Mutations would be added to delete the subject uid from the posting lists of the before tokens and to add the subject uid to the after tokens.

Note that all indices have object values, so they largely deal only in uids. Indices in particular can suffer from the high fan-out problem, which is solved using posting list splits described in section ??.
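For illustration, a minimal "exact" tokenizer satisfying this interface might look like the following. This is a hypothetical example, not one of Dgraph's built-in tokenizers; the name and identifier byte are made up (0x80 is simply the start of the range reserved for custom tokenizers).

package main

import "fmt"

// exactTokenizer treats the whole string as a single token, so the
// resulting index preserves ordering (sortable) and stores the value
// itself as the token (not lossy).
type exactTokenizer struct{}

func (exactTokenizer) Name() string { return "my_exact" }
func (exactTokenizer) Type() string { return "string" }
func (exactTokenizer) Tokens(v interface{}) ([]string, error) {
	s, ok := v.(string)
	if !ok {
		return nil, fmt.Errorf("my_exact: expected string, got %T", v)
	}
	return []string{s}, nil
}
func (exactTokenizer) Identifier() byte { return 0x80 } // custom tokenizer range
func (exactTokenizer) IsSortable() bool { return true }
func (exactTokenizer) IsLossy() bool    { return false }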
4 Multiple Version Concurrency Control

As described in section ??, data is stored in posting list format, which consists of postings sorted by integer ids. All posting list writes are stored as deltas to Badger on commit, using the commit timestamp. Note that timestamps are monotonically increasing globally across the DB, so any future commits are guaranteed to have a higher timestamp.

It is not possible to update this list in-place, for multiple reasons. One is that Badger (and most LSM trees) writes are
immutable, which plays very well with filesystems and rsync.
Second is that adding an entry within a sorted list requires
moving following entries, which depending upon the position
of the entry can be expensive. Third, as the posting list grows,
we want to avoid rewriting a large value every time a mutation
happens (for indices, it can happen quite frequently).
Dgraph considers a posting list as a state. Every future
write is then stored as a delta with a higher timestamp. A delta
would typically consist of postings with an operation (set or
delete). To generate a posting list, Badger would iterate the
versions in descending order, starting from the read timestamp,
picking all deltas until it finds the latest state. To run a posting
list iteration, the right postings for a transaction would be
picked, sorted by integer ids, and then merge-sort operation is
run between these delta postings and the underlying posting
list state.
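A simplified sketch of this read path is shown below; it assumes a posting is just a uid plus a set/delete flag, ignores Badger's actual iteration API, and uses a map plus a final sort instead of the merge-sort described above, so it is illustrative only.

package main

import "sort"

// Illustrative only; Dgraph's real posting and iterator types differ.
type delta struct {
	uid uint64
	del bool // true if this delta removes the uid
}

// readPostingList rebuilds the posting list visible to a read: deltas are
// the postings picked from versions at or below the read timestamp, newest
// first, and state is the last full posting list found below them.
func readPostingList(state []uint64, deltas []delta) []uint64 {
	decided := make(map[uint64]bool, len(deltas)) // newest op per uid wins
	live := make(map[uint64]bool)
	for _, d := range deltas {
		if decided[d.uid] {
			continue
		}
		decided[d.uid] = true
		if !d.del {
			live[d.uid] = true
		}
	}
	for _, uid := range state { // uids untouched by any delta keep their state value
		if !decided[uid] {
			live[uid] = true
		}
	}
	out := make([]uint64, 0, len(live))
	for uid := range live {
		out = append(out, uid)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}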
Earlier iterations of this mechanism were aimed at keep-
ing the delta layer sorted by integer ids as well, overlaying it
on top of the state to avoid doing sorting during the reads —
any addition or deletion made would be consolidated based
on what was already in the delta layer and the state. These
iterations proved too complex to maintain for the team and
suffered from hard to find bugs. Ultimately, that concept was
dropped in favor of a simple, understandable solution of picking the right postings for a read and sorting them before iteration. Additionally, earlier APIs implemented both forward and backward iteration, adding complexity. Over time, it became clear that only forward iteration was required, simplifying the design.

There are many benefits in avoiding having to regenerate the posting list state on every write. At the same time, as deltas accumulate, the work of list regeneration gets delegated to the readers, which can slow down the reads. To find a balance and avoid accumulating deltas indefinitely, we added a rollup mechanism.

Rollups: As keys get read, Dgraph would selectively regenerate the posting lists which have a minimum number of deltas, or haven't been regenerated for a while. The regeneration is done by starting from the latest state, then iterating over the deltas in order and merging them with the state. The final state is then written back at the latest delta timestamp, replacing the deltas and forming a new state. All previous deltas and states for that key can then be discarded to reclaim space.

This system allows Dgraph to provide MVCC. Each read is operating upon an immutable version of the DB. Newer deltas are being generated at higher timestamps and would be skipped during a read at a lower timestamp.

Figure 4: MVCC

5 Transactions

Dgraph has a design goal of being simple to operate. As such, one of the goals is to not depend upon any third party system. This proved quite hard to achieve while providing high availability for not only data but also transactions.

While designing transactions in Dgraph, we looked at papers from Spanner [?], HBase [?], Percolator [?] and others. Spanner most famously uses atomic clocks to assign timestamps to transactions. This comes at the cost of lower write throughput on commodity servers which don't have a GPS based clock sync mechanism. So, we rejected that idea in favor of having a single Zero server, which can hand out logical timestamps at a much faster pace.

To avoid Zero becoming a single point of failure, we run multiple Zero instances forming a Raft group. But, this comes with a unique challenge of how to do handover in case of leader re-election. The Omid, Reloaded [?] paper (referenced as Omid2) handles this problem by utilizing an external system. In Omid2, they run a standby timestamp server to take over in case the leader fails. This standby server doesn't need to get the latest transaction state information, because Omid2 uses Zookeeper [?], a centralized service, for maintaining transaction logs. Similarly, TiDB built TiKV, which uses a Raft-based replication model for the key-values. This allows every write by TiDB to automatically be considered highly-available. Similarly, Bigtable [?] uses Google Filesystem [?] for distributed storage. Thus, no direct information transfer needs to happen among the multiple servers forming the quorum.

While this concept achieves simplicity in the database, we were not entirely thrilled with this idea for two reasons. One, we had an explicit goal of non-reliance on any third-party system to make running Dgraph operationally easier, and felt that a solution should be possible without pushing
synchronous replication within Badger (storage). Second, we wanted to avoid touching disk unless necessary. By having Raft be part of the Dgraph process, we can fine-tune when things get written to state to achieve better efficiency. In fact, our implementation of transactions doesn't write to DB state on disk until they are committed (they are still written to the Raft WAL).

We closely looked at HBase papers ([?], [?]) for other ideas, but they didn't directly fit our needs. For example, HBase pushed a lot of transaction information back to the client, giving them critical information about what they should or should not read to maintain the transactional guarantees. This, however, makes the client libraries harder to build and maintain, something we did not like. On top of that, a graph query can touch millions of keys in the intermediate steps; it's expensive to keep track of all that information and propagate it to the client.

The aim for Dgraph client libraries was to keep as minimal state as possible, to allow open-source users unfamiliar with the internals of Dgraph to build and maintain libraries in languages unfamiliar to us (for example, Elixir).

// TODO: Do I describe the first iteration?

We simply could not find a paper at the time which described how to build a simple to understand, highly-available transactional system which could be run without assuming that the storage layer is highly available. So, we had to come up with a new solution. Our second iteration still faced many issues as proven by Jepsen tests. So, we simplified our second iteration to a third one, which is as follows.

5.1 Lock-Free High Availability Transaction Processing

Dgraph follows a lock-free transaction model. Each transaction pursues its course concurrently, never blocking on other transactions, while reading the committed data at or below its start timestamp. As mentioned before, the Zero leader maintains an Oracle which hands out logical transaction timestamps to Alphas. The Oracle also keeps track of a commit map, storing a conflict key → latest commit timestamp mapping. As shown in Algorithm ??, every transaction provides the Oracle the list of conflict keys, along with the start timestamp of the transaction. Conflict keys are derived from the modified keys, but are not the same. For each write, a conflict key is calculated depending upon the schema. When a transaction requests a commit, Zero would check if any of those keys has a commit timestamp higher than the start timestamp of the transaction. If the condition is met, the transaction is aborted. Otherwise, a new timestamp is leased by the Oracle, set as the commit timestamp, and the conflict keys in the map are updated.

Algorithm 1 Commit(Ts, Keys)
 1: for each key k ∈ Keys do
 2:   if lastCommit(k) > Ts then
 3:     Propose(Ts ← abort)
 4:     return
 5:   end if
 6: end for
 7: Tc ← GetTimestamps(1)
 8: for each key k ∈ Keys do
 9:   lastCommit(k) ← Tc
10: end for
11: Propose(Ts ← Tc)

The Zero leader then proposes this status update (commit or abort) in the form of a start → commit ts (where commit ts = 0 for abort) to the followers and achieves quorum. Once quorum is achieved, the Zero leader streams out this update to the subscribers, which are Alpha leaders. To keep the design simple, Zero does not push to any Alpha leader. It is the job of (whoever is) the latest Alpha leader to establish an open stream from Zero to receive transaction status updates.

Along with the transaction status update, the Zero leader also sends out a MaxAssigned timestamp. MaxAssigned is calculated using the Watermark algorithm (Algorithm ??), which maintains a min-heap of all allocated timestamps, both start and commit timestamps. As consensus is achieved, the timestamps are marked as done and MaxAssigned gets advanced to the maximum timestamp up until which everything has achieved consensus as needed. Note that start timestamps don't typically need a consensus (unless the lease needs to be updated) and get marked as done immediately. Commit timestamps always need a consensus to ensure that the Zero group achieves quorum on the status of the transaction. This allows a Zero follower to become a leader and have full knowledge of transaction statuses. This ordering is crucial to achieve the transactional guarantees, as we will see below.

Algorithm 2 Watermark: CalculateDoneUntil(T, isPending)
 1: if T ∉ MinHeap then
 2:   MinHeap ← T
 3: end if
 4: pending(T) ← isPending
 5: curDoneTs ← DoneUntil
 6: for each minTs ∈ MinHeap.Peek() do
 7:   if pending(minTs) then
 8:     break
 9:   end if
10:   MinHeap.Pop()
11:   curDoneTs ← minTs
12: end for
13: DoneUntil ← curDoneTs

Once Alpha leaders receive this update, they would propose it to their followers, applying the updates in the same order. All Raft proposal applications in Alphas are done serially.
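A compact sketch of the Oracle's commit decision, mirroring Algorithm 1 (illustrative names and structure, not Dgraph's actual implementation):

// oracle tracks, per conflict key, the latest commit timestamp, plus
// the next logical timestamp to lease.
type oracle struct {
	lastCommit map[string]uint64
	nextTs     uint64
}

func newOracle(startTs uint64) *oracle {
	return &oracle{lastCommit: make(map[string]uint64), nextTs: startTs}
}

// commit mirrors Algorithm 1: abort (return 0) if any conflict key has
// committed after the transaction's start timestamp; otherwise lease a
// commit timestamp and record it against every conflict key.
func (o *oracle) commit(startTs uint64, conflictKeys []string) uint64 {
	for _, k := range conflictKeys {
		if o.lastCommit[k] > startTs {
			return 0 // abort: proposed to the Zero group as startTs -> 0
		}
	}
	o.nextTs++
	commitTs := o.nextTs
	for _, k := range conflictKeys {
		o.lastCommit[k] = commitTs
	}
	return commitTs // proposed as startTs -> commitTs, then streamed to Alpha leaders
}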
Alphas also have an Oracle, which keeps track of the pending transactions. They maintain the start timestamp, along with a transaction cache which keeps all the updated posting lists in memory. On a transaction abort, the cache is simply dropped. On a transaction commit, the posting lists are written to Badger using the commit timestamp. Finally, the MaxAssigned timestamp is updated.

Every read or write operation must have a start timestamp. When a new query or mutation hits an Alpha, it would ask Zero to assign a timestamp. This operation is typically batched to only allow one pending assignment call to the Zero leader per Alpha. If the start timestamp of a newly received query is higher than the MaxAssigned registered by that Alpha, it would block the query until its MaxAssigned reaches or exceeds the start ts. This solution nicely tackles a wide array of edge case scenarios, including an Alpha falling back or going behind a network partition from its peers, or just restarting after a crash, etc. In all those cases, the queries would be blocked until the Alpha has seen all updates up until the timestamp of the query, thus maintaining the guarantee of transactions and linearizable reads.

Figure 5: MaxAssigned watermark. Open circles represent pending and filled circles represent done. Start timestamps 1, 2, and 4 are immediately marked as done. Commit timestamp 3 begins and must have consensus before it is done. Watermark keeps track of the highest timestamp at and below which everything is done.

Figure 6: The MaxAssigned system ensures linearizable reads. Reads at timestamps higher than the current MaxAssigned (MA) must block to ensure the writes up until the read timestamp are applied. Txn 2 receives start ts 3, and a read at ts 3 must acknowledge any writes up to ts 2.

For correctness, only the Zero leader is allowed to assign timestamps, uids, etc. There are edge cases where Zero followers would mistakenly think they're the leaders and serve stale data — Dgraph does multiple things to avoid these scenarios.

1. If the Zero leadership changes, the new leader would lease out a range of timestamps higher than the previous leader has seen. However, an older commit proposal stuck with the older leader can get forwarded to the new one. This can allow a commit to happen at an older timestamp, causing failure of transactional guarantees. We avoid this by disallowing Zero followers from forwarding requests to the leader, and rejecting those proposals.

// TODO: We should have a membership section, which explains how membership works and is transmitted to Alphas.

2. Every membership state update streamed from Zero requires a read-quorum (a check with Zero peers to find the latest Raft index update seen by the group). If the Zero is behind a partition, for example, it wouldn't be able to achieve this quorum and send out a membership update. Alphas expect an update periodically and if they don't hear from the Zero leader after a few cycles, they'd consider the Zero leader defunct, abolish the connection and retry to establish a connection with a (potentially different) healthy leader.

6 Consistency Model

Dgraph supports MVCC, Read Snapshots and Distributed ACID transactions. The transactions are cluster-wide across the universal dataset – not limited by any key level or server level restrictions. Transactions are also lockless. They don't block/wait on seeing pending writes by uncommitted transactions. They can all proceed concurrently and Zero would choose to commit or abort them depending on conflicts.

Considering the expense of tracking all the data read by a single graph query (could be millions of keys), Dgraph does not provide Serializable Snapshot Isolation. Instead, Dgraph provides Snapshot Isolation, tracking writes, which is a much more contained set than reads.

Dgraph hands out monotonically increasing timestamps (represented by T) for transactions (represented by Tx). Ergo, if any transaction Txi commits before Txj starts, then Tcommit(Txi) < Tstart(Txj). Any commit at Tcommit is guaranteed to be seen by a read at timestamp Tread by any client, if Tread > Tcommit. Thus, Dgraph reads are linearizable. Also, all reads are snapshots across the entire cluster, seeing all previously committed transactions in full.

As mentioned, Dgraph reads are linearizable. While this is great for correctness, it can cause performance issues when a lot of reads and writes are going on simultaneously. All reads are supposed to block until the Alpha has seen all the writes up until the read timestamp. In many cases, operators would opt for performance over achieving linearizability.
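As a sketch of the blocking rule described above (illustrative only, not Dgraph's code), an Alpha can hold a query until its locally applied MaxAssigned catches up to the read timestamp:

package main

import "sync"

// maxAssigned tracks the highest timestamp this Alpha has fully applied.
type maxAssigned struct {
	mu   sync.Mutex
	cond *sync.Cond
	ts   uint64
}

func newMaxAssigned() *maxAssigned {
	m := &maxAssigned{}
	m.cond = sync.NewCond(&m.mu)
	return m
}

// Advance is called as transaction status updates from Zero get applied.
func (m *maxAssigned) Advance(ts uint64) {
	m.mu.Lock()
	if ts > m.ts {
		m.ts = ts
		m.cond.Broadcast()
	}
	m.mu.Unlock()
}

// WaitFor blocks a query whose read timestamp is ahead of MaxAssigned,
// so that it observes every write at or below readTs.
func (m *maxAssigned) WaitFor(readTs uint64) {
	m.mu.Lock()
	for m.ts < readTs {
		m.cond.Wait()
	}
	m.mu.Unlock()
}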
Dgraph provides two options for speeding up reads:

1. A typical read-write transaction would allocate a new timestamp to the client. This would update MaxAssigned, which would then flow via the Zero leader to Alpha leaders and then get proposed. Until that happens, a read can't proceed. Read-only transactions would still require a read timestamp from Zero, but Zero would opportunistically hand out the same read timestamp to multiple callers, allowing Alpha to amortize the cost of reaching MaxAssigned across multiple queries.

2. Best-effort transactions are a variant of read-only transactions, which would use an Alpha's observed MaxAssigned timestamp as the read timestamp. Thus, the receiver Alpha does not have to block at all and can continue to process the query. This is the equivalent of the eventual consistency model typical in other databases. Ultimately, every Dgraph read is a snapshot over the entire distributed database and none of the reads would violate the snapshot guarantee. [1]

[1] Note however that a typical Dgraph query could hit multiple Alphas in various groups — some of these Alphas might not have reached the read timestamp (the initial Alpha's MaxAssigned timestamp) yet. In those cases, the query could still block until those Alphas catch up.

7 Replication

Most updates to Dgraph are done via Raft. Let's start with Alphas, which can push a lot of data through the system. All mutations and transaction updates are proposed via Raft and are made part of the Raft write-ahead logs. On a crash and restart, the Raft logs are replayed from the last snapshot to bring the state machine back up to the correct latest state. On the flip side, the longer the logs, the longer it takes for Alpha to replay them on a restart, causing a start delay. So, the logs must be trimmed by taking a snapshot, which indicates that the state up until that point has been persisted and does not need to be replayed on a restart.

As mentioned above, Alphas write mutations to the Raft WAL, but keep them in memory in a transaction cache. When a transaction is committed, the mutations are written to the state at the commit timestamp. This means that on a restart, all the pending transactions must be brought back to memory via the Raft WAL. This requires a calculation to pick the right Raft index to trim the logs at, which would keep all the pending transactions in their entirety in the logs.
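That calculation can be sketched as taking the smallest Raft index still needed by any pending transaction; the structure below is hypothetical, not Dgraph's actual implementation.

// snapshotIndex picks a Raft index to trim the write-ahead log at, such
// that every still-pending transaction keeps all of its mutations in the
// retained portion of the log. firstIndex maps a pending transaction's
// start ts to the Raft index of its first mutation; appliedIndex is the
// highest index applied to the state machine.
func snapshotIndex(firstIndex map[uint64]uint64, appliedIndex uint64) uint64 {
	snap := appliedIndex
	for _, first := range firstIndex {
		if first > 0 && first-1 < snap {
			snap = first - 1 // keep this transaction's entries in the log
		}
	}
	return snap
}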
One of the lessons we learnt while fixing Jepsen issues was that, to improve debuggability of a complex distributed system, the system should run like clockwork. In other words, once an event in one system has happened, events in other systems should almost be predictable. This guiding principle determined how we take snapshots.

The Raft paper allows leaders and followers to take snapshots independently of each other. Dgraph used to do that, but that brought unpredictability to the system and made debugging much harder. So, keeping with the hard-learnt lesson of the predictability principle, we changed it to make the leader calculate the snapshot index and propose this result. This allowed the leader and followers to all take a snapshot at the same index, at exactly the same time (if they're generally caught up). Furthermore, this group level snapshot event is then communicated to Zero to allow it to trim the conflict map by removing all entries below the snapshot timestamp. Following this chain of events in logs has improved debuggability of the system dramatically.

Dgraph only keeps metadata in Raft snapshots; the actual data is stored separately. Dgraph does not make a copy of that data during a snapshot. When a follower falls behind and needs a snapshot, it asks the leader for it and the leader would stream the snapshot from its state (Badger, just like Dgraph, supports MVCC and when doing a read at a certain timestamp, is operating upon a logical snapshot of the DB). In previous versions, the follower would wipe out its current state before accepting the updates from the leader. In newer versions, the leader can choose to send only the delta state update to the follower, which can decrease the data transmitted considerably.

8 High Availability and Scalability

Dgraph's architecture revolves around Raft groups for update log serialization and replication. In the CAP theorem, this follows CP, i.e. in a network partition, Dgraph would choose consistency over availability. However, the concepts of the CAP theorem should not be confused with high availability, which is determined by how many instances can be lost without the service getting affected.

In a three-node group, Dgraph can lose one instance per group without causing any measurable impact on the functionality of the database. However, losing two instances from the same group would cause Dgraph to block, considering all updates go through Raft. In a five-node group, the number of instances that can be lost without affecting functionality is two. We do not recommend running more than five replicas per group.

Given the central managerial role of Dgraph Zero, one might assume that Zero would be the single point of failure. However, that's not the case. In the scenario where a Zero follower dies, nothing really changes. If the Zero leader dies, one of the Zero followers would become the leader, renew its timestamp and uid assignment lease, pick up the transaction status logs (stored via Raft) and start accepting requests from Alphas. The only things that could be lost during this transition are transactions which were trying to commit with the lost Zero. They might error out, but could be retried. The same goes for Alphas. All Alpha followers have the same information as the Alpha leader and any of the members of the group can be lost without losing any state.
Dgraph can support as many groups as can be represented by a 32-bit integer (even that is an artificial limit). Each group can have one, three, or five (potentially more, but not recommended) replicas. The number of uids (graph nodes) that can be present in the system is limited by a 64-bit unsigned integer; the same goes for transaction timestamps. All of these are very generous limits and not a cause of concern for scalability.

use trigram indexing, geo-spatial queries use S2-cell based geo indexing, and so on. As described in the section above, indexing keys encode predicate and token, instead of a predicate and uid. So, the mechanism to fill up the matrix is the same as in any other task query. Only this time, we use a list of tokens instead of a list of Uids as the query set.
caching — such a cache would be difficult to maintain in an MVCC environment where each read can have different results, based on its timestamp.

Sorted integer encoding and intersection is a hotly researched topic and there is a lot of room for optimization here in terms of performance. As mentioned earlier, work is underway in experimenting with a switch to Roaring Bitmaps.

We also plan to work on a query optimizer, which can better determine the right sequence in which to execute a query. So far, the simple nature of GraphQL has let the operators manually optimize their queries — but surely Dgraph can do a better job knowing the state of the data.

Future work here is to allow writes during the shard move, which, depending upon the size of the shard, can take some time.

TODO: Add a conclusion.

11 Acknowledgments

Dgraph wouldn't have been possible without the tireless contributions of its core dev team and extended community. This work also wouldn't have been possible without funding from our investors. A full list of contributors is present here:

github.com/dgraph-io/dgraph/graphs/contributors

Dgraph is open source software, available at

https://github.com/dgraph-io/dgraph

More information about Dgraph is available at

https://dgraph.io