Handling Billions of Edges in a Graph Database

Copyright © ArangoDB GmbH / ArangoDB Inc, 2018

What are Graph Databases

‣ Schema-free Objects (Vertices)
‣ Relations between them (Edges)
‣ Edges have a direction
‣ Edges can be queried in both directions
‣ Easily query a range of edges (2 to 5)
‣ Undefined number of edges (1 to *)
‣ Shortest Path between two vertices

[Figure: example graph with person vertices { name: "alice", age: 32 } and { name: "bob", age: 35, size: 1,73m }, hobby vertices { name: "dancing" }, { name: "reading" }, and { name: "fishing" }, connected by edges labeled "hobby"]

Typical Graph Queries

‣ Give me all friends of Alice
‣ Give me all friends-of-friends of Alice
‣ What is the linking path between Alice and Eve
‣ Which train stations can I reach if I am allowed to travel a distance of at most 6 stations on my ticket

[Figure: friendship graph with vertices Eve, Bob, Frank, Charly, Alice, and Dave; a second figure marks a "You are here" station on a train network]

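The queries above map onto a few small graph routines. The following is a minimal Python sketch, not ArangoDB code: the friendship edges are hypothetical (the slides name the people but not who is connected to whom), and the function names are illustrative.

```python
from collections import deque

# Hypothetical undirected friendship edges; the slides do not list them.
FRIENDSHIPS = [
    ("Alice", "Bob"), ("Alice", "Charly"), ("Alice", "Dave"),
    ("Bob", "Eve"), ("Dave", "Frank"),
]

def build_adjacency(edges):
    """Map every person to the set of their direct friends."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def friends(adjacency, person):
    """Depth 1: direct neighbors."""
    return set(adjacency.get(person, ()))

def friends_of_friends(adjacency, person):
    """Depth 2: neighbors of neighbors, excluding the person and direct friends."""
    direct = friends(adjacency, person)
    second = set()
    for friend in direct:
        second |= adjacency[friend]
    return second - direct - {person}

def reachable_within(adjacency, start, max_depth):
    """All vertices at distance 1..max_depth from start (the train-ticket query)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_depth:
            continue  # do not expand beyond the allowed depth
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return set(distance) - {start}

def linking_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(adjacency.get(current, ())):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None

ADJACENCY = build_adjacency(FRIENDSHIPS)
```

All four queries start at one vertex and only look at edges near it, which is the access pattern the rest of the deck relies on.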
Typical Graph Queries: Pattern Matching

‣ Give me all users that share two hobbies with Alice

[Figure: pattern in which Alice and a Friend are both connected to Hobby1 and Hobby2]

‣ Give me all products that at least one of my friends has bought together with the products I already own, ordered by how many friends have bought it and the product's rating, but only 20 of them.

[Figure: pattern Alice -is_friend- Friend -has_bought- Product, where Alice and the Friend are also both connected by has_bought edges to a second Product]

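The first pattern can be sketched as a two-step walk: out from Alice to her hobbies, then back in to every other user of those hobbies, counting how often each user is reached. The Python below is a minimal illustration with hypothetical data; it reads "share two hobbies" as "at least two", which is an assumption.

```python
from collections import Counter

# Hypothetical (user, hobby) edges; the names other than Alice are illustrative.
HAS_HOBBY = [
    ("Alice", "dancing"), ("Alice", "reading"), ("Alice", "chess"),
    ("Bob", "reading"), ("Bob", "fishing"),
    ("Eve", "dancing"), ("Eve", "reading"),
    ("Frank", "chess"), ("Frank", "dancing"), ("Frank", "reading"),
]

def users_sharing_hobbies(edges, person, minimum=2):
    """Return {user: shared_hobby_count} for users sharing >= minimum hobbies."""
    hobbies_of = {}
    users_of = {}
    for user, hobby in edges:
        hobbies_of.setdefault(user, set()).add(hobby)
        users_of.setdefault(hobby, set()).add(user)
    shared = Counter()
    # Step out to each hobby, then back in to every other user with that hobby.
    for hobby in hobbies_of.get(person, ()):
        for other in users_of[hobby]:
            if other != person:
                shared[other] += 1
    return {user: count for user, count in shared.items() if count >= minimum}
```

With this data, `users_sharing_hobbies(HAS_HOBBY, "Alice")` keeps Eve and Frank and drops Bob, who shares only one hobby.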
Non-Typical Graph Queries

‣ Give me all users which have an age attribute between 21 and 35.
‣ Give me the age distribution of all users
‣ Group all users by their name

ArangoDB

‣ MULTI-MODEL database
  ‣ Stores Key Value, Documents, and Graphs
  ‣ All in one core
‣ Query language AQL
  ‣ Document Queries
  ‣ Graph Queries
  ‣ Joins
  ‣ All can be combined in the same statement
‣ ACID support including Multi Collection Transactions

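AQL

// Return every document in the users collection
FOR user IN users
  RETURN user

AQL

// Return only the users named "alice"
FOR user IN users
  FILTER user.name == "alice"
  RETURN user

[Figure: result vertex Alice]

AQL

// Follow outgoing has_bought edges from alice to the products she bought
FOR user IN users
  FILTER user.name == "alice"
  FOR product IN OUTBOUND user has_bought
    RETURN product

[Figure: Alice -has_bought-> TV]

AQL

// Walk exactly 3 has_bought edges in any direction:
// user, product, other buyer, recommendation
FOR user IN users
  FILTER user.name == "alice"
  FOR recommendation, action, path IN 3 ANY user has_bought
    // path.vertices[2] is the other buyer; keep buyers within 5 years of the user's age
    FILTER path.vertices[2].age <= user.age + 5
      AND path.vertices[2].age >= user.age - 5
    FILTER recommendation.price < 25
    LIMIT 10
    RETURN recommendation

[Figure: path Alice -has_bought- TV -has_bought- Bob -has_bought- Playstation, annotated with alice.age - 5 <= bob.age && bob.age <= alice.age + 5 and playstation.price < 25]
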
Traversal - Iterate down two edges with some filters

‣ We first pick a start vertex (S)
‣ We collect all edges on S
‣ We apply filters on edges
‣ We iterate down one of the new vertices (A)
‣ We apply filters on edges
‣ The next vertex (E) is in desired depth. Return the path S -> A -> E
‣ Go back to the next unfinished vertex (B)
‣ We iterate down on (B)
‣ We apply filters on edges
‣ The next vertex (F) is in desired depth. Return the path S -> B -> F

[Figure: example graph with vertices S, A, B, C, D, E, and F]

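The steps above can be sketched as a depth-first traversal with an edge filter. This is a minimal Python illustration, not ArangoDB's implementation: the slide names the vertices S, A, B, C, D, E, F but not the edges, so the edge list and the `keep` attribute are assumptions.

```python
# Hypothetical edge list: (from, to, attributes). The topology is an assumption.
EDGES = [
    ("S", "A", {"keep": True}),
    ("S", "B", {"keep": True}),
    ("S", "C", {"keep": False}),
    ("A", "D", {"keep": False}),
    ("A", "E", {"keep": True}),
    ("B", "F", {"keep": True}),
]

def outbound_edges(vertex):
    """Collect all edges leaving a vertex (a linear scan stands in for an edge index)."""
    return [edge for edge in EDGES if edge[0] == vertex]

def traverse(start, depth, edge_filter):
    """Depth-first traversal that yields every path of exactly `depth` edges."""
    def visit(path):
        if len(path) - 1 == depth:
            yield list(path)  # desired depth reached: return a copy of the path
            return
        for _source, target, attributes in outbound_edges(path[-1]):
            if edge_filter(attributes):  # apply filters on edges
                path.append(target)      # iterate down the new vertex
                yield from visit(path)
                path.pop()               # go back to the next unfinished vertex
    yield from visit([start])

PATHS = list(traverse("S", 2, lambda attributes: attributes["keep"]))
```

With the assumed edges, `PATHS` holds the two paths from the slide, S -> A -> E and S -> B -> F, in that order.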
Traversal - Complexity

‣ Once:
  ‣ Find the start vertex (depends on indexes; Hash): 1
‣ For every depth:
  ‣ Find all connected edges (Edge-Index or Index-Free): 1
  ‣ Filter non-matching edges (linear in edges): n
  ‣ Find connected vertices (depends on indexes; Hash): n * 1
  ‣ Filter non-matching vertices (linear in vertices): n
‣ Only one pass: 3n

O(3n)

Traversal - Complexity

‣ Linear sounds evil?

‣ NOT linear in All Edges O(E)
‣ Only Linear in relevant Edges n < E
‣ Traversals solely scale with their result size
‣ They are not affected at all by the total amount of data
‣ BUT: Every depth increases the exponent: O(3 * n^d)
‣ "7 degrees of separation": 3 * n^6 < E < 3 * n^7
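To make both claims concrete, the sketch below counts the edges a traversal touches. It is a simplified Python model under stated assumptions: every vertex has exactly n outgoing edges, edge lookup goes through a hash map, and the helper names are hypothetical.

```python
def count_touched_edges(index, start, depth):
    """Count edges a traversal touches when following `depth` hops from `start`."""
    touched = 0
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for vertex in frontier:
            for target in index.get(vertex, ()):
                touched += 1
                next_frontier.append(target)
        frontier = next_frontier
    return touched

def tree_index(branching, depth):
    """Hash index {vertex: [targets]} for a tree where every vertex has `branching` children."""
    index = {}
    level = ["root"]
    for _ in range(depth):
        next_level = []
        for vertex in level:
            children = [f"{vertex}/{i}" for i in range(branching)]
            index[vertex] = children
            next_level.extend(children)
        level = next_level
    return index

small = tree_index(branching=3, depth=2)
# Same neighborhood plus 100,000 unrelated edges elsewhere in the graph.
large = dict(small)
large.update({f"other{i}": [f"other{i + 1}"] for i in range(100_000)})
```

Both `count_touched_edges(small, "root", 2)` and `count_touched_edges(large, "root", 2)` touch n + n^2 = 12 edges for n = 3: the unrelated edges cost nothing, while each extra level of depth multiplies the frontier by n.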
Challenge 1: Supernodes

‣ Many graphs have "celebrities"

‣ Vertices with many inbound and/or outbound edges
‣ Traversing over them is expensive (linear in number of Edges)
‣ Often you only need a subset of edges

[Figure: example vertices Bob and Alice]

First Boost - Vertex Centric Indices

‣ Remember Complexity? O(3 * n^d)

‣ Filtering of non-matching edges is linear for every depth

‣ Index all edges based on their vertices and arbitrary other attributes
‣ Find initial set of edges in identical time
‣ Less / No post-filtering required
‣ This decreases the n significantly

Challenge 2: Big Data

‣ We have the rise of big data

‣ Store everything you can
‣ Dataset easily grows beyond one machine
‣ This includes graph data!
Scaling

‣ Distribute graph on several machines (sharding)

‣ How to query it now?

‣ No global view of the graph possible any more
‣ What about edges between servers?

‣ In a sharded environment, the network is the bottleneck most of the time

‣ Reduce network hops
‣ Vertex-Centric Indexes again help with super-nodes
‣ But: Only on a local machine
Dangers of Sharding

‣ Only parts of the graph on every machine

‣ Neighboring vertices may be on different machines
‣ Even edges could be on other machines than their vertices

‣ Queries need to be executed in a distributed way

‣ Result needs to be merged locally
Random Distribution

‣ Advantages:
  ‣ every server takes an equal portion of data
  ‣ easy to realize
  ‣ no knowledge about data required
  ‣ always works
‣ Disadvantages:
  ‣ Neighbors on different machines
  ‣ Probably edges on other machines than their vertices
  ‣ A lot of network overhead is required for querying
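The trade-off can be seen in a small hash-sharding sketch. This is illustrative Python under stated assumptions: vertices are placed by `crc32(key) % shard_count` (a stand-in for whatever hash a real cluster uses), and the ring-shaped graph is hypothetical.

```python
import zlib

def shard_of(key, shard_count):
    """Stable hash sharding: crc32 keeps the result reproducible across runs."""
    return zlib.crc32(key.encode("utf-8")) % shard_count

def cross_shard_edges(edges, shard_count):
    """Edges whose endpoints land on different shards and therefore need a network hop."""
    return [
        (source, target)
        for source, target in edges
        if shard_of(source, shard_count) != shard_of(target, shard_count)
    ]

# Hypothetical ring of 1,000 vertices, each linked to its successor.
VERTICES = [f"v{i}" for i in range(1000)]
EDGES = [(VERTICES[i], VERTICES[(i + 1) % 1000]) for i in range(1000)]

# Fraction of edges that cross shard boundaries with 3 shards.
CROSS_FRACTION = len(cross_shard_edges(EDGES, 3)) / len(EDGES)
```

Placement needs no knowledge of the data, but it also ignores which vertices are neighbors. For a uniform hash over k shards, one would expect roughly (k - 1) / k of all edges to cross shard boundaries.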
Index-Free Adjacency

‣ Used by most other graph databases

‣ Every vertex maintains two lists of its edges (IN and OUT)
‣ Do not use an index to find edges
‣ How to shard this?

????

‣ ArangoDB uses a hash-based EdgeIndex (O(1) lookup)

‣ The vertex is independent of its edges
‣ It can be stored on a different machine
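A hash-based edge index can be sketched as two hash maps keyed by vertex id. The Python below is a simplified illustration, not ArangoDB's actual data structure; the class and method names are hypothetical, while `_from` and `_to` follow the attribute names ArangoDB uses on edge documents.

```python
from collections import defaultdict

class EdgeIndex:
    """Two hash maps from vertex id to edges, one per direction."""

    def __init__(self):
        self._outbound = defaultdict(list)  # keyed by edge["_from"]
        self._inbound = defaultdict(list)   # keyed by edge["_to"]

    def insert(self, edge):
        """Register the edge under both of its endpoints."""
        self._outbound[edge["_from"]].append(edge)
        self._inbound[edge["_to"]].append(edge)

    def edges(self, vertex_id, direction):
        """Average O(1) hash lookup of all edges of a vertex in the given direction."""
        outbound = list(self._outbound.get(vertex_id, ()))
        inbound = list(self._inbound.get(vertex_id, ()))
        if direction == "OUTBOUND":
            return outbound
        if direction == "INBOUND":
            return inbound
        if direction == "ANY":
            return outbound + inbound
        raise ValueError(f"unknown direction: {direction}")

index = EdgeIndex()
index.insert({"_from": "users/alice", "_to": "products/tv"})
index.insert({"_from": "users/bob", "_to": "products/tv"})
index.insert({"_from": "users/bob", "_to": "products/playstation"})
```

The index only stores vertex ids, never the vertex documents themselves, which is why a vertex and its edges can live on different machines.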
Domain Based Distribution

‣ Many Graphs have a natural distribution

‣ By country/region for People
‣ By tags for Blogs
‣ By category for Products
‣ Most edges in same group
‣ Rare edges between groups

ArangoDB Enterprise Edition uses Domain Knowledge for short-cuts
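Domain-based placement can be sketched by sharding on a domain attribute instead of the full key. This Python example is only an illustration of the idea, not how SmartGraphs are implemented: the `region:name` key layout, the helper names, and the data are assumptions.

```python
import zlib

def shard_by_domain(key, shard_count):
    """Shard on the part before ':' so that one domain stays on one server."""
    domain = key.split(":", 1)[0]
    return zlib.crc32(domain.encode("utf-8")) % shard_count

def needs_network_hop(edge, shard_count):
    """True if the two endpoints of the edge live on different shards."""
    source, target = edge
    return shard_by_domain(source, shard_count) != shard_by_domain(target, shard_count)

# Hypothetical people keyed as "region:name"; most edges stay inside a region.
EDGES = [
    ("de:alice", "de:bob"),
    ("us:eve", "us:frank"),
    ("fr:charly", "fr:dave"),
    ("de:bob", "us:eve"),  # the rare edge between groups
]

INTRA_DOMAIN = [e for e in EDGES if e[0].split(":")[0] == e[1].split(":")[0]]
REMOTE = [e for e in EDGES if needs_network_hop(e, 3)]
```

Edges inside one domain never need a network hop, so only the rare edges between groups can be remote.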
SmartGraphs - How it works

[Figure: cluster architecture with Foxx running on two Coordinators in front of DB Server 1, DB Server 2, ..., DB Server n]

Thank You
‣ Further questions?
‣ Follow us on Twitter: @arangodb and @ArangoMatthew
‣ Join our Slack: slack.arangodb.com
‣ https://www.arangodb.com/speakers/matthew-von-maszewski/
‣ https://github.com/arangodb/arangodb
