
ArangoDB GraphCourse Beginners

This document provides an introduction to a graph course for beginners using ArangoDB. It discusses graph basics and concepts, as well as an example dataset involving US airports and flights that will be used for exercises. The reader will learn graph queries, traversals, and shortest path queries using AQL in ArangoDB.

Uploaded by Mihajlo Andjelic
Copyright © All Rights Reserved

Graph Course for Freshers: The Shortest Path to first graph skills

2019 Edition
Welcome aboard
This is a short journey for developers, data scientists and all other interested folks. In this course you will learn how to get started with ArangoDB's graph-related features, plus a few other bits and pieces.

If you are new to ArangoDB, don't worry: we will start with the basics. Also, don't be put off by the number of pages. There are plenty of illustrations, and the exercises are optional!

We will use real-world data of domestic flights and airports in the US. The structure of the data should be easy to understand and will let you write many interesting queries to answer a variety of questions.

We hope you will enjoy the course!

Special thanks to @darkfrog for his feedback on the beta version, and to the thousands of enthusiastic downloaders of this course!
What you will learn
‣ Basics about graphs, in general and in ArangoDB
‣ Architecture of ArangoDB and what multi-model is
‣ How to import (graph) data
‣ Doing queries in ArangoDB's query language AQL
‣ Data retrieval with filtering, sorting and more
‣ Simple graph queries
‣ Traversing through a graph with different options
‣ Shortest path queries

[Figure: visualization of the example dataset, using dots for airports and arcs to represent flight connections]
Usability hint

A link symbol marks a link, as in the example below. If you read this course in a browser, click on links with the middle mouse button to open them in a new tab!

Graph Course page

The same goes for underlined links.
Table of Contents
‣ Introduction
  ‣ Graph Basics
  ‣ The Example Dataset
‣ Concepts of ArangoDB
  ‣ What is Multi-Model?
  ‣ ArangoDB Architecture
‣ Preparations for this Course
  ‣ Download and Install ArangoDB
  ‣ Import the Dataset
‣ Starting with the dataset
  ‣ ArangoDB Query Editor
  ‣ First AQL Queries – Hands on
‣ Graph Traversals
  ‣ Traversals explained
  ‣ Graph Traversal Syntax
‣ Traversal Options
  ‣ Depth vs. Breadth First Search
  ‣ Uniqueness Options
  ‣ Traversal Options – Hands on
‣ Advanced Graph Queries
  ‣ Shortest Path
  ‣ Pattern Matching
‣ Landing
  ‣ Survey and Support
  ‣ Exercise Solutions
Introduction

Graph Basics & The Example Dataset
Graph Basics
What is a graph? There are multiple definitions and types. A brief overview:

In discrete mathematics, a graph is defined as a set of vertices and edges.

In computing, it is considered an abstract data type that is well suited for representing connections or relations, unlike the tabular data structures of relational database systems, which are, ironically, very limited in expressing relations.

A good metaphor for graphs is to think of nodes as circles and edges as lines or arcs. The terms node and vertex are used interchangeably here. Usually vertices are connected by edges, making up a graph. Vertices don't have to be connected, but they may also be connected with more than one other vertex via multiple edges. You may also find vertices connected to themselves.

[Figure: vertices drawn as circles, connected by edges]
Graph Basics
Important types of graphs:

‣ Undirected: edges connect pairs of nodes without having a notion of direction
‣ Directed: edges have a direction associated with them (the lines/arcs have arrow heads in depictions)
‣ DAG (Directed Acyclic Graph): edges have a direction and there are no cycles. In the simplest case, this means that if you have vertices A and B and an edge from A to B, then there must not be another edge from B to A.

One example of a DAG is a tree topology.
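Whether a directed graph is a DAG can be checked mechanically. The following is a minimal Python sketch, not an ArangoDB feature: it assumes the graph is given as a plain adjacency dict (vertex name to list of target names) and uses the classic depth-first search with three vertex colors, reporting a cycle whenever an edge leads back to a vertex that is still on the current search path.

```python
from typing import Dict, List


def is_dag(graph: Dict[str, List[str]]) -> bool:
    """Return True if the directed graph has no cycles."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {v: WHITE for v in graph}
    for targets in graph.values():
        for t in targets:
            color.setdefault(t, WHITE)  # vertices that only appear as targets

    def visit(v: str) -> bool:
        color[v] = GRAY
        for w in graph.get(v, []):
            if color[w] == GRAY:  # edge back onto the current path: a cycle
                return False
            if color[w] == WHITE and not visit(w):
                return False
        color[v] = BLACK
        return True

    return all(visit(v) for v in list(color) if color[v] == WHITE)


tree = {"S": ["A", "B"], "A": ["C", "D"], "B": []}
print(is_dag(tree))                      # True: a tree is a DAG
print(is_dag({"A": ["B"], "B": ["A"]}))  # False: A -> B -> A
print(is_dag({"A": ["A"]}))              # False: a self-loop is a cycle
```

The two-vertex cycle A → B → A is exactly the "simplest case" described above.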
Graph Basics
In ArangoDB, each edge has a single direction; it can't point both ways at once. This model is also known as an oriented graph.

Moreover, edges are always directed, but when you walk through the graph you can ignore the direction and follow edges in ANY direction, or follow edges in reverse direction (INBOUND) instead of going in the direction they actually point to (OUTBOUND). Walking through a graph is called traversal.

[Figure: the traversal directions OUTBOUND, INBOUND and ANY]

ArangoDB allows you to store all kinds of graphs in different shapes and sizes, with and without cycles. You can save one or more edges between two vertices, or even between a vertex and itself. Also note that edges are full-fledged JSON documents, which means you can store as much information on the edges as you want!
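To make the three directions concrete, here is a small Python sketch. It is a model for illustration only, not how ArangoDB implements traversals: edges are plain dicts with ArangoDB-style _from and _to attributes, and the three sample edges are hypothetical.

```python
# Hypothetical edge documents with ArangoDB-style _from/_to attributes
edges = [
    {"_from": "airports/LAX", "_to": "airports/JFK"},
    {"_from": "airports/JFK", "_to": "airports/BOS"},
    {"_from": "airports/DEN", "_to": "airports/JFK"},
]


def neighbors(vertex_id, direction):
    """Vertices reachable from vertex_id in one step in the given direction."""
    result = []
    for e in edges:
        if direction in ("OUTBOUND", "ANY") and e["_from"] == vertex_id:
            result.append(e["_to"])    # follow the edge the way it points
        if direction in ("INBOUND", "ANY") and e["_to"] == vertex_id:
            result.append(e["_from"])  # follow the edge in reverse
    return result


for d in ("OUTBOUND", "INBOUND", "ANY"):
    print(d, neighbors("airports/JFK", d))
```

Each edge still has exactly one direction; only the way the walk interprets it changes.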
Graph Basics
A few examples of questions that graph queries can answer, with the example dataset in mind:
‣ Give me all flights departing from JFK (airport in New York)
‣ Give me all flights landing in LAX (airport in Los Angeles) on January 5th
‣ Which airports can I reach with up to one stopover? (From one or multiple starting airports)
‣ Shortest Path:
  ‣ What is the minimum number of stopovers to fly from BIS (Bismarck Municipal Airport in North Dakota) to LAX, and where is the stopover?
‣ Pattern Matching:
  ‣ Departing from BIS, which flight to JFK with one stopover (with at least 20 minutes for the transfer) is the quickest, and via which airport?
Graph Basics
Typical use cases for graph databases and "graphy" queries are:

‣ 360° View (Market Data, Customer, User, …)
‣ Artificial Intelligence
‣ Dependency Management
‣ Fraud Detection
‣ Identity & Access Management
‣ Knowledge Graph
‣ Master Data Management
‣ Network Infrastructure
‣ Recommendation Engine
‣ Risk Management
‣ Social Media Management

Whenever the depth of your search (how many edges to follow) is unknown, graph queries are easier to write and more efficient to compute than other query patterns.
The Example Dataset
We took a dataset of US airports and flights, augmented it and simplified it. It includes more than 3,000 airports and roughly 300,000 flights from January 1st to 15th, 2008.

Data structure of airport documents:
‣ _key: international airport abbreviation code (e.g. JFK)
‣ _id: collection name + "/" + _key (computed property)
‣ name: full name of the airport
‣ city: name of the associated city
‣ country: name of the country it is in (USA)
‣ lat: latitude portion of the geographic location
‣ long: longitude portion of the geographic location
‣ state: abbreviation of the US state it is in (e.g. NY)
‣ vip: airport with premium lounge? (true or false) *

* We marked a few airports randomly for example queries shown later.

[Figure: example airport as shown in the document editor of the web interface. You may switch the view mode to Code (JSON).]
The Example Dataset
Data structure of flight documents:
‣ _from: origin (airport _id)
‣ _to: destination (airport _id)
‣ Year: year of flight (here: 2008)
‣ Month: month of flight (1..12)
‣ Day: day of flight (1..31)
‣ DayOfWeek: weekday (1 = Monday .. 7 = Sunday)
‣ DepTime: actual departure time (local, hhmm as number)
‣ ArrTime: actual arrival time (local, hhmm as number)
‣ DepTimeUTC: departure time (Coordinated Universal Time, ISO string)
‣ ArrTimeUTC: arrival time (Coordinated Universal Time, ISO string)
‣ FlightNum: flight number
‣ TailNum: plane tail number
‣ UniqueCarrier: unique carrier code
‣ Distance: travel distance in miles

[Figure: example flight as shown in the document editor of the web interface]
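The hhmm encoding of DepTime and ArrTime is compact but cannot be subtracted directly: 1415 - 1345 is 70, yet only 30 minutes pass between 13:45 and 14:15. A minimal Python sketch converts hhmm numbers to minutes after midnight. Because these are local times, a departure and an arrival in different time zones still cannot be compared this way, which is what the UTC attributes are for. The exact ISO format assumed by parse_utc (e.g. 2008-01-05T14:15:00.000Z) is an assumption, and the sample timestamps are hypothetical.

```python
from datetime import datetime


def hhmm_to_minutes(hhmm: int) -> int:
    """Convert a local hhmm number such as 1415 (14:15) to minutes after midnight."""
    hours, minutes = divmod(hhmm, 100)
    return hours * 60 + minutes


def parse_utc(iso: str) -> datetime:
    """Parse a UTC timestamp; the format string is an assumption."""
    return datetime.strptime(iso, "%Y-%m-%dT%H:%M:%S.%fZ")


print(hhmm_to_minutes(1415))  # 855
print(hhmm_to_minutes(5))     # 5, i.e. 00:05
print(hhmm_to_minutes(1415) - hhmm_to_minutes(1345))  # 30

# Hypothetical UTC values for one flight
dep = parse_utc("2008-01-05T14:15:00.000Z")
arr = parse_utc("2008-01-05T16:40:00.000Z")
print((arr - dep).total_seconds() / 60)  # 145.0 minutes from departure to arrival
```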
The Example Dataset
Here are some example documents from both collections (JSON view mode):

[Figure: example documents from the airports and flights collections]
Concepts of ArangoDB

What is Multi-Model? & ArangoDB Architecture
What is Multi-Model?
‣ ArangoDB is a native multi-model database
‣ Multi-Model: ArangoDB supports three major NoSQL data models (documents, graphs and key/value)
‣ Native: supports all data models with one database core and one query language (AQL)
‣ Unique features of AQL:
  ‣ Possibility to combine all 3 data models in a single query
  ‣ Combine joins, traversals, filters, geo-spatial operations and aggregations in your queries
What is Multi-Model?
How is multi-model possible at all?

‣ ArangoDB is a document-oriented data store using primary keys.
‣ If you store a JSON document and treat it as an opaque value under a primary key, then you have a key/value store.
‣ Special _from and _to attributes in edge documents, pointing to other documents, make up your graph in ArangoDB.
What is Multi-Model?

Benefits of ArangoDB's NATIVE MULTI-MODEL approach

[Figure: the three data models side by side: Documents (JSON), Graphs, Key Values (K => V)]

Example JSON documents:

{
  "type": "pants",
  "waist": "32",
  "length": "34",
  "color": "blue",
  "material": "cotton"
}

{
  "type": "television",
  "diagonal size": "46",
  "hdmi inputs": "3",
  "wall mountable": "true",
  "built-in tuner": "true",
  "dynamic contrast": "50,000:1",
  "Resolution": "1920x1080"
}

‣ no data-model lock-in
‣ simpler development
‣ larger solution space than the relational model
ArangoDB Architecture
Like other databases, ArangoDB has a storage hierarchy:
‣ You can create different databases, which can hold an arbitrary number of collections. There is a default database called _system.
‣ Collections can hold arbitrary amounts of documents. There are two collection types: document and edge collections.
‣ Documents are stored in JSON format. A document is a JSON object at the top level, whose attribute names are strings and whose values can be null, true, false, numbers, strings, arrays and nested objects. There are also system attributes (_key, _id, _rev, and for edges also _from and _to).
ArangoDB Architecture
How do airports & flights form a graph?
Airports are the vertices, flights are the edges. The _id attribute of airport documents is used for
the _from and _to attributes in the edge documents to link airports together by flights.

ArangoDB Architecture
Edge collections in summary:
‣ Place to hold relations
‣ Comparable with many-to-many relations in SQL systems (cross tables)
‣ Contain documents, but with special attributes
‣ _from: _id value of the source vertex
‣ _to: _id value of the target vertex
‣ Built-in edge index for every edge collection
‣ Building block of graphs
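The idea behind an edge index can be sketched in Python as two hash maps: one from a vertex _id to the edges leaving it (_from), one to the edges entering it (_to). A traversal can then look up a vertex's edges directly instead of scanning the whole collection. This is a conceptual model only, not ArangoDB's actual index implementation, and the flight documents are hypothetical.

```python
from collections import defaultdict


def build_edge_index(edges):
    """Map each vertex _id to the edges leaving it and the edges entering it."""
    by_from = defaultdict(list)
    by_to = defaultdict(list)
    for edge in edges:
        by_from[edge["_from"]].append(edge)
        by_to[edge["_to"]].append(edge)
    return by_from, by_to


flights = [  # hypothetical edge documents
    {"_from": "airports/BIS", "_to": "airports/DEN", "FlightNum": 1},
    {"_from": "airports/DEN", "_to": "airports/JFK", "FlightNum": 2},
    {"_from": "airports/BIS", "_to": "airports/MSP", "FlightNum": 3},
]
outgoing, incoming = build_edge_index(flights)

# Direct lookups instead of scanning all edges
print([e["_to"] for e in outgoing["airports/BIS"]])    # ['airports/DEN', 'airports/MSP']
print([e["_from"] for e in incoming["airports/JFK"]])  # ['airports/DEN']
```

OUTBOUND steps would use the _from map, INBOUND steps the _to map, and ANY both.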

Preparations for this Course

Download and Install ArangoDB & Import the Dataset
Download and Install ArangoDB
‣ Go to arangodb.com/download/ to find the latest Community or Enterprise Edition for your operating system. Follow the instructions on how to download and install it for your OS. We recommend setting a password for the default user root. Further details can be found here:

docs.arangodb.com/latest/Manual/Installation/

‣ Once the server has started, open http://localhost:8529 in your browser to access Aardvark, the ArangoDB WebUI.
‣ Log in with your credentials, e.g. as root. If you did not set a password, leave the password field empty.
‣ Next, select a database, e.g. the default _system database.
Import the Dataset – Airports
‣ Download the example dataset here:

arangodb.com/arangodb_graphcourse_demodata/

‣ Unpack it to a folder of your choice. After unpacking, you should see two .csv files named airports.csv and flights.csv.
‣ Import the airports with ArangoDB's import tool arangoimport. Run the following on your command line (single line), replacing path/to/airports.csv with the path to airports.csv on your machine:

arangoimport --file path/to/airports.csv --collection airports --create-collection true --type csv

You can add --server.username <name> to connect as a user other than root.

If you did not set a password, or if the server has authentication disabled, just hit return when asked for a password.

If the ArangoDB binaries are in your PATH environment variable, you can run them by name from any working directory. Otherwise, specify the full path.
Import the Dataset – Airports
You should see something like this in your console after running the import command:

[Figure: console output of arangoimport]
Import the Dataset – Airports
What did arangoimport do?
‣ Created a new document collection (airports) with a primary index on _key
‣ Created one document for each line of the CSV file (except the first line and the last, empty line)
‣ The first line is the header defining the attribute names

Note:
‣ Airport codes are provided as the _key attribute in the CSV file
‣ The _key attribute is the primary key, which uniquely identifies documents within a collection. Therefore, we will be able to retrieve airports via their airport code, utilizing the primary index.
Import the Dataset – Airports
‣ Go to the ArangoDB WebUI (http://localhost:8529 in your browser) and click on COLLECTIONS in the menu
‣ The collection "airports" should be there now
‣ The icon indicates that it is a document collection
‣ Click on the collection to browse its documents

[Figure: the airports collection in the WebUI and a list of its documents]
Import the Dataset – Flights
The imported airports are the vertices of our graph. To complete our graph dataset, we also need edges to connect the vertices. In our case, the edges are flights.

‣ Import the flights into an edge collection with arangoimport. Run the following on your command line (single line), replacing path/to/flights.csv with the path to flights.csv on your machine:

arangoimport --file path/to/flights.csv --collection flights --create-collection true --type csv --create-collection-type edge

Importing flights.csv might take a few moments to complete. On a decent computer with at least 4 GB of memory and an SSD, it should take less than a minute.
Import the Dataset – Flights
What did arangoimport do?
‣ Created a new edge collection (flights) with a primary index on the attribute _key and an edge index on _from and _to
‣ Created one edge document for each line of the CSV file (except the header and the last, empty line)

Note:
‣ The _from and _to attributes form the graph by referencing the document _ids of the departure and arrival airports
‣ No _key is provided, thus it gets auto-generated
Import the Dataset – Flights
‣ Go to the ArangoDB WebUI and click on COLLECTIONS in the menu
‣ The edge collection "flights" should be there now
‣ The type of the collection is indicated by a different icon for edge collections
‣ Click on the flights collection to browse its edge documents
Starting with the dataset

AQL Query Editor & First AQL Queries
ArangoDB Query Editor
Now that we have demo data in ArangoDB, let us start writing AQL queries!

‣ Click on QUERIES in the ArangoDB WebUI
‣ It brings up the AQL query editor to write, execute and profile queries
‣ It supports syntax highlighting and allows you to save and manage queries
ArangoDB Query Editor
[Figure: the AQL query editor, with callouts for: write queries here; run query; set limit for results shown; switch result view mode; query results]
First AQL Queries – Hands on
‣ Fetch John F. Kennedy airport by _id using the DOCUMENT() function, which looks up the document utilizing the primary index:

RETURN DOCUMENT("airports/JFK")

‣ Use a FOR loop to iterate over the airports collection, filter by _key and return the Kennedy airport document. This pattern gets optimized automatically to utilize the primary index as well:

FOR airport IN airports
  FILTER airport._key == "JFK"
  RETURN airport

‣ This construct can be used for complex filter criteria. Various operators are available:

FOR airport IN airports
  FILTER airport.city == "New York"
    AND airport.state == "NY"
  RETURN airport

‣ You can SORT the results by one or multiple conditions in ascending (default) or descending order (DESC), and LIMIT the number of results, optionally with an offset. The query below sorts by state in ascending order, then by name in descending order. Note: The order of such high-level operations influences the output!

FOR a IN airports
  FILTER a.vip
  SORT a.state, a.name DESC
  LIMIT 5
  RETURN a

‣ You don't have to RETURN full documents. You can also return just parts of them (see the KEEP() and UNSET() functions, for instance) or construct the query result as you desire:

FOR a IN airports
  FILTER a._key IN ["JFK", "LAX"]
  RETURN { fullName: a.name }
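The remark that the order of SORT and LIMIT influences the output can be illustrated outside the database. The following Python sketch is only an analogy for the AQL above (FILTER, then SORT a.state, a.name DESC, then LIMIT), run on four hypothetical airport documents: sorting before limiting returns the top results overall, while limiting first merely sorts whichever documents happened to come first.

```python
airports = [  # hypothetical documents, in storage order
    {"name": "B Field", "state": "TX", "vip": True},
    {"name": "A Intl", "state": "CA", "vip": True},
    {"name": "C Muni", "state": "AK", "vip": False},
    {"name": "D Regional", "state": "CA", "vip": True},
]


def sort_state_asc_name_desc(docs):
    # Python's sort is stable, so sort by the secondary key first
    # (name, descending), then by the primary key (state, ascending).
    by_name = sorted(docs, key=lambda a: a["name"], reverse=True)
    return sorted(by_name, key=lambda a: a["state"])


vip = [a for a in airports if a["vip"]]  # FILTER a.vip

sort_then_limit = [a["name"] for a in sort_state_asc_name_desc(vip)[:2]]
limit_then_sort = [a["name"] for a in sort_state_asc_name_desc(vip[:2])]

print(sort_then_limit)  # ['D Regional', 'A Intl']
print(limit_then_sort)  # ['A Intl', 'B Field']
```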
First AQL Queries – Hands on
‣ Count all documents in the collection:

RETURN COUNT(airports)

‣ Count how many V.I.P. airports there are. Below we use COLLECT to group the intermediate results without a condition, which means all filtered documents are grouped together. COLLECT has a syntax variation which allows us to count the number of documents efficiently. We return this number as the result:

FOR airport IN airports
  FILTER airport.vip
  COLLECT WITH COUNT INTO count
  RETURN count

Feel free to experiment further. You can do a lot more with AQL, but that is beyond the scope of this course. Find the full AQL documentation online and see the Training Center on our website!

Exercises A: Document Queries
Here are some challenges if you want to practice your AQL skills. Example solutions can be found at the end of this course.
1. Retrieve the airport document of Los Angeles International (LAX).
2. Retrieve all airport documents of the city Los Angeles.
3. Find all airports of the state North Dakota (ND) and return the name attribute only.
4. Retrieve multiple airports via their primary key (_key), for example BIS, DEN and JFK. Return an object for each match: RETURN {airport: a.name}
5. Count the airports in the state New York (NY) which are not vip.
First AQL Queries – Hands on
Now that you are familiar with the dataset and AQL, try out the following graph queries before we go into the details of graph traversal.

‣ Return the names of all airports one can reach directly (1 step) from Los Angeles International (LAX) following the flights edges:

FOR airport IN 1..1 OUTBOUND 'airports/LAX' flights
  RETURN DISTINCT airport.name

‣ Return any 10 flight documents with the flight departing at LAX and the destination airport documents, like {"airport":{…},"flight":{…}}:

FOR airport, flight IN OUTBOUND 'airports/LAX' flights
  LIMIT 10
  RETURN {airport, flight}

‣ Return 10 flight numbers with the plane landing in Bismarck Municipal airport (BIS):

FOR airport, flight IN INBOUND 'airports/BIS' flights
  LIMIT 10
  RETURN flight.FlightNum

‣ Find all connections which depart from or land at BIS from January 5th to 7th, and return the city at the other end of each connection and the arrival time in universal time (UTC):

FOR airport, flight IN ANY 'airports/BIS' flights
  FILTER flight.Month == 1
    AND flight.Day >= 5
    AND flight.Day <= 7
  RETURN { city: airport.city, time: flight.ArrTimeUTC }
First AQL Queries – Hands on
‣ Edges can also be accessed without using graph traversals, since they are just documents:

FOR flight IN flights
  FILTER flight.TailNum == "N238JB"
  RETURN flight

If there are _from, _to and _id attributes in the response, the WebUI will try to display the result in Graph view mode:

[Figure: query result displayed in Graph view mode]

Exercises B: Graph Queries
1. Find all flights with FlightNum 860 (number) on January 5th and return the _from and _to attributes only (you may use KEEP() for this).
2. Find all flights departing from or arriving at JFK with FlightNum 859 or 860 and return objects with flight numbers and the names of the airports the flights go to or come from, respectively.
3. Combine a FOR loop and a traversal like:

FOR orig IN airports
  FILTER orig._key IN ["JFK", "PBI"]
  FOR dest IN OUTBOUND orig flights

to do multiple traversals with different starting points. Filter by flight numbers 859 and 860. Return orig.name, dest.name, FlightNum and Day. Name the attributes appropriately.
Graph Traversals

Traversals explained & Graph Traversal Syntax
Traversals explained
Traversal means walking along the edges of a graph in certain ways, optionally with some filters. Traversing is very efficient in graph databases. In ArangoDB, this is achieved by a hybrid index type which you already heard of: the edge index.

How many steps to go in a traversal is known as traversal depth:
‣ The starting vertex in a traversal (S) has a traversal depth of zero.
‣ At depth = 1 are the direct neighbors of S (A, B and C).
‣ Their neighbor vertices in turn are at depth = 2 (D, E and F).
‣ G, the neighbor of F, is at depth = 3.

[Figure: example graph with start vertex S at depth 0; A, B and C at depth 1; D, E and F at depth 2; G at depth 3]
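How depths are assigned can be sketched with a breadth-first walk in Python. The adjacency dict encodes the figure's example (S → A, B, C; A → D, E; B → F; F → G). In a graph with several paths to the same vertex, this sketch records the smallest depth at which a vertex is reached; that simplification is specific to the sketch.

```python
from collections import deque

# Example graph from the figure, as an adjacency dict
graph = {"S": ["A", "B", "C"], "A": ["D", "E"], "B": ["F"], "F": ["G"]}


def depths(start):
    """Return {vertex: depth}, where depth counts edges from start."""
    depth = {start: 0}  # the start vertex has depth zero
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, []):
            if w not in depth:
                depth[w] = depth[v] + 1
                queue.append(w)
    return depth


print(depths("S"))
# {'S': 0, 'A': 1, 'B': 1, 'C': 1, 'D': 2, 'E': 2, 'F': 2, 'G': 3}
```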
Traversals explained
A traversal in OUTBOUND direction with a minimal and maximal depth of 2 might look like the following:
‣ We start the traversal at a vertex (S).
‣ The traverser walks down the first outgoing edge to A, but we are only at depth 1 (we defined a minimum and maximum of 2), so A is ignored.
‣ It continues down from A to D. The depth is 2 as required, so it returns D.
‣ It also follows the other outgoing edge of A down to E. The depth is 2, but some filter condition we put in place is not met, so the path is discarded.
‣ There are no more edges to follow from A, therefore the traversal continues with the second outgoing edge of S down to B. The depth is only 1, so B is ignored, but the traverser will continue from here.
‣ It follows the edge from B down to F. The depth is 2 and the filter conditions are met, so F is returned.
‣ There is an edge from F to G, but the maximal depth has already been reached.
‣ The traversal ends with the last outgoing edge of S to C. C has no edges to follow and its depth is 1, hence C is ignored.

[Figure: the same example graph (S; A, B, C; D, E, F; G)]
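The walk described above can be sketched as a recursive depth-first generator in Python. The graph is the figure's example; the lambda that rejects E is a hypothetical stand-in for "some filter condition we put in place". In this sketch a failed filter only suppresses the result, and the walk stops descending once the maximal depth is reached.

```python
# Example graph from the figure: S -> A, B, C; A -> D, E; B -> F; F -> G
graph = {"S": ["A", "B", "C"], "A": ["D", "E"], "B": ["F"], "F": ["G"]}


def traverse(vertex, min_depth, max_depth, accept, depth=0):
    """Depth-first OUTBOUND walk yielding vertices within [min_depth, max_depth]."""
    if depth >= min_depth and accept(vertex):
        yield vertex  # deep enough and the filter condition is met
    if depth == max_depth:
        return  # maximal depth reached: do not follow further edges (F -> G)
    for target in graph.get(vertex, []):
        yield from traverse(target, min_depth, max_depth, accept, depth + 1)


# Hypothetical filter standing in for a condition that E does not meet
print(list(traverse("S", 2, 2, lambda v: v != "E")))  # ['D', 'F']
```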
Graph Traversal Syntax
Before we do more graph queries, we should spend some time on the underlying concepts of the query options. We will go through the keywords and basic options step by step:

FOR vertex[, edge[, path]]
  IN [min[..max]]
  OUTBOUND|INBOUND|ANY startVertex
  edgeCollection[, more…]

FOR emits up to three variables:
‣ vertex (object): the current vertex in a traversal
‣ edge (object, optional): the current edge in a traversal
‣ path (object, optional): representation of the current path with two members:
  ‣ vertices: an array of all vertices on this path
  ‣ edges: an array of all edges on this path

IN min..max: defines the minimal and maximal depth for the traversal. If not specified, min defaults to 1 and max defaults to min.

[Figure: startVertex S followed by traversal depths 1, 2, 3 … n]

By the way: Keywords like FOR, IN and ANY are written all upper case in the code examples, but this is merely a convention. You may also write them all lower case or in mixed case. Names of variables, attributes and collections are case-sensitive, however!

Traversal in AQL documentation
Graph Traversal Syntax
OUTBOUND/INBOUND/ANY defines the direction of your search:
‣ OUTBOUND: the traversal follows outgoing edges of startVertex
‣ INBOUND: the traversal follows incoming edges of startVertex
‣ ANY: the traversal follows edges pointing in any direction

edgeCollection: one or more names of collections holding the edges that we want to consider in the traversal (anonymous graph).

[Figure: traversal from startVertex S following OUTBOUND, INBOUND and ANY edges]

Traversal in AQL documentation
Traversal Options

Depth vs. Breadth First Search & Uniqueness Options
Depth vs. Breadth First Search
Everybody who has already taken a closer look at the documentation about traversals has seen that there are also OPTIONS to control the traversal behavior.

For traversals with a maximum depth greater than or equal to 2, you have two options for how to traverse the graph:

‣ Depth-first (default): Continue down the edges from the start vertex to the last vertex on that path, or until the maximum traversal depth is reached, then walk down the other paths.
‣ Breadth-first (optional): Follow all edges from the start vertex to the next level, then follow all edges of their neighbors by another level, and continue this pattern until there are no more edges to follow or the maximum traversal depth is reached.

[Figure: numbered depth-first visiting order on the example graph]
Depth vs. Breadth First Search
Both algorithms return the same number of paths if all other traversal options are the same, but the order in which edges are followed and vertices are visited is different.

With a variable traversal depth of 1..2, the following paths would be found:
‣ Depth-first: S→A, S→A→D, S→A→E, S→B, S→B→F, S→C
‣ Breadth-first: S→A, S→B, S→C, S→A→D, S→A→E, S→B→F

Note that there is no particular order in which the edges of a single vertex are followed. Hence, S→C may be returned before S→A and S→B. Still, breadth-first search returns shorter paths before longer paths.

[Figure: numbered breadth-first visiting order on the example graph]
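Both orders can be reproduced with a short Python sketch on the example graph (S → A, B, C; A → D, E; B → F; F → G). The edges of a vertex are followed in list order here, which is an assumption of the sketch; ArangoDB does not guarantee any particular order.

```python
from collections import deque

# Example graph: S -> A, B, C; A -> D, E; B -> F; F -> G
graph = {"S": ["A", "B", "C"], "A": ["D", "E"], "B": ["F"], "F": ["G"]}


def dfs_paths(start, min_depth, max_depth):
    """Depth-first: finish each branch before starting the next one."""
    result = []

    def walk(path):
        depth = len(path) - 1
        if depth >= min_depth:
            result.append("->".join(path))
        if depth < max_depth:
            for w in graph.get(path[-1], []):
                walk(path + [w])

    walk([start])
    return result


def bfs_paths(start, min_depth, max_depth):
    """Breadth-first: finish each depth level before going one level deeper."""
    result = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        depth = len(path) - 1
        if depth >= min_depth:
            result.append("->".join(path))
        if depth < max_depth:
            for w in graph.get(path[-1], []):
                queue.append(path + [w])
    return result


print(dfs_paths("S", 1, 2))
# ['S->A', 'S->A->D', 'S->A->E', 'S->B', 'S->B->F', 'S->C']
print(bfs_paths("S", 1, 2))
# ['S->A', 'S->B', 'S->C', 'S->A->D', 'S->A->E', 'S->B->F']
```

The only structural difference is a recursive call (an implicit stack) versus a first-in, first-out queue.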
Depth vs. Breadth First Search
Breadth-first search can significantly improve performance if used together with filters and limits, because it can stop before the maximal depth is reached. Whether it is applicable depends on the use case. For example, you want to:
‣ Traverse a graph from vertex S with depth 1..10
‣ Find 1 vertex that fulfills your criteria; let's assume vertex F meets your conditions
‣ Depth-first might follow the edge to A first, then go all the way down, up to 10 hops, to D, G, E, H and more
‣ Breadth-first, however, finds F at depth 2 and never visits vertices past that level if you limit the query to a single match:

FOR v IN 1..10 OUTBOUND 'verts/S' edges
  OPTIONS {bfs: true}
  FILTER v._key == 'F'
  LIMIT 1
  RETURN v

[Figure: example graph with S at the top; A, B and C below it; D, E and F at depth 2; G, H, I and more further down]
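The effect of LIMIT 1 together with breadth-first search can be imitated with Python generators, which stop producing vertices as soon as the consumer stops asking. The graph is hypothetical (a long chain below A, and F two hops away via B), and both walks follow edges in list order, which ArangoDB does not guarantee.

```python
from collections import deque

# Hypothetical graph: a long chain below A, and F two hops away via B
graph = {"S": ["A", "B"], "A": ["D"], "D": ["G"], "G": ["H"], "B": ["F"]}


def dfs(start, max_depth):
    """Depth-first walk, yielding vertices from depth 1 on."""
    stack = [(start, 0)]
    while stack:
        v, d = stack.pop()
        if d >= 1:
            yield v
        if d < max_depth:
            # push in reverse so the first edge in the list is explored first
            for w in reversed(graph.get(v, [])):
                stack.append((w, d + 1))


def bfs(start, max_depth):
    """Breadth-first walk, yielding vertices from depth 1 on."""
    queue = deque([(start, 0)])
    while queue:
        v, d = queue.popleft()
        if d >= 1:
            yield v
        if d < max_depth:
            for w in graph.get(v, []):
                queue.append((w, d + 1))


def first_match(walk, key):
    """Consume the walk until key is found (FILTER + LIMIT 1); return visited."""
    visited = []
    for v in walk:
        visited.append(v)
        if v == key:
            break
    return visited


print(first_match(dfs("S", 10), "F"))  # ['A', 'D', 'G', 'H', 'B', 'F']
print(first_match(bfs("S", 10), "F"))  # ['A', 'B', 'D', 'F']
```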
Uniqueness Options
Not every graph has just a single path from a chosen start vertex to its connected vertices. There may even be cycles in a graph.
‣ By default, the traversal along any of the paths is stopped if it encounters an edge that has already been visited on that path. This keeps your traversals from running around in circles until the maximum traversal depth is reached. It is a safeguard against producing a plethora of unwanted paths.
‣ Duplicate vertices on a path are allowed unless the traversal is configured otherwise.

[Figure: graph with cycle S→B→C→S and multiple paths from S to E]
Uniqueness Options
The following query specifies the uniqueness options explicitly, although the ones shown are used by default anyway:

FOR v, e, p IN 1..5 OUTBOUND 'verts/S' edges
  OPTIONS {
    uniqueVertices: 'none',
    uniqueEdges: 'path'
  }
  RETURN CONCAT_SEPARATOR('->', p.vertices[*]._key)

We use the path variable p, which is emitted by the traversal, and concatenate all vertex keys of each path neatly into a single string per path, like "S->A->D->E". The array expansion operator [*] is used for convenience.

Array expansion in AQL documentation

[Figure: graph with cycle S→B→C→S and multiple paths from S to E]
Uniqueness Options
The query finds a total of 10 paths. One of them is S→B→C→S. The start vertex is also the last vertex on that path, which is possible because uniqueness of vertices is not enforced. A path such as S→B→C→S→B→C is not present in the result, because uniqueness of edges per path prevents the traverser from following the same edge twice.

‣ uniqueEdges: 'none' would make the traverser follow the edges from S to B to C to S, and from S to B to C again. It would only stop there because the maximum depth of 5 is reached at that point. If the maximum depth of the query were higher, the traversal would run for a very long time and produce a large number of paths because of the loop.

[Figure: graph with cycle S→B→C→S and multiple paths from S to E]
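The following Python sketch enumerates paths under both edge-uniqueness settings and under the vertex option uniqueVertices: 'path'. The graph is a hypothetical stand-in that merely contains the cycle S→B→C→S; it is not the exact graph from the figure, so its path count differs from the 10 paths of the query. Edges are identified by their source vertex and their position in the adjacency list, an assumption that keeps parallel edges apart.

```python
# Hypothetical graph containing the cycle S -> B -> C -> S
graph = {"S": ["A", "B"], "A": ["D"], "B": ["C"], "C": ["S", "E"], "D": ["E"]}


def paths(start, max_depth, unique_vertices="none", unique_edges="path"):
    """Return all paths of length 1..max_depth as 'S->...' strings."""
    results = []

    def walk(path, used_edges):
        if len(path) > 1:
            results.append("->".join(path))
        if len(path) - 1 == max_depth:
            return  # maximal depth reached
        v = path[-1]
        for i, w in enumerate(graph.get(v, [])):
            edge = (v, i)  # identifies one particular edge
            if unique_edges == "path" and edge in used_edges:
                continue  # never follow the same edge twice on one path
            if unique_vertices == "path" and w in path:
                continue  # never visit the same vertex twice on one path
            walk(path + [w], used_edges | {edge})

    walk([start], frozenset())
    return results


default = paths("S", 5)  # uniqueVertices: 'none', uniqueEdges: 'path'
print("S->B->C->S" in default)        # True: the start vertex repeats
print("S->B->C->S->B->C" in default)  # False: edge S->B would repeat
print("S->B->C->S->B->C" in paths("S", 5, unique_edges="none"))  # True
print("S->B->C->S" in paths("S", 5, unique_vertices="path"))     # False
```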
Uniqueness Options
To stop the start vertex (or other vertices) from being visited more than once, we can enable uniqueness for vertices in two ways:
‣ uniqueVertices: 'path' ensures that there are no duplicate vertices on each individual path.
‣ uniqueVertices: 'global' ensures that every reachable vertex is visited once for the entire traversal. It requires bfs: true (breadth-first search). It is not supported for depth-first search, because the results would be completely non-deterministic (varying between query runs), as there is no rule for the order in which the traverser follows the edges of a vertex. Whenever there are multiple paths to choose from, the traverser would take one of them and the uniqueness rule would exclude the others at random.

[Figure: graph with cycle S→B→C→S and multiple paths from S to E]
Uniqueness Options
FOR v IN 0..5 OUTBOUND 'verts/S' edges
  OPTIONS {
    bfs: true,
    uniqueVertices: 'global'
  }
  RETURN v._key

The query gives us all vertex keys of this example graph exactly once. Path uniqueness or no uniqueness of vertices would give us a lot of duplicates instead, 14 in total.

Which edges are actually followed in this traversal is not deterministic, but since it is breadth-first search, every reachable vertex is guaranteed to be visited one way or another.

Note: A depth of zero makes the traversal include the start vertex, which would otherwise only be accessible via the emitted path variable, like p.vertices[0].

[Figure: graph with cycle S→B→C→S and multiple paths from S to E]
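Global vertex uniqueness with breadth-first search boils down to a single visited set shared by the whole traversal. The Python sketch below uses a hypothetical graph with the cycle S→B→C→S (not the figure's exact edges); with a minimum depth of 0 the start vertex is part of the result, mirroring the 0..5 query.

```python
from collections import deque

# Hypothetical graph containing the cycle S -> B -> C -> S
graph = {"S": ["A", "B"], "A": ["D"], "B": ["C"], "C": ["S", "E"], "D": ["E"]}


def bfs_global(start, min_depth, max_depth):
    """Breadth-first walk that emits every reachable vertex at most once."""
    visited = {start}  # shared by the entire traversal ("global")
    result = []
    queue = deque([(start, 0)])
    while queue:
        v, depth = queue.popleft()
        if depth >= min_depth:
            result.append(v)
        if depth < max_depth:
            for w in graph.get(v, []):
                if w not in visited:
                    visited.add(w)
                    queue.append((w, depth + 1))
    return result


print(bfs_global("S", 0, 5))  # ['S', 'A', 'B', 'D', 'C', 'E']
print(bfs_global("S", 1, 5))  # ['A', 'B', 'D', 'C', 'E']
```

Which of the two edges into E is used depends on the order of the adjacency lists; the set of emitted vertices does not.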
Traversal Options – Hands on
For our domestic flights example, we might want to have all airports directly reachable from a given airport. Let's see which airports we can reach from Los Angeles.

‣ Return all airports directly reachable from LAX:

FOR airport IN OUTBOUND 'airports/LAX' flights
  OPTIONS { bfs: true, uniqueVertices: 'global' }
  RETURN airport

‣ Compare the execution times to this earlier shown query, which returns the same airports:

FOR airport IN OUTBOUND 'airports/LAX' flights
  RETURN DISTINCT airport

You will see a significant performance improvement.

What happens is that RETURN DISTINCT de-duplicates airports only after the traversal has returned all vertices (a huge intermediate result), whereas uniqueVertices: 'global' is a traversal option that instructs the traverser to ignore duplicates right away.
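The difference between the two queries lies in the size of the intermediate result, which a Python sketch can make visible. The flight counts are hypothetical (1,000 departures from LAX to only three destinations). De-duplicating at the end first materializes one result per edge, whereas checking a seen-set during the walk only ever emits three airports. This models the idea, not ArangoDB's internals.

```python
# Hypothetical departures from LAX: many edges, few distinct destinations
flights = (
    [("LAX", "JFK")] * 500
    + [("LAX", "SFO")] * 300
    + [("LAX", "DEN")] * 200
)


def distinct_after(start):
    """Like RETURN DISTINCT: emit one vertex per edge, de-duplicate at the end."""
    emitted = [dest for origin, dest in flights if origin == start]
    return len(emitted), list(dict.fromkeys(emitted))


def unique_during(start):
    """Like uniqueVertices: 'global': skip already seen vertices right away."""
    seen, emitted = set(), []
    for origin, dest in flights:
        if origin == start and dest not in seen:
            seen.add(dest)
            emitted.append(dest)
    return len(emitted), emitted


print(distinct_after("LAX"))  # (1000, ['JFK', 'SFO', 'DEN'])
print(unique_during("LAX"))   # (3, ['JFK', 'SFO', 'DEN'])
```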
Advanced Graph Queries

Shortest Path & Pattern Matching
Shortest Path – Hands on
A shortest path query finds a connection between two given vertices with the fewest edges. With our domestic flights dataset, we could, for example, search for a connection between two airports with the fewest stops.

‣ Find a shortest path between Bismarck Municipal airport and John F. Kennedy airport and return the airport names on the route:

FOR v IN OUTBOUND
  SHORTEST_PATH 'airports/BIS'
  TO 'airports/JFK' flights
  RETURN v.name

We defined BIS as our start vertex and JFK as our target vertex.

Shortest_Path in AQL documentation
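Since every flight counts as one step, a fewest-edges path can be found with a breadth-first search that remembers each vertex's predecessor. The Python sketch below illustrates that idea on hypothetical routes; it is not ArangoDB's implementation, which may use a different search strategy.

```python
from collections import deque

# Hypothetical direct connections (origin -> destinations), not the real data
routes = {
    "BIS": ["DEN", "MSP"],
    "DEN": ["JFK", "LAX"],
    "MSP": ["JFK"],
    "LAX": ["JFK"],
}


def shortest_path(start, target):
    """Return one path with the fewest edges from start to target, or None."""
    previous = {start: None}  # vertex -> vertex it was first reached from
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == target:
            path = []
            while v is not None:  # walk the predecessors back to the start
                path.append(v)
                v = previous[v]
            return path[::-1]
        for w in routes.get(v, []):
            if w not in previous:
                previous[w] = v
                queue.append(w)
    return None  # target not reachable


route = shortest_path("BIS", "JFK")
print(route)           # ['BIS', 'DEN', 'JFK']
print(len(route) - 1)  # 2 flights
```

In these hypothetical routes, BIS→MSP→JFK is just as short as BIS→DEN→JFK. The sketch returns whichever route its queue reaches first, so a shortest path search in general yields one of possibly several equally short paths. The number of flights is the number of vertices on the path minus one.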
Shortest Path

We found a route via Denver International airport:

[Figure: map of the route BIS→DEN→JFK. Source: Google Maps]

The result of the previous shortest path query shows that you have to change planes, in Denver (DEN) for example, to get to JFK. There is apparently no direct flight.

Note: A shortest path query can return different results. It just finds and returns one of possibly multiple shortest paths. In this case it found: BIS→DEN→JFK
Shortest Path – Hands on
Sometimes you just want the length of the shortest path. To achieve this, you can use LET.

‣ Return the minimum number of flights from BIS to JFK:

LET airports = (
  FOR v IN OUTBOUND
    SHORTEST_PATH 'airports/BIS'
    TO 'airports/JFK' flights
    RETURN v
)
RETURN LENGTH(airports) - 1

Your result should be 2.

Note:
‣ We subtract 1 at the end of the query because the result contains both the start and the end vertex: a path with n vertices consists of n - 1 flights.
‣ The shortest path algorithm does not let you apply filters. We need to resort to pattern matching instead to do so.
Pattern Matching
We have adventured pretty deep into the graph jungle already. Exploring pattern matching in detail is beyond the scope of this course, but let us take a quick look at it nonetheless.

We can easily add filter conditions for the end vertex and/or the edge which leads to it. As we know, both are emitted by the traversal:

FOR endVertex, edgeToVertex IN ...

With a variable traversal depth of 1..2 and the default traversal options, there are 4 paths in the following graph:

[Figure: graph with edges S→A, S→B, A→C and B→C]

If we return the emitted end vertex, then the result will contain the vertices A, B, C and C again. We could also return the edges and would end up with four edges in total. However, for the paths S→A→C and S→B→C, we may want to choose one over the other based on certain criteria. Full paths can optionally be emitted as a third variable:

FOR vertex, edge, path IN ...

The path variable can then be used to apply filter conditions on intermediate or all vertices and/or edges on the path. This allows for queries like: What are the best connections between the airports A and B, determined by the lowest total travel time?

It can be used to apply complex filter conditions in traversals, taking the entire path into account. In other words, it lets you discover specific patterns (combinations of vertices and edges in graphs) and is therefore called pattern matching.
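A path-level filter, like the 20-minute transfer time from the pattern matching question in Graph Basics, can be sketched in Python. The flights are hypothetical tuples, with departure and arrival given as minutes since midnight UTC on a single day (an assumption; the real documents carry ISO strings in DepTimeUTC and ArrTimeUTC). Each candidate connection is a two-edge path, and the transfer condition can only be checked by looking at both edges at once.

```python
# Hypothetical flights: (origin, destination, departure, arrival), with times
# in minutes since midnight UTC on a single day
flights = [
    ("BIS", "DEN", 600, 700),
    ("BIS", "MSP", 620, 690),
    ("DEN", "JFK", 710, 950),   # leaves only 10 minutes after BIS->DEN lands
    ("DEN", "JFK", 760, 1000),
    ("MSP", "JFK", 720, 880),
]


def one_stop_connections(origin, target, min_transit=20):
    """All two-flight paths origin -> X -> target, quickest first."""
    results = []
    for first in flights:
        if first[0] != origin:
            continue
        for second in flights:
            # second leg must depart where the first one landed, towards target
            if second[0] != first[1] or second[1] != target:
                continue
            # path-level filter: compares values from two different edges
            if second[2] - first[3] < min_transit:
                continue
            total = second[3] - first[2]  # first departure to final arrival
            results.append((total, first[1], first, second))
    return sorted(results)


best = one_stop_connections("BIS", "JFK")[0]
print(best[0], best[1])  # 260 MSP
```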
Landing

Survey and Support & Exercise Solutions
Survey and Support

What would you like to learn next? Tell us with 3 clicks: Survey

Support ArangoDB :)
‣ Feedback to the course
‣ Star us on GitHub

Feeling stuck? Not for long.
Join the ArangoDB community to get help, challenge ideas or discuss new features!
‣ Slack
‣ StackOverflow
‣ Community
Exercises A – Solutions
There are often multiple ways in AQL to retrieve the same result. If your solution differs from the queries below but produces the correct result, then you did very well :)

1. Retrieve the airport document of Los Angeles International (LAX).

RETURN DOCUMENT("airports/LAX")

2. Retrieve all airport documents of the city Los Angeles.

FOR a IN airports
  FILTER a.city == "Los Angeles"
  RETURN a

3. Find all airports of the state North Dakota (ND) and return the name attribute only.

FOR airport IN airports
  FILTER airport.state == "ND"
  RETURN airport.name

4. Retrieve multiple airports via their primary key (_key), for example BIS, DEN and JFK. Return an object for each match: RETURN {airport: a.name}

FOR a IN airports
  FILTER a._key IN ["BIS", "DEN", "JFK"]
  RETURN { airport: a.name }

5. Count the airports in the state New York (NY) which are not vip.

FOR airport IN airports
  FILTER airport.state == "NY"
    AND NOT airport.vip
  COLLECT WITH COUNT INTO count
  RETURN count
Exercises B – Solutions
1. Find all flights with FlightNum 860 (number) on January 5th and return the _from and _to attributes only (you may use KEEP() for this).

FOR f IN flights
  FILTER f.FlightNum == 860
    AND f.Month == 1
    AND f.Day == 5
  RETURN KEEP(f, "_from", "_to")

2. Find all flights departing from or arriving at JFK with FlightNum 859 or 860 and return objects with flight numbers and the names of the airports the flights go to or come from, respectively.

FOR a, f IN ANY "airports/JFK" flights
  FILTER f.FlightNum IN [859, 860]
  RETURN { airport: a.name, flight: f.FlightNum }

3. Combine a FOR loop and a traversal like:

FOR orig IN airports
  FILTER orig._key IN ["JFK", "PBI"]
  FOR dest IN OUTBOUND orig flights
  …

to do multiple traversals with different starting points. Filter by flight numbers 859 and 860. Return orig.name, dest.name, FlightNum and Day. Name the attributes appropriately.

The flight number and day are attributes of the edge, so the traversal has to emit the edge variable as well:

FOR orig IN airports
  FILTER orig._key IN ["JFK", "PBI"]
  FOR dest, flight IN OUTBOUND orig flights
    FILTER flight.FlightNum IN [859, 860]
    RETURN {
      from: orig.name,
      to: dest.name,
      number: flight.FlightNum,
      day: flight.Day
    }
We hope you enjoyed the course and it helped you to get started!

Simran: Documentation Manager. AQL and data modeling enthusiast with a passion for technical writing.

Jan: Head of Communications. Makes complex things easier to digest. Big fan of community support.