Neo4j Lab
This simple tutorial will guide you through the basics of using the graph database Neo4j.
You will be set a number of tasks. Should you have any problems, please ask for help!
INSTALLATION
There are a few options for running Neo4j.
For the purposes of the lab, the Neo4j Aura cloud version (option 1) is the easiest to set up, but on campus you will need to use a wired connection (i.e. an HWU lab PC), as the eduroam WiFi blocks the port used by Aura.
Click on New Instance and pick Empty Instance.
After you click CREATE A DATABASE, you will need to wait for the cloud database to be set up; this can take a few minutes.
Enter the password you saved earlier and connect to your cloud Neo4j instance (as noted above, you won't be able to do this from a laptop connected via eduroam because the port is blocked; in the GRID lab, use a desktop PC with a wired connection instead).
Practical Lab (i.e. post-installation)
BASICS
Our first graph is going to have UK cities as nodes, and modes of transport between them
as relationships.
Our first node is Edinburgh. Notice that we give the node the label CITY, which identifies its role in the graph:
Hint: use the “Table” or “Text” buttons on the left to see your
results.
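A minimal sketch of this query, assuming the node's only property is `name` (the property that the later match clauses filter on):

```cypher
// CREATE a node with the label CITY and a name property;
// the properties are written as a map inside {}
CREATE (e:CITY {name: "Edinburgh"})
// Return the node so that it is displayed in the result pane
RETURN e
```

The same pattern, e.g. `CREATE (:CITY {name: "London"})`, creates the London node that the relationship section relies on.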
If you move the mouse pointer over the node, the bottom bar will show the node’s
properties:
ADDING RELATIONSHIPS
Our first relationship will indicate that it is possible to drive from Edinburgh to London in 7
hours. This will use a combination of a match clause (to find the nodes) and a create
clause (to link the nodes):
[] defines the relationship. Again, the properties are given as a map in JSON-like syntax. Notice that the create clause uses the variables (l and e) to refer to the nodes (London and Edinburgh) rather than repeating the declarations from the match clause.
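A sketch of the match/create pattern described above. The relationship type DRIVE matches the later Grangemouth task, but the property key `hours` is an assumption, and a London CITY node is assumed to exist already:

```cypher
// Find both nodes; e and l are variables bound by the match clause
MATCH (e:CITY {name: "Edinburgh"}), (l:CITY {name: "London"})
// Link them with a DRIVE relationship whose properties go inside []
// (hours is an assumed property key)
CREATE (e)-[:DRIVE {hours: 7}]->(l)
RETURN e, l
```

The arrow `->` gives the relationship a direction, from Edinburgh to London.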
TASK: add the following relationships between London and Edinburgh:
TRAIN, 5 hours
FLY, 1 hour
TASK: Add the town of Grangemouth and give it the TOWN role. Link Grangemouth to Edinburgh via a DRIVE relationship whose name is m9. Try to do all this in a single query!
When you look at the full graph, notice that Grangemouth is a different colour. This
represents the different role it has.
Edinburgh’s population is 492,680. To add it to the Edinburgh node, we must first match the Edinburgh node and then set the population value:
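A minimal sketch, storing the population as an integer (without the thousands separator) so that it can be compared numerically later:

```cypher
// Match the Edinburgh node, then add (or overwrite) its population property
MATCH (e:CITY {name: "Edinburgh"})
SET e.population = 492680
RETURN e
```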
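You can compare property values within a graph. For example, to determine whether the population of Edinburgh is larger than that of London:
match (e:CITY {name:"Edinburgh"}), (l:CITY {name:"London"}) return
e.population > l.population
If London does not yet have a population property, the comparison returns null rather than true or false.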
To delete a property from a node, match the node and then remove the property:
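A minimal sketch using the REMOVE clause:

```cypher
// Match the node, then remove its population property
MATCH (e:CITY {name: "Edinburgh"})
REMOVE e.population
RETURN e
```

Setting the property to null (`SET e.population = null`) has the same effect.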
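A plain delete of the Edinburgh node fails because Neo4j will not delete a node that still has relationships; you must delete those relationships as well. DETACH DELETE does both in one step: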
match (n:CITY {name:"Edinburgh"}) detach delete n
Here we match the Edinburgh node and DETACH DELETE it: any relationships attached to the node are deleted first, and then the node itself. If the node has no relationships, it is simply deleted.
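For comparison, a sketch of the plain delete that fails while relationships remain, followed by a query that counts those relationships. Run the two queries separately:

```cypher
// Plain DELETE: raises an error if n still has any relationships
MATCH (n:CITY {name: "Edinburgh"})
DELETE n;

// Count the relationships attached to the node (in either direction)
MATCH (n:CITY {name: "Edinburgh"})-[r]-()
RETURN count(r);
```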
To delete all nodes and relationships from your graph run the query:
match (n) detach delete n
Note:
To delete a node by its internal id, use this syntax (e.g. for a CITY node):
MATCH (t:CITY) where id(t)=12 DETACH DELETE t
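The id 12 is only an example; internal ids are assigned by Neo4j, so look them up first. A sketch, assuming Neo4j 5 (the version Aura is assumed to run here), where `id()` is deprecated in favour of `elementId()`, which returns a string:

```cypher
// List CITY nodes with both forms of internal identifier
MATCH (t:CITY)
RETURN t.name, id(t), elementId(t);

// Delete by element id; replace the placeholder with a value from the query above
MATCH (t:CITY) WHERE elementId(t) = "<element id>"
DETACH DELETE t;
```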
A BIGGER GRAPH
Paste the script for the bigger graph into the query box at the top of your Neo4j window (make sure there are no empty lines at the end) and run it. Your output should be similar to:
Note: You can drag the nodes around to rearrange the graph, which might make it easier to read.
The graph contains fictitious data about train journeys in the UK: a number of cities, connected by TRAIN relationships which indicate that you can travel between those cities by train.
To list all the destinations you can reach with a single leg journey from Edinburgh:
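A sketch of the single-leg query, assuming (as in the Sheffield-to-Cardiff query later in this lab) that TRAIN relationships point from the departure city to the destination:

```cypher
// Exactly one TRAIN relationship out of Edinburgh
MATCH (e:CITY {name: "Edinburgh"})-[:TRAIN]->(d:CITY)
RETURN d.name
```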
TASK: List all the destinations you can reach with a multi-leg journey from Dundee
To list all the routes from Sheffield to Cardiff, we bind each path between the two nodes to a variable p (one result row per path):
match p = (n:CITY {name:"Sheffield"})-[:TRAIN*]->(d:CITY {name:"Cardiff"})
return p
To list the routes and their length (number of legs) append length(p) to the return clause:
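A sketch of the extended query; ordering by length is an optional addition:

```cypher
MATCH p = (n:CITY {name: "Sheffield"})-[:TRAIN*]->(d:CITY {name: "Cardiff"})
// length(p) counts the relationships (legs) in the path
RETURN p, length(p)
ORDER BY length(p)
```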
Notice that the path includes the name of each TRAIN relationship between the cities (e.g. east coast).
TASK: what is the length of the shortest path between Dundee and Cardiff?
Hint: Your return clause should use the min() function
TASK: Create a FLY relationship between Edinburgh and Cardiff, with the relationship name
“ba”. What is the length of the longest and shortest paths now?
If we do not wish to fly, we should restrict our path query to the TRAIN relationship. We do
this by specifying the relationship: … [:TRAIN*] …
TASK: What is the full query for using the TRAIN restriction?
If we are willing to take no more than a 3-leg journey from Edinburgh, where can we get
to?
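A sketch of this bounded query: `*1..3` limits the variable-length pattern to between one and three TRAIN legs, and `DISTINCT` is added because the same city may be reachable by more than one route:

```cypher
// Between 1 and 3 TRAIN relationships out of Edinburgh
MATCH (e:CITY {name: "Edinburgh"})-[:TRAIN*1..3]->(d:CITY)
RETURN DISTINCT d.name
```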
TASK: What query can be used to find the destinations when you start from Edinburgh and
take a journey with at least 2 legs but no more than 3?
A query plan tells you how a DBMS executes a particular query. You can use this to
determine how to optimise the query.
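In Neo4j you obtain a plan by prefixing a query with `EXPLAIN` (shows the plan with estimated row counts, without running the query) or `PROFILE` (runs the query and reports the actual rows per operator). A sketch using the Sheffield-to-Cardiff query from earlier; the choice of query is illustrative:

```cypher
// PROFILE executes the query and displays the plan with actual row counts
PROFILE
MATCH p = (n:CITY {name: "Sheffield"})-[:TRAIN*]->(d:CITY {name: "Cardiff"})
RETURN p
```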
Look at the text below the top node (NodeByLabelScan): it tells us that Neo4j has to scan 9 rows (nodes) at the start of the query execution.
The query plan shows that the name property is used heavily, so we should create an index on that property to help the database match names quickly:
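A sketch of the index definition in Neo4j 5 syntax; the index name `city_name` is arbitrary:

```cypher
// Index the name property of CITY nodes; IF NOT EXISTS makes re-running harmless
CREATE INDEX city_name IF NOT EXISTS
FOR (c:CITY) ON (c.name)
```

Running `SHOW INDEXES` afterwards lists the new index; wait until its state is ONLINE before rerunning the plan.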
Rerun the query plan in the above screenshot. Look at the text below the top operator (with the index in place this is typically NodeIndexSeek rather than NodeByLabelScan): how many rows (nodes) does Neo4j have to scan now?
The benefits of this index apply to every query that looks up CITY nodes by their name property.
------END OF LAB--------