Graph Database Modeling With Neo4j


Copyrighted Material
Graph Database Modeling with Neo4j
Copyright © 2020-21 by Ajit Singh, All Rights Reserved.

No part of this publication may be reproduced, stored in a retrieval system or
transmitted, in any form or by any means (electronic, mechanical, photocopying,
recording or otherwise) without prior written permission from the author,
except for the inclusion of brief quotations in a review.

For information about this title or to order other books and/or electronic
media, contact the publisher:

Ajit Singh & Anant Kumar
e: [email protected]
e: [email protected]
w: https://www.ajitvoice.wordpress.com

Preface

This book is designed to walk you through graph data modeling. You will be
introduced to the basic process of designing a graph data model that can answer
a wide range of business questions across a variety of domains.

Graph data modeling is the process in which a user describes an arbitrary
domain as a connected graph of nodes and relationships with properties and
labels. A graph data model is designed to answer questions in the form of
Cypher queries and solve business and technical problems by organizing a data
structure for the graph database.

This book is simply an introduction to data modeling using a simple,
straightforward scenario. There are plenty of opportunities throughout the
upcoming guides to practice modeling domains and analyzing changes to the model
that might need to be made.

Data is like water. It’s probably useless if you don’t put it in a helpful
container. The shape, size and functionality of that container depend on your
intended use, but in general, a container is necessary.

The same goes for data. When it comes to creating a new application or data
solution, you need to provide a structure for that data. That structuring
process is known as data modeling.

Often reserved solely for senior database administrators (DBAs) or principal
developers, data modeling is sometimes presented as an esoteric art unknowable
to mere mortals. You may worship the expert data modeler from afar.

While some data modeling scenarios really are best left to the experts, it
doesn’t have to be difficult by default. In fact, data modeling is as much a
business concern as a technological one. So if you don’t know a single line of
code, you’re in luck.

Anyone can do basic data modeling, and with the advent of graph database
technology, matching your data to a coherent model is easier than ever. Data
modeling is an abstraction process. You start with your business and user needs
(i.e., what you want your application to do). Then, in the modeling process you
map those needs into a structure for storing and organizing your data.

Every data model is unique, depending on the use case and the types of
questions that users need to answer with the data. Because of this, there is no
“one-size-fits-all” approach to data modeling. Using best practices and careful
modeling will provide the most valuable result in producing an accurate data
model that benefits your processes and use case.

Graph databases are best suited to very specific data sets: huge amounts of
highly complex data, where entities are closely related to one another. That is
because, in contrast to relational databases, they query efficiently through
the relationships among entities. Graph databases support algorithms that
perform specific queries that are out of reach for relational databases because
of their tabular structure and static schema. Also, the bigger the volume of
data, the slower the queries would be in SQL, because they would require
lookups in joined tables with a great number of tuples. Graph databases let you
traverse the graph and reach a high level of depth without having to read all
the data stored.

Neo4j is an open source NoSQL graph database. It is a fully transactional
database (ACID) that stores data structured as graphs consisting of nodes,
connected by relationships. Inspired by the structure of the real world, it
allows for high query performance on complex data, while remaining intuitive
and simple for the developer.

Neo4j is, by far, the leading graph database technology. It analyzes and
traverses data in real time and returns results very quickly. It has a great
user interface and support. But its greatest feature is that, even as data size
grows exponentially, the performance of Neo4j is not affected by it.

Using this book, you'll learn the theory of graph databases and how to use
Neo4j to build recommendations, model relationships, and calculate the shortest
route between two locations. With example data models, best practices, use
cases, and an application putting everything together, this book will give you
everything you need to really get started with Neo4j. Starting with a brief
introduction to graph theory, this book will show you the advantages of using
graph databases along with data modeling techniques for graph databases. You'll
gain practical hands-on experience with commonly used and lesser-known features
for updating a graph store with Neo4j's Cypher query language. This book
includes a lot of background information, helps you grasp the fundamental
concepts behind this radical new way of dealing with connected data, and will
give you lots of examples of use cases and environments where a graph database
would be of great interest.

Neo4j is being used by social media and e-commerce industry giants. You can
take advantage of Neo4j's powerful features and benefits. Add Beginning Neo4j
to your library today.

Contents

1. Graph Data Model
Graph databases
2. Graph Schemas
Selecting vertex labels
Examples of label selection
Drawing a graph schema
Summary
3. Converting ER models to graph schemas
ER models and diagrams
Example
Procedure to convert an ER model to a graph schema
Rule #1: Entity types become vertex types
Rule #2: Binary relationship types become edge types
Rule #3: N-ary relationship types become vertex types
Conversion example
Vertices are vertices, and edges are edges
Summary
4. Normalizing Graph Schemas
Normalization of relational databases
Transformation rules that produce equivalent schemas
Rule A: Renaming properties and labels
Rule B: Reversing edge directions
Rule C: Property displacement
Rule D: Specialization and generalization
Rule E: Edge promotion
Rule F: Property promotion
Rule G: Property expansion
Summary
5. One meta rule for normalization
Schemas and constraints
Graph universes, transformations and equivalence
Derived types
Meta rule: Adding and removing derived types
Proving the meta rule
Proving the 7 rules: Renaming, Reversing, Property Displacement,
Beyond transformation rules
Summary
6. Validating graph schemas
7. Pixy: First order logic on graph databases
Background
On SQL
On first order logic
On Gremlin
Pixy: First order logic with Gremlin
ER models in Pixy
Query requirements don't usually matter while modeling
8. Introduction to Database
State of the art of Databases
Types of DBMS
NoSQL DBMS
Comparison of DBMS
Current trends
9. Graph Databases
Graph Theory and Its Applications
Concepts of Graph Databases
Query performance
10. Neo4j
Introduction of Neo4j
Advantages of Neo4j
Properties of Neo4j
Performance In Neo4j
How To Increase Performance Of Neo4j?
Cypher Query Language
Structure
Operations In Cypher
Loading Data With Cypher
Use Cases of Neo4j
11. Getting started with Neo4j
Installation or Setup
Installation & Starting a Neo4j server
Start Neo4j from console (headless, without web server)
Start Neo4j web server
Delete one of the databases
Cypher Query Language
RDBMS Vs Graph Database
Cypher-Implementation
Creation
Create a node
Create a relationship
Query Templates
Create an Edge
Deletion
Delete all nodes
Delete all nodes of a specific label
Match (capture group) and link matched nodes
Update a Node
Delete All Orphan Nodes
Python & Neo4j
12. Neo4j Application
Use Case Selected
Data
Implementing Data
Export data
Query Examples (Neo4j - SQL)
Shortest Path
Betweenness centrality
Closeness centrality
PageRank
Community Detection
Possible queries on SQL
Bibliography

Part I
Chapter 1

Graph Data Model

A relational database has a ledger-style structure. It can be queried through
SQL, and it is what most people are familiar with. Each entry is a row in a
table. Tables are related by foreign-key constraints, which reference the
primary keys of another table and are how you connect one table’s information
to another. Slow multi-level joins are often involved when querying relational
databases.

In a graph, think of the elements as nodes, or dots; they are also called
vertices. Each node has key-value pairs and a label. Nodes are connected by
relationships, or edges, drawn as lines. Relationships have a type and a
direction, and they can have properties. A graph database is simply composed of
dots and lines. This type of database is simpler and more powerful when the
meaning is in the relationships between the data. Relational databases can
easily handle direct relationships, but indirect relationships are more
difficult to deal with in relational databases.
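
The node and relationship structure described above can be sketched in a few
lines of Python. This is only a minimal illustration and not Neo4j's storage
format; the class names Node and Relationship, the labels and the property
values are assumptions invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A dot in the graph: one label plus key-value properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A line in the graph: a type, a direction and optional properties."""
    start: Node   # the relationship points from start ...
    end: Node     # ... to end
    type: str
    properties: dict = field(default_factory=dict)


# Hypothetical sample data, invented for illustration only.
alice = Node("Person", {"name": "Alice", "state": "Kansas"})
toaster = Node("Product", {"name": "Toaster"})
bought = Relationship(alice, toaster, "BOUGHT", {"used_coupon": True})

start_name = bought.start.properties["name"]
end_name = bought.end.properties["name"]
print(f"({start_name})-[:{bought.type}]->({end_name})")
```

Keeping the coupon flag on the relationship rather than on either node reflects
the point made above: part of the meaning lives in the connection itself.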

Figure 1a

A relational database is built with questions in mind. What kinds of questions
will we want to answer? For example, you want to know how many people bought a
toaster, live in Kansas, have a criminal record, and used a coupon to buy that
toaster. If the database administrator, or the person who created the database,
did not anticipate a question like this, it may be very difficult to retrieve
that information from a relational database. With graph databases, it is
possible to answer unanticipated questions. With a graph, you can answer any
question as long as the data exists and there is a path between the data
points. A graph is designed to traverse indirect relationships. With graph
databases you can even add more relationships and still maintain performance.
A graph database goes beyond storing data points: it stores the relationships
between them.

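
The claim that a question is answerable "as long as there is a path" can be
made concrete with a breadth-first traversal. The sketch below is an
illustration under stated assumptions: the function name path_exists, the
adjacency-dictionary representation and the sample identifiers are all
hypothetical, and a real graph database would run such a traversal inside its
own engine.

```python
from collections import deque


def path_exists(adjacency, start, goal):
    """Breadth-first search: True if goal is reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return True
        for neighbor in adjacency.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# Hypothetical purchase data: customers -> orders -> products and coupons.
purchases = {
    "customer:1": ["order:7"],
    "order:7": ["product:toaster", "coupon:SAVE10"],
    "customer:2": ["order:8"],
    "order:8": ["product:kettle"],
}

print(path_exists(purchases, "customer:1", "coupon:SAVE10"))
print(path_exists(purchases, "customer:2", "coupon:SAVE10"))
```

Nobody had to anticipate the customer-to-coupon question when the data was
laid out; the indirect relationship is found by following existing edges.
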
There are two properties of graph databases we should consider when
investigating graph database technologies:

The underlying storage
Some graph databases use native graph storage that is optimized and designed
for storing and managing graphs. Not all graph database technologies use native
graph storage, however. Some serialize the graph data into a relational
database, an object-oriented database, or some other general-purpose data
store.

The processing engine
Some definitions require that a graph database use index-free adjacency,
meaning that connected nodes physically “point” to each other in the database.
Here we take a slightly broader view: any database that from the user’s
perspective behaves like a graph database (i.e., exposes a graph data model
through CRUD operations) qualifies as a graph database. We do acknowledge,
however, the significant performance advantages of index-free adjacency, and
therefore use the term native graph processing to describe graph databases that
leverage index-free adjacency.
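
The difference between the two processing styles can be illustrated in Python.
This is a toy contrast, not how any particular product is implemented: the
names Node, neighbors_via_lookup and neighbors_native are hypothetical, and a
dictionary stands in for whatever global index a non-native engine would use.

```python
class Node:
    """A node that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # references to other Node objects


def neighbors_via_lookup(records, node_id):
    """Non-native style: each hop resolves a neighbour id via a global lookup."""
    return [records[nid]["name"] for nid in records[node_id]["neighbor_ids"]]


def neighbors_native(node):
    """Index-free adjacency: each hop follows a reference stored on the node."""
    return [neighbor.name for neighbor in node.neighbors]


# Hypothetical data: the same three-node graph in both representations.
records = {
    1: {"name": "a", "neighbor_ids": [2, 3]},
    2: {"name": "b", "neighbor_ids": []},
    3: {"name": "c", "neighbor_ids": []},
}
a, b, c = Node("a"), Node("b"), Node("c")
a.neighbors.extend([b, c])

print(neighbors_via_lookup(records, 1), neighbors_native(a))
```

Both functions return the same neighbours; what differs is that the first pays
for a lookup in a shared structure on every hop, while the second only touches
the nodes it actually visits.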

From a database point of view, the conceptual tools defining a DB-Model should
address at least the structuring and description of the data, its
maintainability and the way to retrieve or query the data. According to these
criteria, a DB-Model is defined as a combination of three components: first, a
collection of data structure types; second, a collection of operators or
inference rules; and third, a collection of general integrity rules. Note that
several proposals of DB-Models define only the data structures, sometimes
omitting operators and/or integrity rules.

Due to the importance of modeling conceptually, philosophically and in
practice, DB-Models have become essential abstraction tools. Among the purposes
of a DB-Model are: a tool for specifying the kinds of data permissible; a
general design methodology for databases; coping with the evolution of
databases; the development of families of high-level languages for query and
data manipulation; a focus in DBMS architecture; and a vehicle for research
into the behavioral properties of alternative organizations of data.

Since the emergence of database management systems, there has been an ongoing
debate about what the DB-Model for such a system should be. The evolution and
diversity of existing DB-Models show that there is no silver bullet for data
modeling. The parameters influencing their development are manifold, and among
the most important we can mention the characteristics or structure of the
domain to be modeled, the type of intellectual tools that appeal to the user,
and of course, the hardware and software constraints imposed. Additionally,
each DB-Model proposal is grounded on certain theoretical tools, and serves as
a base for the development of related models.

Figure 1b: Evolution of database models. Rectangles denote models, arrows
indicate influences, and circles denote theoretical developments. On the
left-hand side is a timeline in years.

Database Models Evolution: Brief Historical Overview

In the beginnings of the design of DB-Models, physical (hardware) constraints
were one of the fundamental parameters to be considered. Before the advent of
the relational model, most DB-Models focused essentially on the specification
of the structure of data in actual file systems. Kerschberg et al. developed a
taxonomy of DB-Models prior to 1976, comparing essentially their mathematical
structures and foundation, and the levels of abstraction used.

Two representative DB-Models are the hierarchical and network models, which
emphasize the physical level and offer the user the means to navigate the
database at the record level, thus providing low-level operations to derive
more abstract structures.

The relational DB-Model was introduced by Codd and highlights the concept of
level of abstraction by introducing the idea of separation between physical and
logical levels. It is based on the notions of sets and relations. Due to its
simplicity of modeling, it gained wide popularity among business applications.

Semantic DB-Models allow database designers to represent objects and their
relations in a natural and clear manner to the user (as opposed to previous
models). They were intended to provide the user with tools that could
faithfully capture the semantics of the information to be modeled. A well-known
example is the entity-relationship model.

Object-oriented DB-Models appeared in the eighties, when most of the research
was concerned with so-called “advanced systems for new types of applications”.
These DB-Models are based on the object-oriented paradigm, and their goal is
representing data as a collection of objects that are organized in classes and
have complex values associated with them.

Semistructured DB-Models are designed to model data with a flexible structure,
e.g., documents and Web pages. Semistructured data (also called unstructured
data) is neither raw nor strictly typed as in conventional database systems.
Additionally, data is mixed with the schema, a feature which allows extensible
exchange of data. These DB-Models appeared in the nineties and are currently in
evolution.

The XML (eXtensible Markup Language) model did not originate in the database
community. Although originally introduced as a standard to exchange and model
documents, it soon became a general-purpose model, with a focus on information
with a tree-like structure. As in the semistructured model, schema and data are
mixed. See Section 2.3 for a more in-depth comparison among these models.

Other Models and Frameworks. There are other important DB-Models designed for
particular applications, as well as modeling frameworks not directly focusing
on database issues, which indirectly concern graph database modeling. Among the
DB-Models are Spatial databases, Geographical Information Systems (GIS),
Temporal DB-Models, and Multidimensional DB-Models. Frameworks related to our
topic, but not directly focusing on database issues, are Semantic Networks.

Graph Database Models: Brief Historical Overview

The notion of graph DB-Model made its appearance almost in parallel with the
object-oriented DB-Models, as an alternative to the limitations of traditional
DB-Models for capturing the inherent graph structure of data appearing in
applications such as hypertext or geographic database systems, where the
interconnectivity of data is an important aspect.

Activity around graph databases flourished in the first half of the nineties
and then the topic almost disappeared. The reasons for this decline are
manifold: the database community moved toward semistructured data (a research
topic which did not have links to the graph database work in the nineties); the
emergence of XML captured all the attention of the work on hypertext; people
working on graph databases moved to particular applications like spatial data,
web, and documents; and the tree-like structure is enough for most
applications. Figure 2 reflects this evolution by means of papers published in
main conferences and journals.

Graph DB-Models emerged with the objective of modeling information whose
structure is a graph. In an early approach, Roussopoulos and Mylopoulos, facing
the failure of the systems of the time to take into account the semantics of
the database, proposed a semantic network to store data about the database. An
implicit structure of graphs for the data itself was presented in the
Functional Data Model, whose goal was to provide a “conceptually natural”
database interface. A different approach proposed the Logical Data Model, an
explicit graph DB-Model intended to generalize the relational, hierarchical and
network models. Years later Kunii proposed a graph DB-Model for representing
complex structures of knowledge called GBASE.

Graph Data Modeling
What is a Graph Data Model?

A graph DB-Model is conceptualized according to the three basic components of a
DB-Model, namely data structures, transformation language, and integrity
constraints. A graph DB-Model is characterized by:

The data and/or the schema are represented by graphs, or by data structures
generalizing the notion of graph (hypergraphs, hypernodes, hygraphs, etc.).
Almost all authors agree on this point, with slight variations.

Let us review how different authors word this issue. The approach is to model
the database directly and entirely as a graph [58]. A graph DB-Model is one
whose single underlying data structure is a labeled directed graph; the
database consists of a single digraph. A database schema in this model is a
directed graph, where leaves represent data and internal nodes represent
connections between the data. Directed labeled graphs are used as the formalism
to specify and represent database schemes, instances, and rules. The model is
basically defined as a labeled directed graph. In this model, a database is
described in terms of a labeled directed graph called a schema graph. A graph
DB-Model formalizes the representation of the data structures stored in the
databases as a graph. The schema as well as the instance of an object database
is represented by a graph. The nodes of the instance graph represent the
objects of the database. Database instances and database schemes are described
by certain types of labeled graphs [68]. The model for data is organized as
graphs. Labeled graphs are used to represent schemes and instances.

On top of these descriptions, one could add the fact that sometimes the schema
and the data (instances) are difficult to differentiate in these models, a fact
that closely resembles semistructured models. But in most cases the schema and
the instances are separated.

Data manipulation is expressed by graph transformations or by operations whose
main primitives directly address typical features of graphs, like paths,
neighborhoods, subgraphs, graph patterns, connectivity, and statistics about
graphs (diameter, centrality, etc.). The DB-Model defines a flexible collection
of type constructors and operators which create and access the graph data
structures; in other terms, the approach is to express all queries in terms of
a few powerful graph manipulation primitives. The operators of the language can
be based on pattern matching, i.e., finding all occurrences of a prototypical
piece of an instance graph.

The existence of integrity constraints enforcing the consistency of the data,
which are directly related to the graph data structure. For example: labels
with unique names, typing constraints on nodes, functional dependencies, and
the domain and range of properties.
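
The pattern-matching idea mentioned among the characteristics above, finding
all occurrences of a prototypical piece of an instance graph, can be sketched
as a small backtracking matcher. Everything here is an assumption made for
illustration: the function name match, the representation of edges as
(source, label, target) triples, the rule that every pattern term is a
variable, and the sample names. Two different variables are allowed to bind to
the same node in this sketch.

```python
def match(pattern, edges, binding=None):
    """Yield each variable binding under which all pattern triples occur in edges."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    (src_var, label, dst_var), rest = pattern[0], pattern[1:]
    for e_src, e_label, e_dst in edges:
        if e_label != label:
            continue
        candidate = dict(binding)
        # setdefault binds a fresh variable or returns the existing binding.
        if candidate.setdefault(src_var, e_src) != e_src:
            continue
        if candidate.setdefault(dst_var, e_dst) != e_dst:
            continue
        yield from match(rest, edges, candidate)


# Hypothetical instance graph: "parent" edges point from parent to child.
edges = [
    ("ana", "parent", "julia"),
    ("julia", "parent", "david"),
    ("mary", "parent", "david"),
]

# Pattern: x is a parent of y, and y is a parent of z (x is a grandparent of z).
grandparent_pattern = [("x", "parent", "y"), ("y", "parent", "z")]
print(list(match(grandparent_pattern, edges)))
```

The pattern is itself a tiny graph, which is what lets a query be phrased in
the same vocabulary as the data it searches.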

Summarizing, a graph DB-Model is a model where the data structures for the
schema and/or instances are modeled as a (labeled) (directed) graph, or
generalizations of the graph data structure, where data manipulation is
expressed by graph-oriented operations and type constructors, and which has
integrity constraints appropriate for the graph structure.
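
As an illustration of integrity constraints tied to the graph structure, the
sketch below checks three of the kinds named above: unique names, typing
constraints on nodes, and the domain and range of an edge type. The function
check_constraints, the dictionary layout of the schema and the sample data are
hypothetical choices for this example, and the code assumes every edge type
used in the data is declared in the schema.

```python
def check_constraints(nodes, edges, schema):
    """Return a list of constraint violations (empty if the graph is valid)."""
    errors = []
    seen_names = set()
    for node_id, node in nodes.items():
        # Typing constraint: every node label must be declared in the schema.
        if node["label"] not in schema["labels"]:
            errors.append(f"{node_id}: unknown label {node['label']}")
        # Uniqueness constraint: no two nodes may share a name.
        if node["name"] in seen_names:
            errors.append(f"{node_id}: duplicate name {node['name']}")
        seen_names.add(node["name"])
    for source, edge_type, target in edges:
        # Domain and range: an edge type fixes which labels it may connect.
        domain, range_ = schema["edges"][edge_type]
        if nodes[source]["label"] != domain or nodes[target]["label"] != range_:
            errors.append(f"{source}-{edge_type}->{target}: wrong domain or range")
    return errors


# Hypothetical schema and data.
schema = {"labels": {"Person", "City"}, "edges": {"LIVES_IN": ("Person", "City")}}
nodes = {
    "n1": {"label": "Person", "name": "Ana"},
    "n2": {"label": "City", "name": "Topeka"},
}
good_edges = [("n1", "LIVES_IN", "n2")]
bad_edges = [("n2", "LIVES_IN", "n1")]  # direction violates domain and range

print(check_constraints(nodes, good_edges, schema))
print(check_constraints(nodes, bad_edges, schema))
```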

Why a Graph Data Model?

The application areas of graph DB-Models are those where information about the
interconnectivity or the topology of the data is more important than, or as
important as, the data itself. This is usually accompanied by the fact that
data and relations among data are at the same level. In fact, introducing
graphs as a modeling tool has several advantages for this type of data.

First, it leads to more natural modeling: graph structures are visible to the
user. They allow a natural way of handling data appearing in applications (e.g.
hypertext or geographic databases). Graphs have an important advantage: they
can keep all the information about an entity in a single node and show related
information by arcs connected to it. Graph objects (like paths, neighborhoods)
may have first-order citizenship.

Type of model    Abstraction level   Base data structure   Main focus         Data complexity / homogeneity
Network          physical            pointers + records    records            simple / hom.
Relational       logical             relations             data/attributes    simple / hom.
Semantic         user                graphs                schema/relations   medium / hom.
Object           logical/physical    objects               object/methods     high / het.
Semistructured   logical             tree                  data/components    medium / het.
Graph            logical             graph                 data/relations     medium / het.

Table 1: A coarse-granularity comparative view among different general-purpose
database models. The parameters are: abstraction level, base data structure
used, the types of information objects the DB-Model focuses on, and the
complexity and homogeneity of the data items modeled (hom. = homogeneous,
het. = heterogeneous).

Second, queries can refer directly to this graph structure. Associated with
graphs are specific graph operations in the query language algebra, such as
finding shortest paths, determining certain subgraphs, and so forth. Explicit
graphs and graph operations allow a user to express a query at a very high
level. To some extent, this is in contrast to graph manipulation in deductive
databases, where often fairly complex rule programs need to be written. Last
but not least, for purposes of browsing it may be convenient to forget the
schema.
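
Shortest-path search is one of the graph operations named above. For unweighted
graphs it can be sketched with a breadth-first search that remembers how each
node was reached. The function shortest_path and the sample route map are
hypothetical and serve only to illustrate the operation; a graph query language
would typically offer it as a built-in.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return a shortest list of nodes from start to goal, or None if unreachable."""
    parents = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbor in adjacency.get(current, ()):
            if neighbor not in parents:
                parents[neighbor] = current
                queue.append(neighbor)
    return None


# Hypothetical route map (directed, unweighted).
routes = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(shortest_path(routes, "A", "E"))
```

Weighted routes, such as distances between locations, would need a different
algorithm (for example Dijkstra's); this sketch only counts hops.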

Third, as far as implementation is concerned, graph databases may provide
special graph storage structures for the representation of graphs and the most
efficient graph algorithms available for realizing specific operations.
Although the data may have some structure, the structure is not as rigid,
regular or complete as in a traditional DBMS. Full knowledge of the structure
is not required to express meaningful queries. The system can use efficient
graph algorithms designed to utilize the special graph data structures [58].

Comparison with other Database Models

In this section we compare the most influential DB-Models with graph DB-Models.
Table 1 presents a coarse-granularity overview of the most influential models.
Below we present the details.

Physical DB-Models. They were the first ones to offer the possibility of
organizing large collections of data. Among the most important ones are the
hierarchical and network models. These models lack a good abstraction level and
are very close to physical implementations. The data structuring is not
flexible and not apt for modeling nontraditional applications. For our
discussion they do not have much relevance.

The relational DB-Model was introduced by Codd to highlight the concept of
level of abstraction by introducing a clean separation between physical and
logical levels. Gradually the focus shifted to modeling data as seen by
applications and users. This is the emphasis and the achievement of the
relational model, at a time when the domain of application was basically simple
data (banks, payments, commercial and administrative applications).

The relational model was a landmark development because it provided a
mathematical basis to the discipline of data modeling. It is based on the
simple notion of relation, which together with its associated algebra and logic
made the relational model a primary model for database research. In particular,
its standard query and transformation language, SQL, became a paradigmatic
language for querying.

The differences between graph DB-Models and the relational DB-Model are
manifold. Among the most relevant ones are: the relational model was directed
to simple record-type data with a structure known in advance (airline
reservations, accounting, inventories, etc.). The schema is fixed and
extensibility is a difficult task. Integration of different schemes is neither
easy nor automatable. The query language does not support paths, neighborhoods
and several other graph operations, like connectivity (an exception is
transitivity). There are no object identifiers, only values.

Semantic DB-Models have their origin in the necessity to provide more
expressiveness and incorporate a richer set of semantics into the database from
the user's point of view. They allow database designers to represent objects
and their relations in a natural and clear manner (similar to the way the user
views an application) by using high-level abstraction concepts such as
aggregation, classification and instantiation, sub- and super-classing,
attribute inheritance and hierarchies. A well-known example is the
entity-relationship model. It has become a basis for the early stages of
database design, but due to its lack of preciseness it cannot replace models
like the relational or object-oriented ones. Other examples of semantic
DB-Models are IFO and SDM. For graph DB-Models research, semantic DB-Models are
relevant because they are based on a graph-like structure which highlights the
relations between the entities to be modeled.

Object-oriented (OO) DB-Models [75] appeared in the eighties, when the database
community realized that the relational model was inadequate for data-intensive
domains (knowledge bases, engineering applications). OO databases were
motivated by the emergence of nonconventional database applications consisting
of complex object systems with many semantically interrelated components, as in
CAD/CAM, computer graphics or information retrieval. According to the OO
programming paradigm on which they are based, their objective is representing
data as a collection of objects that are organized in classes and have complex
values and methods associated with them. Although OO DB-Models permit much
richer structures than the relational DB-Model, they still require that all
data conform to a predefined schema.

OO DB-Models have been related to graph DB-Models due to the explicit or
implicit graph structure in their definitions. Nevertheless, there remain
important differences rooted in the way each of them models the world. OO
DB-Models view the world as a set of complex objects having a certain state
(data) and interacting among themselves by methods. Graph DB-Models, in
contrast, view the world as a network of relations, emphasizing the
interconnection of the data and the properties of these relations. The emphasis
of OO DB-Models is on the dynamics of the objects, their values and methods,
whereas graph DB-Models emphasize the interconnection while maintaining the
structural and semantic complexity of the data.

Semistructured DB-Models. The need for semistructured data (also called
unstructured data) was motivated by the increased existence of unstructured
data, data exchange, and data browsing. In semistructured data the structure is
irregular, implicit and partial; the schema does not restrict the data, it only
describes it, and it is very large and rapidly evolving; the information
associated with a schema is contained within the data (data contains data and
its description, so it is self-describing). Among the most representative
models are OEM, Lorel, UnQL, ACeDB and Strudel. Generally, semistructured data
is represented by a tree-like structure. Nevertheless, cycles between data are
possible, establishing in this way a structural relation with graph DB-Models.
Some authors characterize semistructured data as rooted directed connected
graphs.

Graph Data Model Motivations and Applications

Graph DB-Models are motivated by real-life applications where information about
the interconnectivity of its pieces is a salient feature. We will divide these
application areas into classical applications and complex networks.

Classical Applications. The applications that motivated the introduction of the
notion of graph databases were manifold:

Generalizations of classical DB-Models. Classical models were criticized for
their lack of semantics, the flat structure of the data they allow, the
difficulties for the user to “see” the connectivity of the data, and the
difficulty of modeling complex objects.

In the same direction, the observation that graphs have been an integral part
of the database design process in semantic and object-oriented DB-Models
brought the idea of introducing a model in which both data manipulation and
data representation were graph based.

Limitations of the expressive power of languages for complex applications also
motivated the search for models that resemble such applications more closely.

Limitations (at the time) of knowledge representation systems, and the need for
intricate but flexible knowledge representation and derivation techniques.

The need for improving the functionalities of object-oriented DB-Models. In
this direction the applications in mind were CASE, CAD, image processing, and
scientific data analysis.

Graphical and visual interfaces, geographical, pictorial and multimedia
systems.

Applications where data complexity exceeded the relational DB-Model
capabilities also motivated graph databases. Examples include managing
transport networks (train, plane, water, telecommunications) and spatially
embedded networks like highways and public transport. Several of these
applications are now in the field of geographical information systems and
spatial databases.

There are other applications that motivated graph DB-Models: software systems,
integration.

Complex Networks. Several areas have witnessed the emergence of huge networks
of data which share some particular mathematical parameters, called complex
networks. The need for database management for some classes of these networks
has been recently highlighted. Although it is not yet evident whether, from the
point of view of databases, one can treat them as a whole, we will describe
them together for presentation purposes. Following the survey of Newman, we
will group them in four categories: social networks, information networks,
technological networks and biological networks. Below we describe specific
examples for each of them.

In social networks, nodes are people and groups while links show relationships
or flows between the nodes. Some examples are friendship, business
relationships, patterns of sexual contacts, research networks (collaboration,
co-authorship), communication records (mail, telephone calls, email), computer
networks, and national security. There is growing activity in the area of
social network analysis, visualization and data processing in such networks.

In information networks, relations occur such as citations between academic
papers, the World Wide Web (hypertext, hypermedia), peer-to-peer networks,
relations between word classes in a thesaurus, and preference networks.

In technological networks the structure is mainly governed by space and
geography. Some examples are the Internet (as a network of computers), electric
power grids, airline routes, telephone networks, and delivery networks (post
office). The area of Geographic Information Systems (GIS) today covers a big
part of this area (roads, railways, pedestrian traffic, rivers).

Biological networks represent biological information whose volume, management
and analysis have become an issue due to the automation of the process of data
gathering. A good example is the area of genomics, where networks occur in gene
regulation, metabolic pathways, chemical structure, map order and homology
relationships between species. There are other kinds of biological networks,
such as food webs, neural networks, etc. The area has a tremendous growth rate.
The reader can consult database proposals for genomics, an overview of models
for biochemical pathways, a tutorial on Graph Data Management for Biology, and
a model for Chemistry.

It is important to stress that classical query languages offer little help when
dealing with the types of queries needed in the above areas. As examples, data
processing in GIS includes geometric operations (area or boundary,
intersection, inclusions, etc.), topological operations (connectedness, paths,
neighbors, etc.) and metric operations (distance between entities, diameter of
the network, etc.). In genetic regulatory networks, examples of measures are
connected components (interactions between proteins) and degrees of nearest
neighbors (strong pair correlations). In social networks, the measures include
distance, neighborhoods, the clustering coefficient of a vertex, the clustering
coefficient of a network, betweenness, the size of giant connected components,
and the size distribution of finite connected components. Similar problems
arise in the Semantic Web, where querying RDF data increasingly needs graph
features.
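
To give a feel for one of these measures, the clustering coefficient of a
vertex in an undirected graph is the fraction of pairs of its neighbours that
are themselves connected. The sketch below assumes the graph is stored as a
dictionary mapping each vertex to the set of its neighbours; the function name
clustering_coefficient and the sample friendship data are hypothetical.

```python
from itertools import combinations


def clustering_coefficient(adjacency, vertex):
    """Fraction of neighbour pairs of vertex that are directly connected."""
    neighbors = adjacency[vertex]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to connect
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return 2 * links / (k * (k - 1))


# Hypothetical undirected friendship graph (each edge is listed in both directions).
friends = {
    "ann": {"bob", "cy", "dee"},
    "bob": {"ann", "cy"},
    "cy": {"ann", "bob"},
    "dee": {"ann"},
}

print(clustering_coefficient(friends, "ann"))
print(clustering_coefficient(friends, "bob"))
```

The computation only inspects a vertex and its immediate neighbourhood, which
is the kind of local graph access that a relational query language has no
direct operator for.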

Representative Graph Database Models

In this section we describe in some detail the most representative graph
DB-Models, choosing those that explicitly define and use graph structures or
generalizations of them. Additionally we describe other related models that use
graphs but do not fit properly as graph DB-Models. In them, graphs are used,
for example, for navigation, for defining views, or as language representation.

For each proposal, we present its data structures and, when available, its
query languages and integrity constraint rules. In general, there are few
implementations and no standard benchmarks, hence we avoid surveying this
issue. To give a flavor of the modeling in each proposal, we will run the
following example about a toy genealogy shown in Figure 3.

Figure 2: A genealogy diagram (right-hand side) represented as two tables
(left-hand side), NAME-LASTNAME and PERSON-PARENT. (Children inherit the last
name of the father just for modeling purposes.)
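
The two-table form of the genealogy can be mimicked in Python to show why the
graph view is convenient: finding all ancestors means following PERSON-PARENT
rows transitively. The rows below are illustrative placeholders, not a
transcription of the figure, and the function name ancestors is hypothetical.

```python
# Placeholder rows, invented for illustration.
NAME_LASTNAME = [
    ("George", "Jones"), ("Ana", "Stone"), ("Julia", "Jones"),
    ("James", "Deville"), ("David", "Deville"), ("Mary", "Deville"),
]
PERSON_PARENT = [
    ("Julia", "George"), ("Julia", "Ana"),
    ("David", "James"), ("David", "Julia"),
    ("Mary", "James"), ("Mary", "Julia"),
]


def ancestors(person, person_parent):
    """Collect every ancestor by repeatedly following (person, parent) rows."""
    parents_of = {}
    for child, parent in person_parent:
        parents_of.setdefault(child, set()).add(parent)
    found = set()
    stack = [person]
    while stack:
        for parent in parents_of.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(ancestors("David", PERSON_PARENT)))
```

In the tabular form each extra generation is another self-join on
PERSON-PARENT; in the graph form it is one more hop along a parent edge.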

Figure 3: Logical Data Model. The schema (on the left) uses two basic type
nodes for representing data values (N and L), and two product type nodes (NL
and PP) to establish relations between data values in a relational style. The
instance (on the right) is a collection of tables, one for each node of the
schema. Note that internal nodes use pointers (names) to make reference to
basic and set data values defined by other nodes.

LogicalDataModel(LDM)

MotivatedbythelackofsemanticsintherelationalDB-
Model,KuperandVardiproposed aDB-
Modelthatgeneralizestherelational,hierarchicalandnetworkmodels.Themodel
describesmechanismstorestructuredata,alogicalquerylanguageandanalgebraic
querylanguage.

In LDM a schema is an arbitrary directed graph where each node has one of the
following types: the Basic type describes a node that contains the data
stored; the Composition type ⊗ describes a node that contains tuples whose
components are taken from its children; the Collection type ⊛ describes a
node that contains sets whose elements are taken from its children.
Summarizing, internal nodes are of type ⊗ or ⊛ and represent structured data,
terminal nodes are of the Basic type and represent atomic data, and edges
represent connections between data.

A second version of the model, besides renaming the nodes ⊗ and ⊛ as product
and power respectively, incorporates a new type, the Union type ∪, intended
to represent a collection whose domain is the union of the domains of its
children (see example in Figure 4).

An LDM database instance consists of an assignment of values to each node of
the schema. In this sense, the instance of a node is a set of elements from
the underlying domain (for basic type nodes) and tuples or sets taken from
the instance of the node's children (for ⊗, ⊛ and ∪ types).

With the objective of avoiding cyclicity at the instance level, the model
proposes to keep a distinction between memory locations and their content.
Thus, instances consist of a set of l-values (the address space), plus an
r-value (the data space) assigned to each of them. These features make it
possible to model transitive relations like hierarchies and genealogies.
Over this structure a first-order many-sorted language is defined. With this
language, a query language and integrity constraints are defined. Finally, an
algebraic language, equivalent to the logical language, is proposed,
providing operations for node and relation creation, transformation and
reduction of instances, and other operations like union, difference and
projection.

LDM is a complete DB-Model (i.e. data structures plus query languages and
integrity constraints). The model supports modeling of complex relations
(e.g. hierarchies, recursive relations). The notion of virtual records
(pointers to physical records) proves useful to avoid redundancy of data by
allowing cyclicity at the schema and instance level. Due to the fact that the
model is a generalization of other models (like the relational model), their
techniques or properties can be translated into the generalized model. A
relevant example is the definition of integrity constraints.

Figure 4: Hypernode Model. The schema (left) defines a person as a complex
object with the properties name and lastname of type string, and parent of
type person (recursively defined). The instance (on the right) shows the
relations in the genealogy among different instances of person.

Chapter 2

Graph Schemas

Selecting vertex labels

The Tinkerpop property graph model can be summarized as follows. A graph has
a set of vertices and a set of edges. Each edge connects an out vertex to an
in vertex. Vertices and edges can have properties, which are key-value pairs
with String keys and pretty much any value that the underlying database
supports.

So far, the model looks schemaless since vertices and edges can't be
distinguished from other vertices and edges without knowing what the
properties mean. However, edges have always had labels. And with Tinkerpop 3,
vertices will have labels as well. The same is true with Neo4J's latest major
version.

If every vertex must be labeled, what is the correct method to select a
label? What should a label say about a vertex or an edge, from the
application's perspective?
We think a vertex label should represent the most granular type of the
vertex, where each "vertex type" is associated with a unique combination of:
meaning (semantics),
set of property key names and value types, and
set of outgoing edge labels, where each label type is annotated with the
possible directions of the edge (in/out/both) and cardinality.

Why so?
Because labels representing vertex types give the application the most
detailed information about the behavior of that vertex, thereby ensuring that
the application can process the vertex accordingly. In other words, one
should not be able to subdivide a vertex type to get two vertex types that
behave differently from the application's standpoint.

Examples of label selection

Let's go through the label selection exercise with the classic 6-vertex
TinkerGraph shown in the property graph model page. Since this is a
Tinkerpop 2 style graph, it doesn't have vertex labels. We'll now try to come
up with the vertex labels by simply looking at the vertex behavior.
Figure 1: TinkerGraph example

If you look closely, there are two types of vertices: ones with 'name' and
'age', and ones with 'name' and 'lang'. Let us label the former vertex type
as 'Person' and the latter vertex type as 'Software'. In other words, you
have persons named 'marko', 'vadas', 'peter' and 'josh' and software named
'lop' and 'ripple'.

After analyzing the edge labels and direction, you could say that the
'Person' vertex type has:
Property keys 'name' and 'age'
Edges labeled 'knows' in the OUT direction
Edges labeled 'created' in the OUT direction

The 'Software' vertex type has:
Property keys 'name' and 'lang'
Edges labeled 'created' in the IN direction

Now, an application looking at this graph automatically knows what to expect
when it reads a vertex labeled 'Person' or 'Software'. We can define two
different indexes on 'name', one for Person and one for Software, to make
sure that software searches don't pick up people, or vice versa.
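
As a rough illustration of this exercise, the following Python sketch groups the vertices of a small in-memory copy of the six-vertex example by their set of property keys and then collects the edge labels and directions seen on each group. The dictionary layout and the function names are assumptions made for this example; they are not part of Tinkerpop.

```python
from collections import defaultdict

# Illustrative in-memory copy of the six-vertex example graph.
vertices = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop", "lang": "java"},
    4: {"name": "josh", "age": 32},
    5: {"name": "ripple", "lang": "java"},
    6: {"name": "peter", "age": 35},
}
# Each edge is (out vertex id, edge label, in vertex id).
edges = [
    (1, "knows", 2), (1, "knows", 4),
    (1, "created", 3), (4, "created", 3),
    (4, "created", 5), (6, "created", 3),
]


def candidate_vertex_types(vertices):
    """Group vertex ids by their set of property keys."""
    groups = defaultdict(list)
    for vid, props in vertices.items():
        groups[frozenset(props)].append(vid)
    return dict(groups)


def edge_signature(vertex_ids, edges):
    """Collect the (label, direction) pairs seen on a group of vertices."""
    ids = set(vertex_ids)
    signature = set()
    for out_v, label, in_v in edges:
        if out_v in ids:
            signature.add((label, "OUT"))
        if in_v in ids:
            signature.add((label, "IN"))
    return signature


types = candidate_vertex_types(vertices)
person_ids = types[frozenset({"name", "age"})]
software_ids = types[frozenset({"name", "lang"})]
print(sorted(person_ids), sorted(edge_signature(person_ids, edges)))
print(sorted(software_ids), sorted(edge_signature(software_ids, edges)))
```

A grouping like this only proposes candidate vertex types. Whether a group deserves its own label still depends on what the application needs.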

The label selection process can't be fully mechanical though. For instance, a
person with no friends can be thought of as a separate vertex type, because
there are no adjacent 'knows' edges to such vertices. However, unless this
makes sense in the context of the application or the data model, there is no
point in subdividing the 'Person' vertex type as 'Loner' and 'Person with
Friends'. The same argument goes for subdividing the person vertex type as
'Developer' and 'NonDeveloper' based on whether that person created a piece
of software.

To recap, the right way to select vertex labels for a property graph is to
first figure out the vertex types and the behaviors of each vertex type. The
totality of these behaviors is the graph schema.

Drawing a graph schema
The best way to represent a graph schema is, of course, a graph. This is how
the graph schema looks for the classic Tinkerpop graph.
Figure 2: Example graph schema shown as a property graph

The graph schema is pretty much a property graph. The vertices correspond to
vertex types, and edges correspond to edge types. The property keys are named
after the allowed property keys for that vertex type. Every property value in
the schema graph contains the name of the most specific superclass
representing the corresponding property values of the instance graph.
Optional properties can have a '?' after the class name (not shown here).

Edge properties are like vertex properties, except that there is a special
property named '#' that holds the cardinality from the out to in vertex
types. Common sense dictates that the cardinality is M:N, i.e., many-to-many,
for both 'knows' and 'created'. One could be misled to think that some of
these relationships are 1:N by looking at the 6-vertex graph. This is another
reason for not fully relying on reverse engineering methods to derive
schemas.

We have gone through a similar exercise for the Grateful Dead graph. As you
can see, the graph schema is very simple, although the visualization of the
graph shown in the link looks complicated.

Figure 3: Grateful Dead graph schema
Our third and final example is the schema for the Kennedy family tree graph.
Again, the schema is extremely simple (simplistic given recent US Supreme
Court rulings).
Figure 4: Family tree graph schema

Note that in the Pixy schema, the property lists are the same for 'Man' and
'Woman', but the direction of the 'wife' edge is functionally dependent on
the value of the 'sex' property. This is very interesting because it means
that graph schemas could be normalized using rules like relational databases.
We will discuss this in later sections!

Summary

This section introduced the idea of schemas for property graphs and described
how the schema itself can be represented as a property graph. Furthermore, it
described a method to derive the graph schema for an existing property graph
by finding the most granular division of its vertices into vertex types.

Graph schemas (or schema graphs) help application developers better
understand the graph's structure.
In the next section, we will look at the problem the other way around. Can we
derive a graph schema from a higher-level conceptual model such as an
Entity-Relationship model? Could this be a systematic method to select vertex
and edge labels, and property keys, when designing a graph database
application?

Chapter 3

Converting ER models to graph schemas

This section will describe a general method to convert an entity relationship
model to a property graph schema. Using this method, a database designer can
develop ER models using standard conceptual modeling practices, but store the
data in a graph database instead of a relational database.

ER models and diagrams

The entity relationship model was proposed by Peter Chen in his 1976 paper
titled "The Entity-Relationship Model: Toward a Unified View of Data". The
ideas in this paper are taught in most database courses. This course page
gives a quick description of the ER model.

Conceptual modeling is a particularly useful exercise when embarking on a
project that involves a new domain. The goal of this exercise is to identify
key concepts in the domain that must be captured in the data model. One of
the techniques in conceptual modeling is to look at the natural language
description of an application's requirements. These requirements can be
analyzed to identify the entity and relationship types, using Chen's "rules
of thumb" (quoted from Wikipedia):

Common noun         Entity type
Proper noun         Entity
Transitive verb     Relationship type
Intransitive verb   Attribute type
Adjective           Attribute for entity
Adverb              Attribute for relationship
Example
Let us consider the following requirements:

Model a system where users create pages, which they own. Users can invite
other users to look at certain pages that they own. A page can specify one or
more tags which are then used to recommend other sections to the authors and
invited readers.

You could analyze this requirement and come up with three entity types, viz.
User, Page and Tag. The relationship types Owns, Invites and TaggedAs capture
the relationships. Note that not all verbs become relationships (create, for
example, does not). Similarly, the fact that invitations only apply to pages
that a user owns is lost in this model.

Figure 5: Example ER diagram

The square-shaped boxes show entity types, which represent sets of similar
entities. The diamond-shaped boxes show relationship types, which represent
sets of similar relationships. A relationship type relates two or more entity
types to each other.

The diagram shows the cardinality of each entity's contribution to a
relationship, such as 1:N (one-to-many) or N:N (many-to-many). The
cardinality is specified using the 'look-across' method. For example, a User
owns N pages, and a page is owned by 1 user. There are known limitations of
look-across cardinality for ternary relationships like Invites.

The diagram also shows some oval-shaped attributes, like username. These
attributes must be assigned to entity or relationship types. Attributes that
serve as external identifiers must be underlined.

Now, it is arguable whether Tag must be an entity or not in the final data
model. But from an ER perspective, it makes sense to model tag as an entity,
especially if tags are used to establish relationships across users for
recommendations.

Procedure to convert an ER model to a graph schema

The procedure to convert an ER model to a relational model is well known and
discussed in the same OSU course notes that we referenced earlier. We will
now go through a similar procedure that converts the ER diagram of the above
example to a graph schema.

Rule #1: Entity types become vertex types
Entity types such as User, Page and Tag become vertex types.
The name of the entity type becomes the label of the vertex type.
The associated attributes become the properties of the vertex type.

Note that we are drawing a graph schema, not a graph instance. So the User
type refers to any number of users in both the ER and the graph schema
representation. Hence we use the term "vertex type" and not vertex. The
entity relationship model uses similar terms such as "entity types" (like
User) and entities (like John Doe, the user).

Rule #2: Binary relationship types become edge types
All binary relationship types in the ER diagram can be converted to edge
types in the graph schema.
The name of the relationship type becomes the label of the edge type.
The associated attributes become the properties of the edge type.
The endpoints of the edge type are the vertex types corresponding to the
related entity types. The direction doesn't matter.

Here is an example showing the "Owns" relationship type translated to an
"owns" edge type:
Note that one-to-many and many-to-many binary relationships can be modeled as
edges without introducing new vertices. With relational models, you would
need an additional table to capture many-to-many relationships.

Figure 7: Owns relationship converted to an owns edge

A minor point is that the cardinality is written as 1:N because the User (out
vertex type) to Page (in vertex type) relationship is a 1:N relationship,
using the look-across method. In other words, a user has N pages and a page
has 1 user. If the direction of the edge were reversed, the cardinality would
be N:1.

Rule #3: N-ary relationship types become vertex types
N-ary relationship types relate more than two entity types. Such relationship
types become vertex types in the property graph model.
The name of the relationship type becomes the label of the vertex type.
The associated attributes become the properties of the vertex type.

The new vertex type includes edges to the vertex types corresponding to the
related entity types (see example). These edge types are labeled after the
role of the participating entity in the relationship. The direction doesn't
matter for any of these edges.

Here is an example showing the ternary relationship Invites translated to the
vertex type Invitation:
The cardinality in the graph schema is N:1 because the Invitation to Page
relationship is an N:1 relationship, using the look-across method. In other
words, an invitation could be issued to 1 page, and a page (in vertex) could
be part of N invitations. It is possible to just reverse some of the role
types, like invitee, without affecting the overall model. In that case the
cardinality will be 1:N.

Figure 8: Invites relationship converted to an Invitation vertex type

We haven't shown the process for weak entity types and identifying
relationship types, but these are handled exactly like entity types and
relationship types. Graph databases are more forgiving than relational
databases in that they allow two vertices to have the same label and property
key-value pairs. This simplifies the translation of weak entity types and
identifying relationship types into the property graph model.

Conversion example

Here is the graph schema corresponding to the example ER diagram. As you can
see, this diagram provides enough information for an application developer to
work with the graph database.

Figure 9: Graph schema for User Page Tag ER diagram

This is the "logical model" for the example conceptual model introduced in
the first figure. We can tweak this model further by renaming the labels,
changing directions of the edges, and so on. This will be the topic of the
next section.

Vertices are vertices, and edges are... edges

N-ary relationships are very common in conceptual models. For example, "Joe
bought a headphone at Target" is an example of a "Bought" relationship that
relates a User to a Product to a Store. Such relationships must be modeled as
vertices, not edges (unless you are using hypergraphs). Hence we think it is
misleading to think of edges as relationships and vertices as entities.

It is better to think of graphs as visualizable representations of a
conceptual model. We emphasize the visual nature of graphs because drawing
and thinking in terms of graphs is easy. For instance, if you go to the
Wikipedia entry for hypergraphs, you will see why visualizing hypergraphs
isn't as easy as visualizing (binary) graphs.
Summary

This section showed that it is possible to convert any entity relationship
model to a property graph schema. In other words, a data architect can use
standard methods to model a domain as an ER diagram and then follow this
procedure to convert it to a property graph schema. This type of translation
is not obvious for other popular NoSQL models like key-value stores and
document stores.

Chapter 4

Normalizing Graph Schemas

This section looks at how graph schemas can be manipulated and transformed to
equivalent graph schemas. This is similar to the splitting and merging of
tables in relational data models, typically performed to normalize or
denormalize a relational schema.

Normalization of relational databases

The goal of database normalization is to make sure that relational schemas
are easy to modify, easy to extend, informative to users and supportive of
various query patterns. The various normal forms, such as 1NF, 2NF, and so
on, define constraints that a table must satisfy to be compliant with that
normal form. Although the definitions of the normal forms can be
mathematical, the basic idea is to break up tables with duplicate
information. Here is an example from the Wikipedia page on 3NF:
The previous figure breaks up the tournament winners table into two tables,
one with player details and one with the tournament details. The actual rules
on "functional dependencies" and "non-prime attributes" are hard to remember,
but the process of splitting and merging tables comes intuitively with
experience. For example, if there was an existing table which had one row per
player, we'd probably move the "date of birth" to that table.

Transformation rules that produce equivalent schemas

This section lists some transformation rules that produce equivalent graph
schemas. A graph schema is equivalent to another graph schema if the data
stored in one schema, along with the applications that access it, can be
ported to the other schema, and vice versa. These rules are like splitting
and merging tables in relational models.
The transformation rules in this section can be mechanically applied to any
schema, and have nothing to do with its semantics. By applying a combination
of these rules, you could simplify the semantics and improve the usability of
your graph model.

Rule A: Renaming properties and labels
This rule consists of three transformations that result in equivalent
schemas:
Any vertex label can be renamed, so long as the new name doesn't refer to an
existing vertex label.
Any edge label can be renamed, so long as the new name doesn't refer to an
existing edge label between the out and in vertex types.
Any vertex/edge property can be renamed, so long as the new name doesn't
refer to an existing property of the vertex/edge type.
The following figure illustrates some example applications of this rule on
vertex and edge labels:

Figure 10: Renaming properties and labels

The schema shown at the top is a simple graph schema showing family
relationships. This schema is transformed to the schema shown at the bottom
of the figure using the following transformations:
Vertex labels Man and Woman are renamed to Male and Female.
Edge labels mother (2 instances) and father (2 instances) are renamed to
parent.

Even though it seems like some information is lost by renaming mother/father
to parent, this isn't true because the vertex labels at the endpoints
(Male/Female) have that information. This same transformation wouldn't be so
obvious while looking at an instance of this graph like the Kennedy family
tree.

Note that you cannot rename 'wife' to 'parent' in the bottom schema. This is
because there already exists a parent edge type from Male to Female.
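
The edge-label part of Rule A, including its precondition, can be sketched in Python as follows. Representing a schema as a set of (out vertex type, label, in vertex type) triples is an assumption made for this example, and the direction of each edge type in the sample schema is likewise assumed rather than taken from the figure.

```python
def rename_edge_type(edge_types, edge_type, new_label):
    """Return a new schema with one edge type renamed (Rule A).

    edge_types is a set of (out vertex type, label, in vertex type) triples.
    """
    out_type, _, in_type = edge_type
    if edge_type not in edge_types:
        raise KeyError(f"unknown edge type: {edge_type}")
    renamed = (out_type, new_label, in_type)
    # Precondition of Rule A: the new label must not already be in use
    # between the same out and in vertex types.
    if renamed in edge_types:
        raise ValueError(
            f"{new_label!r} already exists from {out_type} to {in_type}")
    return (edge_types - {edge_type}) | {renamed}


# Family schema after Man/Woman have been renamed to Male/Female.
schema = {
    ("Male", "mother", "Female"), ("Female", "mother", "Female"),
    ("Male", "father", "Male"), ("Female", "father", "Male"),
    ("Male", "wife", "Female"),
}
for edge_type in sorted(schema):
    if edge_type[1] in ("mother", "father"):
        schema = rename_edge_type(schema, edge_type, "parent")

try:
    rename_edge_type(schema, ("Male", "wife", "Female"), "parent")
except ValueError as error:
    print("rejected:", error)
```

The last call is rejected because a parent edge type from Male to Female already exists, which mirrors the restriction described above.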
Rule B: Reversing edge directions

This rule states that an edge type can be reversed provided it is a self
loop, or there is no edge type with the same label in the reverse direction.
The cardinality of the edge type is reversed as well.

Figure 11: Reversing edge directions
The following figure illustrates an example transformation using this rule
and the previous one. The transformation involves the following steps:
The 'wife' edge is renamed to 'husband' (rule A) and then reversed.
Each parent edge is renamed to 'son' or 'daughter' and reversed.

Note that the reversal is done in the graph instance as well as the schema.
In other words, JFK Jr. -parent-> JFK Sr. becomes JFK Sr. -son-> JFK Jr.
You could always rename the four 'son' and 'daughter' edge types to 'child'
using rule A. Again, no information is lost since the vertex labels are still
unique.

You would, however, not be able to rename 'husband' to 'son'. You could
rename 'husband' to 'daughter' (though absurd). The application will have to
interpret "male daughters" as husbands. But after you rename husband to
daughter, you would not be able to reverse its direction.

As you can see already, some applications of these rules may be quite hard to
derive if you are thinking in terms of graph instances, rather than graph
schemas.
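
A minimal Python sketch of Rule B follows, assuming a schema is a set of (out vertex type, label, in vertex type) triples and an instance is a list of (out vertex, label, in vertex) triples. Both function names are hypothetical.

```python
def can_reverse(edge_types, edge_type):
    """Rule B precondition: a self loop, or no same-label edge type
    already running in the reverse direction."""
    out_type, label, in_type = edge_type
    return out_type == in_type or (in_type, label, out_type) not in edge_types


def rename_and_reverse(edges, old_label, new_label):
    """Rename (Rule A) and reverse (Rule B) matching edges of an instance.

    Assumes new_label is not used by any other edge in the instance.
    """
    return [
        (in_v, new_label, out_v) if label == old_label else (out_v, label, in_v)
        for out_v, label, in_v in edges
    ]


parent_schema = {
    ("Male", "parent", "Male"), ("Male", "parent", "Female"),
    ("Female", "parent", "Male"), ("Female", "parent", "Female"),
}
# Male -> Female cannot be reversed directly: Female -> Male already exists.
print(can_reverse(parent_schema, ("Male", "parent", "Female")))
# A self loop can always be reversed.
print(can_reverse(parent_schema, ("Male", "parent", "Male")))

instance = [("JFK Jr.", "parent", "JFK Sr.")]
reversed_instance = rename_and_reverse(instance, "parent", "son")
print(reversed_instance)
```

The failed check on the Male to Female parent edge type shows why the example first renames each parent edge to 'son' or 'daughter': after the rename, no edge type with the same label runs in the opposite direction.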
Rule C: Property displacement

Figure 12: Property displacement

This rule states that a property on an edge type can be moved to either
adjacent vertex type, provided its look-across cardinality is 1. The reverse
rule states that a property in a vertex type can be moved to an adjacent edge
type with look-across cardinality of 1, provided the edge always exists when
the property exists.

The adjoining figure clarifies the rule, where the 'dateOfBirth' property is
moved to the 'mother' relationship because there is exactly one mother
relationship per Man/Woman and it is defined when the dateOfBirth is defined.
If you rename 'dateOfBirth' to 'deliveryDate', one could argue that the
property belongs in the edge and not the vertex.

Note that a Man's dateOfBirth cannot be displaced to the wife relationship
because that would mean that the dateOfBirth cannot be stored unless the
person is married. Similarly, the dateOfBirth in the edge type labeled
'mother' from Man to Woman in the bottom schema cannot be moved to Woman
because of the cardinality restrictions in the rule.

Using this rule, you can move the properties around the schema to come up
with a better looking design. This rule is also useful in satisfying indexing
requirements of various graph databases. For example, if a graph database
only supports indexes on vertex properties, you could move searchable
properties from the edges to vertices. Similarly, if a graph database
supports vertex-centric indexes based on properties on adjacent
edges/vertices, you can use this rule to bring the indexed property closer to
the vertex type of interest.

Rule D: Specialization and generalization
This rule states that:
Any vertex type can be divided into two disjoint vertex types based on a
Boolean test on the properties and adjacent edge labels of a vertex belonging
to that type.
Any edge type can be divided into two disjoint edge types based on a Boolean
test on the properties and adjacent vertex labels of an edge belonging to
that type.
Figure 13: Generalization
In other words, if we provide a Boolean function that can give a T/F result
given a vertex/edge, we can use that function to divide a vertex/edge type
into two different types.
The reverse rule states that:
Any vertex/edge type can be merged into another vertex/edge type provided
there is a Boolean test that can distinguish its vertices/edges from the
merged vertices.
The adjoining figure shows an example transformation involving the following
steps:
Male and Female are generalized as Person, because the Boolean test, sex
equals 'M', can distinguish Male from Female.
After that, son and daughter edge types are generalized as child because the
Boolean test, sex of in vertex equals 'M', can distinguish son from daughter.

This rule is useful in increasing the specificity, or reducing the
complexity, of the graph schema.
As a general principle, it is better to use this rule for specialization,
i.e., increasing the specificity, because that allows the different vertex
and edge types to embrace different behavior in terms of properties and
adjacent edges. However, there are instances where the differences between
the vertex types are so minor that specialization only results in application
complexity. This argument could apply to the above generalization of Male and
Female to Person.
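
The following Python sketch illustrates why generalization loses no information when a Boolean test exists. The vertex layout (id mapped to a label and a property dictionary) and the sample data are assumptions made for this example.

```python
def generalize(vertices, general_label="Person"):
    """Merge all vertex types into one label (generalization)."""
    return {vid: (general_label, props) for vid, (_, props) in vertices.items()}


def specialize(vertices, test, true_label="Male", false_label="Female"):
    """Split a vertex type in two using a Boolean test (specialization)."""
    return {
        vid: (true_label if test(props) else false_label, props)
        for vid, (_, props) in vertices.items()
    }


# Hypothetical instance data: vertex id -> (label, properties).
family = {
    1: ("Male", {"name": "a", "sex": "M"}),
    2: ("Female", {"name": "b", "sex": "F"}),
    3: ("Male", {"name": "c", "sex": "M"}),
}

people = generalize(family)
restored = specialize(people, lambda props: props["sex"] == "M")
print(restored == family)
```

Because the 'sex' property still distinguishes the two groups after the merge, specializing the generalized graph with the same test brings back the original labels.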

Rule E: Edge promotion

Figure 14: Edge promotion

This rule states that an edge type can be promoted to a vertex type by adding
two "out" edge types to the endpoints. The properties of the edge type become
properties of the vertex type. The cardinality of the new edge types is N:1
or 1:1 depending on the look-across cardinality of the original endpoint
vertex's type.

Note that the direction of the new edge types can be changed using the rename
and reverse rules. We only mention the "out" direction to simplify the way in
which cardinality for the new edge types is derived.

The adjoining figure shows the husband edge promoted to a vertex type called
'Marriage'. The edge types 'husband' and 'wife' point to the two endpoints of
the original edge type.
The edge promotion rule is useful in preparing a binary relationship to
become an N-ary relationship.

The reverse rule states that any vertex type with two propertyless edge
types, with same-side cardinality of exactly 1, can be demoted to an edge
between the adjacent vertices. This process is useful to simplify schemas.
You can use the property displacement rule (rule C) to move properties out of
edges.
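
Edge promotion on a graph instance might look like the Python sketch below. It assumes integer vertex ids, edges stored as (out vertex, label, in vertex, properties) tuples, and that the properties of a promoted edge move to the new vertex. The 'year' property and the sample data are hypothetical.

```python
def promote_edge(vertices, edges, label, new_vertex_label, out_role, in_role):
    """Replace every edge with the given label by a new vertex that has
    two outgoing edges, one to each endpoint of the original edge."""
    new_vertices = dict(vertices)
    new_edges = []
    next_id = max(vertices, default=0) + 1
    for out_v, edge_label, in_v, props in edges:
        if edge_label != label:
            new_edges.append((out_v, edge_label, in_v, props))
            continue
        # The promoted vertex takes over the properties of the edge.
        new_vertices[next_id] = (new_vertex_label, dict(props))
        new_edges.append((next_id, out_role, out_v, {}))
        new_edges.append((next_id, in_role, in_v, {}))
        next_id += 1
    return new_vertices, new_edges


# Hypothetical instance: vertex id -> (label, properties).
vertices = {1: ("Female", {"name": "a"}), 2: ("Male", {"name": "b"})}
edges = [(1, "husband", 2, {"year": 1990})]

promoted_vertices, promoted_edges = promote_edge(
    vertices, edges, "husband", "Marriage", out_role="wife", in_role="husband")
print(promoted_vertices[3])
print(promoted_edges)
```

Once the relationship is a vertex, further edges (to a location, for example) can be attached to it, which is what makes promotion a useful step towards an N-ary relationship.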

Rule F: Property promotion

Figure 15: Property promotion

This rule states that any group of properties can be promoted to a new vertex
type with those properties, provided the new vertex type has edges connecting
it to all existing vertex types that include the property group. The
same-side cardinality of the new edge type is 1.
The adjoining figure shows the 'sex' property converted to a new vertex type.
This vertex type will have exactly two nodes corresponding to male and
female. So in other words, every person in the new graph will have an
outgoing 'isa' edge to one of the two new vertices.

This rule is equivalent to the splitting of a relation into two relations, as
shown in the first figure of this section. Any group of properties, typically
ones that repeat, can be promoted to a vertex.

While applying this rule, it is better to include all vertex types that have
the same group of properties. For example, if there is a 'sex' property in a
different Animal type, it is better to point that to the new Sex vertex type
as well. If you have edge types with the property group, you can first
promote those edge types to vertices.

The reverse of this rule is that a vertex type that has propertyless edge
types with same-side cardinality of 1 can be demoted to the group of
properties that it holds. These properties must be added to every adjacent
vertex type. This is the equivalent of the denormalization of a table in the
relational model, which is useful to reduce the number of joins (or
traversals in the case of graph databases).
Rule G: Property expansion

Figure 16: Property expansion

This rule states that a property of a vertex type that represents a list of
values can be moved to a separate vertex type which stores each value. The
new vertex type must have an "in" edge type from the existing vertex type
with cardinality 1:N. The adjoining figure shows this rule applied to the
nickname property which holds a list of Strings.
The reverse rule states that any vertex type with exactly one propertyless
edge type with look-across cardinality of exactly 1 can be removed after
moving its properties to a list in the adjacent vertex type.

This is the equivalent of 1NF in the relational model. Unlike relational
databases, however, many graph databases support lists as a valid type for
property values. So the choice of storing nicknames as a List or a separate
vertex type is up to the designer.
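
Property expansion and its reverse can be sketched in Python as follows, assuming integer vertex ids and edges stored as (out vertex, label, in vertex) triples. The function names, the 'Nickname' label and the sample data are hypothetical.

```python
def expand_property(vertices, edges, key, new_label, edge_label):
    """Move each value of a list property to its own vertex (Rule G)."""
    new_vertices = {
        vid: (label, {k: v for k, v in props.items() if k != key})
        for vid, (label, props) in vertices.items()
    }
    new_edges = list(edges)
    next_id = max(vertices, default=0) + 1
    for vid, (_, props) in vertices.items():
        for value in props.get(key, []):
            new_vertices[next_id] = (new_label, {key: value})
            new_edges.append((vid, edge_label, next_id))
            next_id += 1
    return new_vertices, new_edges


def collapse_property(vertices, edges, key, new_label, edge_label):
    """Reverse rule: fold the value vertices back into a list property.

    A vertex whose list was empty before expansion gets no list back.
    """
    kept_vertices = {
        vid: (label, dict(props))
        for vid, (label, props) in vertices.items() if label != new_label
    }
    kept_edges = []
    for out_v, label, in_v in edges:
        if label == edge_label:
            value = vertices[in_v][1][key]
            kept_vertices[out_v][1].setdefault(key, []).append(value)
        else:
            kept_edges.append((out_v, label, in_v))
    return kept_vertices, kept_edges


people = {
    1: ("Person", {"name": "x", "nickname": ["a", "b"]}),
    2: ("Person", {"name": "y", "nickname": ["c"]}),
}
expanded_vertices, expanded_edges = expand_property(
    people, [], "nickname", "Nickname", "nickname")
restored_vertices, restored_edges = collapse_property(
    expanded_vertices, expanded_edges, "nickname", "Nickname", "nickname")
print(restored_vertices == people)
```

The round trip relies on the edge order to rebuild each list in its original order, which a real graph database may not guarantee.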
Summary

Rule-based schema transformations are tools that a data model designer can
use to rewrite a graph schema, without losing any information in the process.
In other words, a data model designer can use these rules to select the
directions of edges, the names of different labels and keys, the locations of
various properties, and so on. These changes don't matter from a pure
information perspective, but could make a big difference in usability and
efficiency.
In that sense, a data model designer can go back to Codd's original goals for
normalization: designing schemas that are easy to modify, easy to extend,
informative to users and supportive of various query patterns.

Chapter 5

One metarule for normalization

The previous section listed seven rule-based schema transformations such as
renaming labels, reversing edges, promoting edges and properties to vertices,
and so on. Such rule-based transformations can be mechanically applied to any
graph schema, without losing any information in the process. Using these
rules, a graph database designer can start with a design generated from an
entity relationship model and tweak it to get a final design.

This section describes a single metarule from which the seven previously
described rules can be derived. It also formalizes some of the ideas
presented in the previous sections using set theory.
Schemas and constraints

Figure 17: Example graph schema
The above figure shows an example graph schema describing constraints on the
graph data model such as:

What are the legal labels for vertices?
What are the legal edge labels between two vertex types?
What are the legal property keys and value types at each edge or vertex type?

The reality, however, is that a graph model could have other constraints that
aren't expressed in the schema. For example, the 'inviter' edge in every
Invitation must be to the User who has an 'owns' edge to the 'page' edge of
the Invitation. This constraint isn't captured in the above schema.

The question is: How can we model complex constraints in a graph model?
Graph universes, transformations and equivalence
A graph universe U is a set of graphs, typically an infinite set. A graph
universe represents a data model in the sense that it captures every valid
graph that belongs to the data model.

A graph universe U is compatible with a graph schema S, if every graph in the
universe is compatible with S. In other words, although the graph universe is
a precise description of the model, it can still be understood as a
refinement of a more loosely defined graph schema.

Redefining equivalence using transformation functions

Figure 18: An invertible function
A graph transformation T is a function that takes graphs from one universe U
to another universe V. In short, T: U → V.

A universe U is equivalent to a universe V if there is a transformation
function T: U → V, where T is invertible. Invertible and bijective are terms
to characterize functions that establish a one-to-one correspondence between
two sets, which in this case are graph universes.

In other words, given any graph G ∈ U, we can use T(G) to get a graph G' ∈ V.
Then we can use the inverse function T⁻¹(G') to get back G, hence
establishing equivalence of the two universes.
A programming perspective

If we are upgrading from one graph model to another, the transformation
function is the upgrade script that we would implement to move to the new
model. If we can also write a downgrade script, then we have two equivalent
models (or universes). In other words, two graph models, represented as
universes or schemas, are equivalent if they are forward and backward
compatible.
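
The upgrade and downgrade script idea can be made concrete with a small Python sketch. The edge layout (out vertex, label, in vertex), the sample graphs and all function names are assumptions made for this example. Checking a handful of sample graphs is only a sanity test, not a proof that a transformation is invertible on the whole universe.

```python
def upgrade(edges):
    """T: rename 'wife' to 'husband' and reverse its direction.

    Assumes no graph in the source universe already uses 'husband'.
    """
    return [(i, "husband", o) if l == "wife" else (o, l, i) for o, l, i in edges]


def downgrade(edges):
    """Inverse of T: rename 'husband' back to 'wife' and reverse again."""
    return [(i, "wife", o) if l == "husband" else (o, l, i) for o, l, i in edges]


def lossy_upgrade(edges):
    """A transformation that drops labels and therefore has no inverse."""
    return [(o, "related", i) for o, _, i in edges]


def round_trips(graphs, up, down):
    """Check down(up(G)) == G on a sample of graphs from the universe."""
    return all(down(up(graph)) == graph for graph in graphs)


g1 = [("a", "wife", "b"), ("c", "mother", "b")]
g2 = [("a", "mother", "b"), ("c", "wife", "b")]

print(round_trips([g1, g2], upgrade, downgrade))
# Two different graphs collapse to the same result, so no downgrade
# script can tell them apart afterwards.
print(lossy_upgrade(g1) != lossy_upgrade(g2))
```

The second transformation maps distinct graphs to the same graph, so the two models it connects are not equivalent in the sense defined above.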

Derived types

Consider a graph universe U that is compatible with a schema S. A vertex type
in S can be called a derived vertex type in U, if every graph G ∈ U is such
that its vertices (and adjacent edges) belonging to the vertex type can be
calculated from the rest of the graph.

In other words, given any graph in the universe U, after we remove all
vertices corresponding to the derived vertex type, there should be a way to
calculate those vertices again. Derived edge and property types can be
defined similarly. Note that all derived element types are defined in graph
schemas, but are specific to graph universes that are compatible with that
schema.

Metarule: Adding and removing derived types
Finally, here is the metarule behind all schema transformations:
Given any graph universe U compatible with a schema S, we can add a derived
vertex/edge/property type to produce an equivalent graph universe V
compatible with the schema S ∪ {derived type}.

The reverse rule states that:
Given any graph universe U compatible with a schema S, we can remove a
derived vertex/edge/property type to produce an equivalent graph universe V
compatible with the schema S \ {derived type}.
Figure 19: Modified graph schema

The 'inviter' edge type in the graph schema shown in the first figure is a
derived edge type. This is because the 'inviter' edges can be calculated by
going from the Invitation vertices to the Page and back to the user through
the 'owns' edge (reverse direction). We can simplify the original schema to
the version shown in the adjoining figure by applying these rules:

(Metarule) Remove derived edge type 'inviter'
(Edge promotion, reversed) Demote the binary relationship Invitation to an
edge called 'invited'.
As you can see, the updated schema is simpler than the original schema
derived from an ER diagram.
Proving the metarule

The metarule is easy to prove because of the way derived types are defined.
The transformation function to remove a derived type simply removes all
elements that belong to that type. The inverse function calculates the
derived types from the remaining graph.
Hence the universe with the derived type is equivalent to the universe
without it.
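
The argument can be illustrated with a Python sketch that removes and then recomputes a derived edge type. It uses the edge from each Invitation to the owner of its page (the inviting user), assumes every page has exactly one owner, and stores edges as (out vertex, label, in vertex) triples. The vertex names and function names are hypothetical.

```python
def derive_inviter_edges(edges):
    """Recompute the derived edges: Invitation -> page -> owner of the page."""
    owner_of = {page: user for user, label, page in edges if label == "owns"}
    return [
        (invitation, "inviter", owner_of[page])
        for invitation, label, page in edges if label == "page"
    ]


def remove_derived(edges):
    """Transformation: drop every edge of the derived type."""
    return [edge for edge in edges if edge[1] != "inviter"]


def restore_derived(edges):
    """Inverse transformation: add the derived edges back."""
    return edges + derive_inviter_edges(edges)


graph = [
    ("alice", "owns", "page1"),
    ("inv1", "page", "page1"),
    ("inv1", "inviter", "alice"),
    ("inv1", "invitee", "bob"),
]

stripped = remove_derived(graph)
restored = restore_derived(stripped)
print(sorted(restored) == sorted(graph))
```

The round trip only works for graphs that satisfy the implicit constraint, which is exactly what membership in the graph universe is meant to capture.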

Proving the 7 rules: Renaming, Reversing, Property Displacement, ...
Consider the example of renaming an edge type. This rule was stated in the
last section as:
Any edge label can be renamed, so long as the new name doesn't refer to an
existing edge label between the out and in vertex types.
We can prove this in two steps:
Add a derived edge type with the new name as a copy of the old edge type.
Remove the old edge type, which is now derivable from the new edge type.

Of course, step 1 requires that the edge type with the new name doesn't
already exist in the schema. Otherwise, all edges of the edge type can't be
derived. Hence the condition "as long as the new name doesn't refer to an
existing edge label."

In this manner, we can prove each rule by performing some steps to first add
new derived types and then remove the existing types, which become derived
types themselves.
Beyond transformation rules

Thinking in terms of graph universes, derived types and transformation
functions lets us do more radical transformations to our graph model. The
manner in which we apply these rules or transformations depends on our
overall strategy for data modeling.

One strategy is to minimize the number of implicit constraints not captured
by the schema. For instance, the schema shown in the second figure doesn't
have the implicit constraint on the 'inviter' edge type shown in the first
figure. Generally, fewer implicit constraints mean less duplication of data
and less chance of bugs while updating the database. This is similar to
normalization in relational databases.

A different strategy is to tune the graph for its specific querying needs.
Such approaches have been popularized by "denormalization" techniques such as
dimensional modeling. For instance, we could add a "shortcut" derived edge
type called 'latest' from User to Page to show the last created page for each
user. The important thing then is to ensure that any change to the rest of
the graph is accurately reflected in the derived element types. The code that
operates on the graph must be designed with these constraints in mind.

Summary
This section introduced set-theoretic representations of graph models called
graph universes, which are more powerful than graph schemas. Secondly, this
section showed that two graph universes are equivalent if there is an
invertible graph transformation function between them. Finally, this section
showed that all schema transformation rules presented in the earlier section
can be derived from one metarule that deals with adding and removing derived
types.

Validating graph schemas

The last few sections have discussed how property graph schemas can help
design graph databases from ER models and refine the data model through
schema manipulations. After reading this thread on the Gremlin users group,
we realized that it is easy to validate graphs against schemas with Gremlin
and Groovy.

Figure 20: TinkerGraph schema
This gist on Github shows how you can take an instance graph and check to see
if it is compatible with a schema graph. The schema graph has vertices and
edges corresponding to vertex and edge types. Here's the code to create a
schema graph inside a Gremlin shell for the classic Tinkerpop schema shown
here:

sg = new TinkerGraph()
person = sg.addVertex()
person.setProperty('_label', 'person')
person.setProperty('name', 'java.lang.String')
person.setProperty('age', 'java.lang.Integer')
software = sg.addVertex()
software.setProperty('_label', 'software')
software.setProperty('name', 'java.lang.String')
software.setProperty('lang', 'java.lang.String')
knows = person.addEdge('knows', person)
knows.setProperty('weight', 'java.lang.Float')
created = person.addEdge('created', software)
created.setProperty('weight', 'java.lang.Float')
created.setProperty('_minIn', 1) // Someone must create the software

The properties have values corresponding to the Java class of the property values in the
instance graph. The property keys can end with '?' to indicate that the property is
optional. The edges in the schema graph can have four special properties, viz. _minIn,
_maxIn, _minOut and _maxOut, to indicate cardinality restrictions for various edge types.

Any instance graph, g, can be validated against the schema stored in sg, using the Gremlin
script:
g.V.filter({checkVertex(it, sg)})
You can look at the full Github gist to see how the validation is done.
The current version of Tinkerpop doesn't support vertex labels. So the mapping from the
vertex to the vertex type is specific to the graph, like this:
vertexType = {v, sg -> v.age ? sg.V('_label', 'person').next() : sg.V('_label', 'software').next()}
Most graph schemas typically have a property named 'type' that would make this mapping
easier.
However, with Tinkerpop 3, this method can be standardized to use the label:
vertexType = {v, sg -> sg.V('label', v.label).next()}
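
The kind of check such a validator performs can be sketched in a few lines of Python. This is not the code from the gist: the function names check_vertex and check_min_in, the dictionary-based schema and the sample data are all assumptions made for illustration. The sketch covers two of the rules described above, namely optional property keys ending in '?' and a minimum in-degree restriction like _minIn.

```python
def check_vertex(props, schema):
    """Return a list of problems for one vertex's properties.

    schema maps property keys to Python types; a key ending in '?' is optional.
    """
    problems = []
    for key, expected in schema.items():
        optional = key.endswith("?")
        name = key[:-1] if optional else key
        if name not in props:
            if not optional:
                problems.append(f"missing required property '{name}'")
        elif not isinstance(props[name], expected):
            problems.append(f"property '{name}' is not {expected.__name__}")
    allowed = {key.rstrip("?") for key in schema}
    for name in props:
        if name not in allowed:
            problems.append(f"unexpected property '{name}'")
    return problems


def check_min_in(vertices, edges, edge_label, target_type, min_in):
    """Return ids of target_type vertices with fewer than min_in incoming edges."""
    counts = {vid: 0 for vid, vtype in vertices.items() if vtype == target_type}
    for label, _src, dst in edges:
        if label == edge_label and dst in counts:
            counts[dst] += 1
    return sorted(vid for vid, n in counts.items() if n < min_in)


# Hypothetical schema for a 'person' vertex type; 'age?' is optional here.
PERSON = {"name": str, "age?": int}
vertices = {1: "person", 3: "software", 5: "software"}  # id -> vertex type
edges = [("created", 1, 3)]                             # (label, source, target)
orphans = check_min_in(vertices, edges, "created", "software", 1)
```

In this sample, vertex 5 is a 'software' vertex that nobody created, so it is the one reported as violating the minimum in-degree of 1.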

Pixy: First-order logic on graph databases

The previous sections have shown that any ER model can be converted to a property graph
schema, and that the schema can be normalized using rules. However, one key question
remains:

Do graph databases offer the same querying capabilities as relational databases?
In other words, any data that fits in an ER model can be stuffed into a graph database. But
can data stuffed in this fashion be queried effectively?
This is the subject of this section.
Background
On SQL

SQL is the query standard for relational databases. It first appeared in the 1970s and was
standardized in the 80s and 90s. The theoretical foundation of SQL is relational algebra.
Codd showed that relational algebra is equivalent to relational calculus, a form of
first-order logic. His theorem is the bedrock of SQL's expressive power.

On first-order logic

Using relational algebra, we can write any query of the form "Find all rows from tables
A, B, C, ..., matching some predicate", as long as the predicate can be expressed in
first-order logic. Specifically, the predicate is formed using:

various comparisons on rows and columns,
logical operations "and" (∧), "or" (∨) and "not" (¬), and
the universal "for every" (∀) and existential "there exists" (∃) quantifiers that operate
on rows of a given table.

Let's consider tables named person, car and ticket. We could express a query like "find me
people who own only BMW cars, but have at least one speeding ticket". The predicate can be
written as:
my_query(person) =
(∀ car, person owns car ⇒ car.make = 'BMW') ∧ (∃ ticket, person has ticket)

On Gremlin
Gremlin is a standard graph traversal language. It is part of the Tinkerpop stack and works
across all Blueprints-compatible databases. You can read more about Gremlin here:

Gremlin Wiki on Github
Gremlin Docs
The Pathological Gremlin (presentation)

Gremlin is great for step-based queries. For example, something like "find the friend of a
friend of vertex v" can be written as v.out('friend').out('friend'). This style of
traversal with vertices and edges isn't natural in SQL with tuples.
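
For readers who have not used Gremlin, the step-based style can be imitated in plain Python. The sketch below is only an analogy, not Gremlin's API: the helper out and the nested-dictionary graph are made up for illustration. Each call to out plays the role of one traversal step.

```python
def out(graph, vertices, label):
    """One traversal step: follow outgoing edges with the given label."""
    result = []
    for v in vertices:
        result.extend(graph.get(v, {}).get(label, []))
    return result


# vertex -> {edge label -> list of target vertices}
graph = {
    "v": {"friend": ["a", "b"]},
    "a": {"friend": ["c"]},
    "b": {"friend": ["c", "d"]},
}

# Two chained steps, analogous to v.out('friend').out('friend')
friends_of_friends = out(graph, out(graph, ["v"], "friend"), "friend")
```

Note that "c" is reached twice, once through each friend, so it appears twice in the result; the sketch does not deduplicate.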

The declarative querying style of SQL is, however, different from Gremlin. The SQL2Gremlin
tutorial goes through some examples. But you can see that the translation isn't obvious.
Pixy: First-order logic with Gremlin

Pixy is a bridge from first-order logic to Gremlin. The first-order logic of Pixy operates
on vertices and edges. We can ask questions like "Find vertices and edges that match some
predicate", where the predicate is formed by

various comparisons on vertex and edge properties,
logical operations "and" (∧), "or" (∨) and "not" (¬), and
the universal "for every" (∀) and existential "there exists" (∃) quantifiers that operate
on vertices and edges.
Pixy queries are expressed using Prolog rules, not SQL. Rules in Prolog are expressed as
Horn clauses.
Prolog, like SQL, has the full expressive power of first-order logic.
Let's take the predicate from the earlier discussion,
my_query(person) =
(∀ car, person owns car ⇒ car.make = 'BMW') ∧ (∃ ticket, person has ticket)
Let's say that we represent people as vertices with outgoing edge types named 'car' and
'ticket' to vertices representing cars and tickets. Now, we could express the above
predicate using Horn clauses as follows:

my_query(Person, Ticket) :- out(Person, 'ticket', Ticket),
    not(not_all_bmw(Person)).
not_all_bmw(Person) :- out(Person, 'car', Car),
    property(Car, 'make', Make),
    Make <> 'BMW'.
Note that out and property are predefined predicates in Pixy. You can see that the ∃ part
of the query is easy. This is a matter of finding a ticket. The ∀ part of the query is
implemented using two nots. In other words, saying "every car is a BMW" is the same as
saying "there is no car that isn't a BMW".
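
The double-negation trick is easy to check on a small data set. The following Python sketch mirrors the two clauses above over a made-up dictionary of people; it does not use Pixy, and the data and function bodies are assumptions for illustration only.

```python
# Made-up sample data: each person has a list of car makes and a list of tickets.
people = {
    "ann": {"cars": ["BMW", "BMW"], "tickets": ["t1"]},
    "bob": {"cars": ["BMW", "Audi"], "tickets": ["t2"]},
    "cy": {"cars": ["BMW"], "tickets": []},
}


def not_all_bmw(person):
    """True if there exists a car owned by the person that is not a BMW."""
    return any(make != "BMW" for make in people[person]["cars"])


def my_query(person):
    """There exists a ticket, and there is no car that isn't a BMW."""
    has_ticket = bool(people[person]["tickets"])
    return has_ticket and not not_all_bmw(person)


matches = [p for p in people if my_query(p)]
```

Only "ann" matches: "bob" owns an Audi and "cy" has no ticket. For every person, not not_all_bmw(p) agrees with the direct universal check all(make == "BMW" for ...), which is the equivalence the text describes.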

ER models in Pixy

If you use an ER model as a starting point for your design, you can reconstitute the ER
model from the final graph schema using Pixy. Consider the previously referenced ER model
with entities named User, Page and Tag and relationships named Owns, Invites and TaggedAs.
Figure 21: ER model for User-Page-Tag application
This was translated to a graph schema with four types of vertices, viz. User, Page, Tag and
Invitation.

Figure 22: Graph schema for User-Page-Tag application
Now, we can reconstitute the ER model from the graph schema using Pixy with the following
clauses:

% Entities
user(User, Name, Login) :- property(User, 'name', Name), property(User, 'login', Login).
page(Page, Uri, Html, CreateTs) :- property(Page, 'uri', Uri), ...
tag(Tag, Hashtag, Description) :- property(Tag, 'hashtag', Hashtag), ...
% Relationships
owns(User, Page) :- out(User, 'owns', Page).
taggedAs(Page, Tag) :- out(Page, 'taggedas', Tag).

invites(Invitation, Inviter, Invitee, Page) :-
    out(Invitation, 'invitee', Invitee),
    out(Invitation, 'inviter', Inviter),
    out(Invitation, 'page', Page).

Every predicate corresponds to an entity or a relationship. The predicate operates on
vertices, edges and properties that belong to the graph schema. Now, you get the full power
of first-order logic on the ER model. In other words, any first-order predicate that
applies to entities and relationships can be written as a Pixy query that uses the above
clauses.

Let's take an example predicate that matches all users invited to pages tagged 'tinkerpop'
created in 2014. You could express this as follows:
tinkerpop_invitee(User, Page) :- invites(_, _, User, Page),
    page(Page, _, _, CreateTs),
    CreateTs > 1388534400L, % Unix timestamp for 1/1/2014
    taggedAs(Page, Tag),
    tag(Tag, 'tinkerpop', _).
Note that '_' is used to represent anonymous variables.
Query requirements don't usually matter while modeling

It isn't surprising that queries in first-order logic can be compiled to Gremlin, since
Gremlin is Turing complete. The surprising thing is that Pixy converts any first-order
logic query on an ER model to something that executes "efficiently" on the corresponding
graph database.

By "efficiently", we mean that the Pixy/Gremlin query will always traverse edges to go from
one entity/relationship to another. Edge traversal operations in graph databases are
typically orders of magnitude faster than index-based joins in relational databases.
Queries on properties will, of course, need indexes for efficient querying. But as long as
your starting ER model is accurate, your application will not have to simulate joins using
these property indexes. In that sense, the graph schema design is independent of the query
requirements.

Part II
Chapter 8: Introduction to Database

Database Systems evolution: Databases and database technology are vital to modern
organizations, supporting both daily operations and decision making. Database technology
has undergone remarkable evolution over 50 years. Despite Oracle's dominance of the
enterprise DBMS marketplace, the industry remains highly competitive with a continued high
level of innovation [12].
Figure 1: Evolution of database technology
Major periods of database technology evolution [12]:

1st Generation (1960's): File oriented – Supported sequential and random searching of
files, but the user was required to write computer programs to access data. The database
software industry had few or no standards during this period.

2nd Generation (1970's): Navigational – Could manage multiple entity types and
relationships. Computer programs still had to be written. Progress on standards.

3rd Generation (1980's): Relational with non-procedural access – Foundation based on
mathematical relations and associated operators. Optimization technology was developed.
IBM performed pioneering research to enable commercialization of relational database
technology.

4th Generation (1990's+): Object oriented – Extending the boundaries of database
technology. New kinds of distributed processing and data warehouse processing. Can store
and manipulate unconventional data types. Convenient ways to publish static and dynamic
Web data.
DBMS marketplace: Despite Oracle's dominance of the enterprise DBMS marketplace, with more
than 40% overall market share, the industry remains highly competitive with a continued
high level of innovation. In some environments, its competitors are Microsoft SQL Server,
IBM DB2, Teradata and SAP Sybase. Open-source DBMS products have begun to challenge the
commercial DBMS products at the low end of the enterprise DBMS marketplace. The category of
open-source DBMS is led by MySQL, followed by MongoDB, PostgreSQL and MariaDB. In the
desktop DBMS market, Microsoft Access dominates because of the dominance of Microsoft
Office.

Figure 2: DBMS marketplace
Innovation in the industry: The advances in DBMS in recent years support business
intelligence processing for data integration and usage of summary data. NoSQL technology
has been developed to support the needs of Big Data, in the form of modern web-scale
databases. Since 2009, the most accepted definition of NoSQL is next-generation databases
that are non-relational, distributed, open-source and horizontally scalable. Other
characteristics that usually apply are schema-free, scalability, global availability, easy
replication support, simple API, eventually consistent/BASE (not ACID), and large-scale
data. [5][19]

Types of DBMS

Ranking: In this section we look at the rankings created by DB-Engines. DB-Engines is an
initiative that provides information on the popularity of the DBMS available in the market.
It publishes a separate ranking for every DBMS type, and the rankings are updated monthly.
Figure 3: DBMS developed by database model pie chart

The pie chart above shows the DBMS categories with the most systems developed. The database
model with the most systems is the relational DBMS, with 137 systems. It is followed by
key-value stores, with 63 systems, document stores, with 43 systems, and graph DBMS, with
27 systems.
In the overall classification of database models, the following DBMS types are
distinguished.
Types of DBMS:
Relational DBMS                     Graph DBMS
Key-value stores                    Time Series DBMS
Document stores                     RDF stores
Object oriented DBMS (Atkinson)     Native XML DBMS
Search engines                      Content stores
Multivalue DBMS                     Event Stores
Wide column stores                  Navigational DBMS

The 14 database models with the most systems developed are listed above. If, instead of
counting the systems developed, the database models are ranked by popularity, the list of
models to be considered shrinks. Most users work on relational DBMS (79.5%), followed by
document stores (7.3%), search engines (4.3%), key-value stores (3.5%), wide column stores
(3.1%) and graph DBMS (1.1%).
The pie chart below represents the most recent popularity ranking.
Figure 4: DBMS popularity by database model pie chart

The pie chart above makes it clear that relational DBMS are the ones used by default.
However, the state of the art is changing because of innovations in database technology.
Even though the popularity percentages of NoSQL databases are minimal compared to
relational DBMS, the fact that they are recent, growing technologies is reason enough to
evaluate them more deeply.
NoSQL DBMS

Many different NoSQL DBMS have been developed, but they are generally classified in four
types [5]:
Key-value stores: Their structure consists of pairing keys to values. When performing a
change in a value, the entire value other than the key must be updated. They scale well
because of their simplicity. However, this simplicity can limit the complexity of the
queries and other advanced features. [18] Examples: Dynamo, Azure Table Storage, Berkeley
DB

Document Stores: The records stored are called documents, which consist of groupings of
key-value pairs. Values can be nested to arbitrary depths. [18] Examples: Elastic, MongoDB,
Azure DocumentDB

Wide Column Stores: While an RDBMS stores all the data of a particular table's rows
together on disk, which makes retrieving a particular row fast, column-family databases
serialize all the values of a particular column together on disk, which makes retrieving a
large amount of a specific attribute fast. This approach is useful for aggregate queries.
[18] Examples: Hadoop/HBase, Cassandra, Amazon SimpleDB

Graph Databases: Ideal for dealing with interconnected data. Their structure consists of
connections, or edges, between nodes. Both nodes and their edges can store additional
properties such as key-value pairs. The strength of a graph database is in traversing the
connections between the nodes. Their downside is that they generally require all data to
fit on one machine, limiting their scalability. [18] Examples: Neo4J, InfiniteGraph, TITAN

Other types: Multimodel Databases, Object Databases, Grid & Cloud Database Solutions, XML
Databases, Multidimensional Databases, Multivalue Databases, Event Sources, Time
Series/Streaming Databases
(a) Example of Key-Value Store (b) Example of Document Store

Figure 5: Four main types of NoSQL databases

Consistency Models for NoSQL databases: Before NoSQL, ACID was the quintessential model
that databases were meant to follow. Brief reminder of the ACID properties:

Atomicity: All operations in a transaction succeed or every operation is rolled back.
Consistent: On the completion of a transaction, the database is structurally sound.
Isolated: Transactions do not contend with one another. Contentious access to data is
moderated by the database so that transactions appear to run sequentially.
Durable: The results of applying a transaction are permanent, even in the presence of
failures.
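
Atomicity is the easiest of these properties to demonstrate. The sketch below uses Python's standard sqlite3 module with an in-memory database; the account table, the names and the transfer function are made up for illustration. A transfer consists of two updates, and when the second one violates a constraint, the first one is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # debit violates CHECK; credit is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer, bob's balance does not keep the 500 that was credited in the first statement, because the whole transaction was rolled back.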

However, NoSQL databases break with the tradition of SQL models with ACID properties. BASE
properties seem to suit most NoSQL databases better, and they are as follows:

Basic Availability: The database appears to work most of the time.
Soft-state: Stores don't have to be write-consistent, nor do different replicas have to be
mutually consistent all the time.
Eventual consistency: Stores exhibit consistency at some later point (e.g., lazily at read
time).

ACID transactions can be considered stricter than needed for many NoSQL cases, as they
apply many constraints for safety's sake. On the other hand, BASE transactions guarantee
scale and resilience. The BASE model is used by aggregate stores, such as column-family,
key-value and document stores. In contrast, graph databases use the ACID model. BASE
databases promise availability of the data at the expense of data consistency (the
consistency of the data is only assured at concrete snapshots). [16] Graph databases
differentiate themselves from other NoSQL databases by focusing more on data consistency.
The comparison made above is shown in the table below:

              ACID               BASE
Properties    Atomicity          Basic Availability
              Consistent         Soft-state
              Isolated           Eventual consistency
              Durable
NoSQL DBMS    Graph Databases    Aggregate stores

Table 1: Comparison of ACID and BASE Consistency Models
Comparison of DBMS

Relational DBMS are clearly the benchmark among database systems. The mass adoption of
this DBMS type is an important factor for choosing it as the main system in many companies.
However, current trends show that the four main types of NoSQL databases should also be
taken into account before installing a DBMS. To give a more objective view of the benefits
of each model, the use cases for which they perform best and the ones for which they
perform worst are listed below.

Use cases for relational databases

Positive use cases: transaction-oriented databases (banking applications, on-line
reservations), where the concurrency of many transactions must be supported and the
integrity of the data must be protected.

Negative use cases: data warehouses, which are analytically-oriented databases with a
large amount of data and infrequent updates. The constraints of the relational database
wouldn't support the scalability.

Use cases for key-value stores
Positive use cases:
– Storing user session data
– Maintaining schema-less user profiles
– Storing user preferences
– Storing shopping cart data
Negative use cases:
– Querying the database by a specific data value
– Data with relationships between data values
– Operating on multiple unique keys
– Businesses that need to update a part of the value frequently
Use cases for document stores
Positive use cases:
– E-commerce platforms
– Content management systems
– Analytics platforms
– Blogging platforms
Negative use cases:
– Running complex search queries
– Applications that require complex multiple-operation transactions
Use cases for wide-column stores
Positive use cases:
– Content management systems
– Blogging platforms
– Systems that maintain counters
– Services that have expiring usage
– Systems that require heavy write requests (like log aggregators)
Negative use cases:
– Complex querying
– Query patterns that change frequently
– Projects without an established database requirement
Use cases for graph databases [19]
Positive use cases:
– Fraud detection
– Graph-based search
– Network and IT operations
– Social networks
Negative use cases:
– Data warehouses so big that they require the BASE model
Figure 6: Positions of NoSQL databases (source: Neo4j)

In the figure above, the five types of DBMS being compared are displayed according to the
size and complexity of their databases. It can be concluded that each of those DBMS works
for some specific use cases, depending on the amount and complexity of the data that is
going to be stored. Their use cases do not overlap, which is why all five of them must be
considered before implementing a DBMS in a company.

Chapter 9: Graph Databases

Graph databases are databases whose specific purpose is the storage of graph-oriented data
structures; an introduction to graph theory is therefore needed in order to use its
terminology consistently.

Concepts of Graph Databases

Positioning: It has previously been explained that NoSQL databases address several issues
that relational databases do not: availability for the processing of large datasets,
partitioning, flexibility of the schema, and modelling and processing complex structures
like trees and graphs. Graph databases specialize in processing highly connected data,
managing complex and flexible data models, and improving the performance of complex queries
by traversing the graph.

Model: Another quality of graph databases is the simplicity of their model. The figures
below show the difference between modeling the same use case in a relational database and
in a graph database. The model of the graph database is more similar to the business model,
which makes it more accessible to non-technical profiles. [8]
(a) Relational Database Model (b) Graph Database Model
Figure 7: Model Comparison

A graph is a pictorial representation of a set of objects in which some pairs of objects
are connected by links. A graph contains two kinds of elements: nodes (vertices) and
relationships (edges).

What is a Graph database
A graph database is a database which is used to model the data in the form of a graph. It
stores any kind of data using:

Nodes
Relationships
Properties
Nodes: Nodes are the records/data in graph databases. Data is stored as properties, and
properties are simple name/value pairs.
Nodes can be grouped together by applying a Label to each member. A node can have zero or
more labels. Labels do not have any properties. Storing data in Neo4j is similar to adding
more records in other databases.

Relationships: A relationship is used to connect nodes. It specifies how the nodes are
related.

Relationships always have a direction.
Relationships always have a type.
Relationships form patterns of data.

Properties: Properties are named data values.
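
The three building blocks described above can be written down as a small data structure. The Python sketch below is only an illustration of the concepts, not Neo4j's storage format or driver API; the class names, field names and sample values are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set = field(default_factory=set)        # zero or more labels
    properties: dict = field(default_factory=dict)  # name/value pairs


@dataclass
class Relationship:
    start: Node                                     # direction: start -> end
    end: Node
    type: str                                       # every relationship has a type
    properties: dict = field(default_factory=dict)


# Hypothetical sample data
alice = Node({"Person"}, {"name": "Alice"})
page = Node({"Page"}, {"uri": "/home"})
owns = Relationship(alice, page, "OWNS", {"since": 2014})
```

Labels carry no properties of their own here; they are plain strings attached to a node, which matches the description above.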
Popular Graph Databases
Neo4j is the most popular graph database. Other graph databases are:

Oracle NoSQL Database
OrientDB
HyperGraphDB
GraphBase
InfiniteGraph
AllegroGraph, etc.
Why Graph DB

Graph databases are very useful nowadays because in graph databases data exists in the
form of relationships between different objects. The relationships between the data can be
more valuable than the data itself.

Relational databases store highly structured data in which many records hold the same type
of data, so they are well suited to structured data. However, they do not store the
relationships between the data as first-class entities, while graph databases store
relationships and connections as first-class entities.

The data model for graph databases is simple compared to other databases, and they can be
used with OLTP systems. They provide features like transactional integrity and operational
availability.

Graph DB vs NoSQL Database
Following are some points which specify why a graph DB is better than other NoSQL
databases:

Most NoSQL databases store sets of disconnected aggregates. This makes it difficult to use
them for connected data and graphs.
One well-known strategy for adding relationships to such stores is to embed an aggregate's
identifier inside the field belonging to another aggregate, effectively introducing
foreign keys.

But this requires joining aggregates at the application level, which quickly becomes
prohibitively expensive.
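
A small Python sketch shows what joining aggregates at the application level looks like. The users dictionary stands in for a key-value or document store, get_user stands in for one round trip to that store, and all names and data are made up for illustration.

```python
# Aggregates as they might sit in a key-value or document store (made-up data).
users = {
    "u1": {"name": "Ann", "friend_ids": ["u2", "u3"]},
    "u2": {"name": "Bob", "friend_ids": ["u3"]},
    "u3": {"name": "Cy", "friend_ids": []},
}
lookups = 0


def get_user(user_id):
    """Stand-in for one round trip to the store."""
    global lookups
    lookups += 1
    return users[user_id]


def friends_of_friends(user_id):
    """Join aggregates in application code by following embedded identifiers."""
    names = set()
    for friend_id in get_user(user_id)["friend_ids"]:
        for fof_id in get_user(friend_id)["friend_ids"]:
            names.add(get_user(fof_id)["name"])
    return names


result = friends_of_friends("u1")
```

Even in this three-user example, a two-hop question costs four separate lookups. The number of round trips grows with every extra hop and with the number of embedded identifiers, which is the cost the text refers to.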
See the use cases of different types of databases:
Relational database: It is represented in tabular form, so it is best for calculating the
income.
Key-Value Store: It is best for building a shopping cart.
NoSQL databases: Data is stored as a document, so it is best for storing structured product
information.
Graph DB: It follows a graph structure. It is best for describing how a user got from point
A to point B.
Neo4j Data Model
The Neo4j database follows the Property Graph Model for storing and managing its data.
Neo4j is a graph database which has the following features of the Property Graph Model.

The graph model contains Nodes, Relationships and Properties, which specify data and its
operation.
Properties are key-value pairs.
Nodes are represented using circles and Relationships are represented using arrows.
A Relationship specifies the relation between two nodes.
There are two types of relationships between nodes according to their directions:
Unidirectional and Bidirectional.
Each Relationship contains two nodes: "Start Node" or "From Node" and "To Node" or "End
Node".
Both Nodes and Relationships contain properties.
Relationships should be directional in the Property Graph Data Model. If you create a
relationship without a direction, it will throw an error message.

There are three main building blocks of a graph DB data model:

Nodes
Relationships
Properties
Following is a simple example of a Property Graph.

Figure 8: Simple Graph

Here, we have represented Nodes using circles. Relationships are represented using arrows.
Relationships are directional. We can represent a Node's data in terms of Properties
(key-value pairs). In this example, we have represented each Node's Id property within the
Node's circle.

Query performance

Graph databases competitive advantage: It has been said that graph databases have a reason
to exist because they outperform relational databases in complex queries. They are
particularly good when the relationships between items are significant. The use case that
is better suited for graph databases is "find all entities of a kind" (myEntity.findAll).
The execution of such a query starts with an index lookup to find the starting node(s) for
traversal. Then the relationships in the graph are traversed simultaneously. Because of the
concurrency of the traversal, the bigger the volume of data, the more it outperforms
relational databases.
Figure 9: Query execution in graph databases

Relational databases are less suited to querying through relationships. It would mean
querying through different tables, following foreign keys and other indexes, and it would
considerably increase the execution time. Graph database traversals are performed by
following physical pointers, while foreign keys are logical pointers. [8] The query in the
figure includes the time of each index scan. The more tables are included in the query, the
larger the execution time will become.
Figure 10: Query execution in relational databases
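
The difference between logical and physical pointers can be sketched in Python. This is only an analogy under simplifying assumptions: the rows, names and functions are made up, and Python dictionaries and object references do not reproduce the real performance gap. The first function resolves every hop through an index keyed by identifier, as a join on a foreign key would; the second follows direct references held by each node.

```python
# Relational style: relationships live in a separate table keyed by ids.
friend_rows = [("ann", "bob"), ("bob", "cy"), ("cy", "dee")]
friend_index = {}
for src, dst in friend_rows:
    friend_index.setdefault(src, []).append(dst)


def hops_by_index(start, depth):
    """Each hop is one more index lookup per current row (a logical pointer)."""
    current = [start]
    for _ in range(depth):
        current = [dst for src in current for dst in friend_index.get(src, [])]
    return current


# Graph style: each node object holds direct references to its neighbours.
class Node:
    def __init__(self, name):
        self.name = name
        self.friends = []


nodes = {name: Node(name) for name in ["ann", "bob", "cy", "dee"]}
for src, dst in friend_rows:
    nodes[src].friends.append(nodes[dst])


def hops_by_pointer(start, depth):
    """Each hop dereferences the neighbours stored on the node itself."""
    current = [start]
    for _ in range(depth):
        current = [friend for node in current for friend in node.friends]
    return [node.name for node in current]
```

Both functions return the same answer; what differs is how each hop is resolved, which is the distinction the paragraph above draws.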

Relational Databases competitive advantage: On the other hand, because of the internal
structure of the tables, relational databases would outperform graph databases when the
output requires all the attributes of a table (findAll-like queries). Their ideal use case
is to aggregate over a complete dataset. [8]

Graph databases ranking: The figure below shows the DB-Engines Ranking of Graph DBMS. Neo4j
leads the ranking, and its score is triple that of the next DBMS, Microsoft Azure Cosmos
DB. Neo4j has been leading the graph database sector for some years, as we can see in the
trend scatter plot. It must be taken into account that the score is displayed on a
logarithmic scale, therefore the difference in popularity is really significant.

It can also be seen in the trend scatter plot that Microsoft Azure Cosmos DB appeared in
the graph database landscape in 2014, and since then its rise in popularity has been quite
steep. An argument for that is that Microsoft Azure is well integrated in the software
marketplace.
Success factor: It has been stated, when comparing the NoSQL DBMS, that graph databases
have a limitation in size. Therefore, it is a competitive advantage to work on
facilitating the partitioning of a graph. While OrientDB and InfiniteGraph state that they
have accomplished this, Neo4j seems to be the DBMS that is improving graph partitioning
most successfully. [8]

Figure 11: Graph DBMS Ranking
Figure 12: Trend Graph DBMS popularity scatter plot

Chapter 10: Neo4j
Necessity of Neo4j

Why Neo4j? By using a graph database like Neo4j, which focuses on data relationships,
patterns and trends can be seen easily, unlike in relational databases. Given today's
growing business demands and competitive atmosphere, using the right tool is very
important, and when it comes to widely connected data Neo4j is a strong choice because it
can be orders of magnitude faster than traditional databases for such queries. Neo4j
analyzes and traverses all data in real time and returns results very fast. Neo4j is widely
used by many big companies such as eBay, Walmart, Cisco, UBS and many more.

What is Neo4j? Neo4j is an open-source NoSQL graph database written in Java and Scala, and
according to db-engines.com, Neo4j is currently the world's leading graph database. There
are many reasons for this. First of all, Neo4j provides ACID transaction compliance,
cluster support, runtime failover, high availability and high-speed querying through
traversals. It scales to billions of nodes and relationships. It has a great user interface
and it is easy to learn because there are lots of free online resources on the web. It also
has a great community that can help with any problems. In general terms, Neo4j is designed
for linking relationships, and it handles these relationships with speed, ease and extreme
flexibility. With Neo4j, models can easily be converted to a database schema. If the data
is densely connected, or several conceptual models need to be tried for the data, then
Neo4j is the solution.

Neo4j Versions

Version              Release Date
Neo4j Version 1.0    February 2010
Neo4j Version 2.0    December 2013
Neo4j Version 3.0    April 2016

Graph databases use a relationship-first approach to storing and querying your data. They
store data in a much more logical fashion, a way that represents the real world and
prioritizes the representation, discoverability and maintainability of data relationships.
But data integrity is important for many developers who care about data relationships, so
ACID properties were brought back to at least one NoSQL database, called Neo4j. This allows
us to use Neo4j as a transactional data store, storing your most critical business data.

Graph databases give developers a more intuitive data model, faster queries and better
agility to adapt to changes in the business.
Figure 13: Neo4j As a Leading Graph Database

How is Neo4j Different From Traditional Databases? Graph databases are very different from
traditional relational (SQL) databases. Instead of using tables with rows and columns,
graph databases use a graph with nodes and relationships. Both of these types of databases
have their place. A relational database is great for tabular data that is not really
closely related. If we have a lot of nested relationships in a relational database, it can
get very complicated with join tables and join queries, and we need all kinds of primary
and foreign keys. This can be really hard to deal with and, even worse, it can be really
costly on the system. Graph databases are built to fix that problem and work with data that
is much more closely related and more dynamic.
Thus, because of the reasons stated above, we choose Neo4j as our database.
Figure 14: eBay's comment about Neo4j
Neo4j Working
Neo4j stores and displays data in the form of a graph. In Neo4j, data is represented by
nodes and relationships between those nodes.

Neo4j databases (as with any graph database) are a lot different from relational databases
such as MS Access, SQL Server, MySQL, etc. Relational databases use tables, rows and
columns to store data. They also present data in a tabular fashion.

Neo4j doesn't use tables, rows or columns to store or present data.
Neo4j is best for storing data that has many interconnecting relationships. That's why
graph databases like Neo4j have an advantage and are much better at dealing with connected
data than relational databases are.

The graph model doesn't usually require a predefined schema. So there is no need to create
the database structure before you load the data (like you do in a relational database). In
Neo4j, the data is the structure. Neo4j is a "schema-optional" DBMS.
In Neo4j, there is no need to set up primary key/foreign key constraints to predetermine
which fields can have a relationship, and to which data. You just have to define the
relationships between the nodes you need.

Features of Neo4j Graph Database

It provides a simple, SQL-like query language, Neo4j CQL
It supports indexes by using Apache Lucene
It contains a UI to execute CQL commands, i.e., Neo4j Data Browser
It supports UNIQUE constraints
It supports full ACID properties
It uses native graph storage with Native GPE
It follows the Property Graph Data Model
It exposes a REST API that can be accessed from any programming language or framework,
such as Java, Spring, Scala and so forth
It supports exporting query data to JSON and XLS formats
Advantages of Neo4j
Properties of Neo4j

Figure 15: General Look at Neo4j
Following are the properties of Neo4j:
Data model (flexible schema): Neo4j has a property graph model. A graph has nodes, and
these nodes are connected with each other. Nodes and their relationships store data in
key-value pairs known as properties. Neo4j also has a flexible schema, which means
properties can be added or removed when necessary.

ACID properties: Neo4j supports full ACID (Atomicity, Consistency, Isolation, and
Durability) rules.

Scalability and reliability: The database can be scaled by increasing the number of
reads/writes and the data volume without affecting the query processing speed and data
integrity. Neo4j also provides support for replication for data safety and reliability.

The traversal of the graph: The traversal is the operation of visiting a set of nodes in
the graph by moving between nodes connected with relationships. It is a data retrieval
operation unique to the graph model. Querying the data using a traversal only takes into
account the data that is required, so it is not necessary to query the entire dataset in an
expensive operation, as is the case with join operations on relational data. [1]

Cypher Query Language: Neo4j provides a powerful declarative query language known as
Cypher. It uses ASCII art for depicting graphs. Cypher is easy to learn and can be used to
create and retrieve relations between data without using complex queries like joins. [9]

Built-in web application: Neo4j provides a built-in Neo4j Browser web application, which
can be used to create and query graph data.
Drivers: Neo4j can work with:
REST API to work with programming languages such as Java, Spring, Scala etc.
JavaScript to work with UI MVC frameworks such as Node JS.
It supports two kinds of Java API, the Cypher API and the Native Java API, to develop Java
applications.
Indexing: Neo4j supports indexes by using Apache Lucene.
Advantages of Neo4j Graph Database

Neo4j is very popular in many industries and is the first choice of many companies. Neo4j
gives an advantage in many respects. First of all, it is built for handling complex data
connections; as the volume and connectedness of their data increase, these companies gain
many benefits over their competitors. Following are the advantages of Neo4j.

Easy to represent connected data: It makes it both easy and fast to traverse or navigate
large amounts of data that have some sort of relationship.
Can represent semi-structured data easily: Data that does not fall into a natural
structure can be easily represented in a graph database.

Cypher Commands: Cypher commands are human readable and very easy to learn.

Simple and Powerful Data Model: The property graph data model is simple yet still very
powerful. The basic building blocks are nodes and relationships, and they can contain data
in the form of key-value pairs, or properties, unlike in the relational model.

Join Aspect: There's no need for complex and costly joins to retrieve connected or related
data. Instead, the graph database uses a natural concept of relationships. Relationships in
a graph actually form paths, so querying or traversing a graph involves following those
paths, and because of that path-oriented nature of the graph data model, the majority of
path-based operations are extremely efficient.

Performance: Traversing a relationship is done in constant time, so query performance does
not decrease when data grows, and Cypher is designed for graphs, so it is very simple to
write graph traversals based on pattern matching.

Neo4j is the only graph database that combines native graph storage, a scalable
architecture optimized for speed, and ACID compliance to ensure predictability of
relationship-based queries. [10]

Real-time insights: Neo4j provides results based on real-time data.
High availability: Neo4j is highly available for large enterprise real-time applications
with transactional guarantees. [15]
Biggest graph community in the world: Neo4j has the largest and most active graph
community.
Easy to learn: Mature UI with intuitive interaction and built-in learning. [10]
Performance In Neo4j
Neo4j provides a fast and efficient graph experience, and its strongest point is that it
can traverse millions of nodes in milliseconds. Also, unlike with relational databases,
even exponentially increasing data size does not affect the performance of Neo4j.

Volker Pacher, eBay developer and Neo4j client: "Our Neo4j solution is literally a
thousand times faster than the previous MySQL solution, with searches that require between
10 and 100 times less code".

Figure 16: Query times for Oracle Exadata vs Neo4j
Figure 17: TomTom's Comparison of Neo4j with MySQL
How To Increase Performance Of Neo4j?
Increase the size of available heap memory (between 8G and 16G).
Increase the open file limit from the default 1024 to at least 40000 to be sure.
To avoid costly disk access, make sure the relevant graph data is cached in memory.
Reserve sufficient memory (at least 16G) for the non-Neo4j tasks running on the computer.
Use simple algorithms, which lead to increased performance.
Keep all related nodes and edges in server memory before giving results.
Keep traversals independent.
Use indexes.
What can Neo4j be used for?

Neo4j is highly suitable for storing data that has many interconnecting relationships. This is where graph databases can make a huge difference. In fact, graph databases like Neo4j are much better at dealing with relational data than relational databases are. This is in part due to the fact that the graph model doesn't usually require a predefined schema. You don't need to create the database structure before you load the data (like you do in a relational database). In Neo4j, the data is the structure. Neo4j is a "schema optional" DBMS.

But the main reason Neo4j is better for relational data is the way it allows you to create relationships. Neo4j is built around relationships. There is no need to set up primary key/foreign key constraints to predetermine which fields can have a relationship, and to which data. With Neo4j, you just add a relationship between any nodes whenever you need one.

This makes Neo4j extremely well suited for social networking applications like Facebook, Twitter, etc. But there are many other areas where Neo4j excels. Here are some of the main areas that Neo4j can be used for:

● Social networks
● Real-time product recommendations
● Network diagrams
● Fraud detection
● Access management
● Graph-based search of digital assets
● Master data management

Cypher Query Language

Cypher is a declarative language for working with graphs and graph data, for both reading from and writing to the graph, and it is very expressive and powerful. Cypher also defines patterns in the given graph data.

Cypher is a declarative language: this means that we specify the data that we are interested in. We do not specify how to get that data from the database. Cypher is a very human-readable language, and it is accessible not just to developers; everyone can easily learn and use it.

Cypher has clauses similar to SQL, like WHERE and ORDER BY, and simple comparison operators like <, =, >. The difference from SQL is that Cypher is designed to represent graph data patterns. For example, it has the MATCH clause, which is built on finding and specifying patterns in the data.
Structure
Nodes: Nodes represent data entities. Each node represents a single data entity and can have labels. A node is equivalent to a record in a relational database. Nodes can also have properties, which are basically attributes. Nodes are written with parentheses, like (p:Product).

Figure 18: Node Representation

Relationships: In Cypher, the lines between nodes represent the relationships between them. Relationships can also have properties, just like nodes, which is quite different from SQL. Relationships also have directions. A relationship is written as --> between two nodes.
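The structure described above (labelled nodes, directed relationships, properties on both) can be mirrored in a few lines of Python. This is only an illustrative sketch: the classes `Node` and `Relationship`, the helper `to_pattern`, and the sample data are hypothetical and are not part of any Neo4j API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    start: Node      # relationships are directed: start -> end
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)

def to_pattern(rel):
    """Render a relationship roughly the way it is drawn in Cypher."""
    def fmt(node):
        return "(:" + ":".join(sorted(node.labels)) + ")"
    return f"{fmt(rel.start)}-[:{rel.rel_type}]->{fmt(rel.end)}"

customer = Node({"Person"}, {"name": "Ann"})
product = Node({"Product"}, {"title": "Book"})
bought = Relationship(customer, "BOUGHT", product, {"quantity": 2})

print(to_pattern(bought))  # (:Person)-[:BOUGHT]->(:Product)
```

Note that the relationship carries its own property (`quantity`), which in a relational design would usually require a separate join table.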

Operations In Cypher
Create: It is used to create nodes and relationships between them.
We created a node representing us with five properties:
Name: 'Ajit Singh'
Country: 'India'
City: 'Patna'
DateOfBirth: '21.05.1984'
School: 'PWC'
With this Cypher code:

CREATE (n:Person {name: 'Ajit Singh', country: 'India', city: 'Patna',
DateOfBirth: '21.05.1984', School: 'PWC'}) RETURN n

We created a second node with these properties:
Name: 'Anna Turu Pi'
Country: 'Spain'
City: 'Barcelona'
DateOfBirth: '30.07.1995'
School: 'PWC'
With this Cypher code:

CREATE (n:Person {name: 'Anna Turu Pi', country: 'Spain', city: 'Barcelona',
DateOfBirth: '30.07.1995', School: 'PWC'}) RETURN n

We created a relationship called "FRIENDS_WITH" with the property "SINCE", with this Cypher code:

MATCH (a:Person), (b:Person) WHERE a.name = 'Ajit Singh' AND b.name = 'Anna Turu Pi'
CREATE (a)-[r:FRIENDS_WITH {SINCE: "17/09/2017"}]->(b)
RETURN r
(a) Result in Console (b) After Creating Relationship
Figure 19: Create Relationship Between Two Nodes
Match: Match finds specified patterns in the data.
Figure 20: Relationships
With this Cypher code we showed all the people whom Esteban Zimányi teaches:
MATCH (a:Person)<-[:TEACHES_TO]-(b:Person {name: 'Esteban Zimányi'})
RETURN a.name
Set: This is used to update properties on nodes and relationships.
With this Cypher code we changed Esteban Zimányi's date of birth to '01.01.1966':
MATCH (n {name: 'Esteban Zimányi'}) SET n.DateOfBirth = '01.01.1966'
RETURN n
Delete: This operator deletes nodes or relationships in the data.
With this Cypher code we deleted Ajit Singh. DETACH is needed because the node still has the FRIENDS_WITH relationship created above:
MATCH (n:Person {name: 'Ajit Singh'}) DETACH DELETE n
Loading Data With Cypher

There are many ways to import data into Neo4j, but the most common way is to upload it as a CSV file. The LOAD CSV clause is built into Neo4j and is used for small or medium-sized data sets of up to 10 million records. If we want to upload data that has more than 10 million records, then we should use the USING PERIODIC COMMIT [n] option. If we don't use this option, the whole file is processed in one run and everything is created in one transaction.

Load CSV: This clause is used for importing CSV files into Neo4j.
Figure 21: Load CSV Operator Structure
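The idea behind periodic commits (splitting one large import into several smaller transactions) can be sketched in plain Python. The helper `batches` and the sample CSV columns are hypothetical; in a real import each batch would be sent to the database as one transaction, which is not shown here.

```python
import csv
import io

def batches(rows, size):
    """Yield lists of at most `size` rows; each list would be one transaction."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch

# Stand-in for a CSV file on disk (hypothetical columns).
data = io.StringIO("id,name\n1,MAD\n2,ICN\n3,ANR\n4,PAT\n5,BLQ\n")
reader = csv.DictReader(data)

sizes = [len(batch) for batch in batches(reader, 2)]
print(sizes)  # [2, 2, 1]
```

Because `batches` is a generator, only one batch is held in memory at a time, which is the same reason a periodic commit keeps transaction state small.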
Use Cases of Neo4j

Figure 22: Use Cases Of Neo4j
The common use cases are:

Real Time Recommendations: Recommendation algorithms find relationships between people, products and other related services based on the user's previous behavior. Neo4j is able to store interconnected data about customers and products, and since Neo4j does not need an index lookup for every suggestion, it provides a very fast and effective way to deal with real-time data. Walmart uses Neo4j for this purpose.
Master Data Management: In large organizations, different systems store information about customers, employees, titles and the supply chain. With the graph model it is easy to bring data from different systems together, create views about customers, or keep track of all the information about the organizational system itself. Cisco uses Neo4j for this purpose, and the company also uses Neo4j for its help desk solution.
Figure 23: Master Data Management Graph Design

Fraud Detection: Fraud detection is very important in the finance industry. Nowadays, in order not to be detected by a bank's fraud algorithms, people use different approaches, such as opening several bank accounts with valid information and making normal transactions without being an outlier. They then open false bank accounts with the same identity token and withdraw all the money from all the accounts. This behavior is hard to detect with traditional methods, but it is very easy to see with a graph, because people opening bank accounts with the same identity token form a recognizable pattern in the graph.
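The shared-identity-token pattern described above can be sketched without a database. The function `shared_identifier_rings` and the account data are hypothetical; the sketch only groups accounts by token and reports tokens used by more than one account, which is the simplest form of the graph pattern.

```python
from collections import defaultdict

def shared_identifier_rings(accounts, min_size=2):
    """Return identity tokens that are shared by at least min_size accounts."""
    by_token = defaultdict(set)
    for account_id, tokens in accounts.items():
        for token in tokens:
            by_token[token].add(account_id)
    return {t: ids for t, ids in by_token.items() if len(ids) >= min_size}

# Hypothetical data: account id -> identity tokens (phone, address, ...).
accounts = {
    "acc1": {"phone:555-0101", "addr:1 Main St"},
    "acc2": {"phone:555-0101", "addr:9 Oak Ave"},
    "acc3": {"phone:555-0199", "addr:1 Main St"},
    "acc4": {"phone:555-0142", "addr:7 Elm Rd"},
}

rings = shared_identifier_rings(accounts)
for token in sorted(rings):
    print(token, sorted(rings[token]))
```

Here acc1, acc2 and acc3 are linked through a shared phone number and a shared address, while acc4 stays isolated.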

Graph Based Search: Metadata is available for things like products, articles, etc. Being able to model metadata as a graph allows search to be enhanced, meaning users are able to find the things most relevant to them. LinkedIn is an example: when a search is executed, we don't see random or alphabetically sorted results; we see the relevant ones first. Lufthansa uses Neo4j for this purpose.

Network & IT Operations: If a data center is modelled as a graph, then dependency analysis can easily be applied to the network systems to draw conclusions such as how many applications will be affected if one virtual machine goes down. HP uses Neo4j to model the networks of some large telecommunication providers.

Figure 24: Network IT Operations Graph Design
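The dependency analysis mentioned above ("if one virtual machine goes down, how many applications will be affected") amounts to a traversal over reversed dependency edges. The following is a minimal sketch under that assumption; the component names and the function `affected_by` are hypothetical.

```python
from collections import deque

def affected_by(depends_on, failed):
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the edges: component -> components that depend on it.
    dependents = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(component)

    affected, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for component in dependents.get(current, ()):
            if component not in affected:
                affected.add(component)
                queue.append(component)
    return affected

# Hypothetical data center: component -> components it depends on.
depends_on = {
    "vm1": ["host1"],
    "vm2": ["host1"],
    "vm3": ["host2"],
    "db": ["vm1"],
    "web_app": ["vm2", "db"],
    "reporting": ["db"],
    "mail": ["vm3"],
}

print(sorted(affected_by(depends_on, "vm1")))  # ['db', 'reporting', 'web_app']
```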
Identity & Access Management: Within large organizations there are hundreds of users, and controlling who can access which information is crucial for security reasons. Creating groups and roles for each user comes in handy in this situation. This kind of data is very rich and connected, and can be easily handled by Neo4j. UPC London uses Neo4j for this, and it received a 2014 Graphie award for "Best Identity and Access Management App".

Chapter 11: Getting started with Neo4j
Requirements
Spring Data Neo4j 5.1.x at minimum requires:
JDK version 8 and above.
Neo4j Graph Database 3.1 and above.
Spring Framework {springVersion} and above.
If you plan on altering the version of Neo4j-OGM, make sure it is a 3.0.0+ release.
Download Neo4j
First download Neo4j from its official website: https://neo4j.com/download/
You can choose either a free Enterprise Trial or the free Community Edition. Here, we are using the Community Edition.
Run the downloaded file and follow the instructions given below:
Start Neo4j:
Start the Server

Click on the installed Neo4j Community Edition.
Initialization started:
Neo4j is started. It is ready to use.
Open a browser and go to http://localhost:7474/browser/
or http://127.0.0.1:7474/browser/

Start Neo4j web server

Visit the sub-directory /bin of the extracted folder and execute ./neo4j start in a terminal.
Visit http://localhost:7474/

The first time only, you will have to sign in with the default account and change the default password. As of Community version 3.0.3, the default username and password are neo4j and neo4j.

You can now enter Cypher queries in the console provided in your web browser and visually investigate the results of each query.
Start a new database

Each Neo4j server currently (in the Community Edition) can host a single Neo4j database, so in order to set up a new database:
Visit the sub-directory /bin and execute ./neo4j stop to stop the server.

Visit the sub-directory /conf and edit the file neo4j.conf, changing the value of the parameter dbms.active_database to the name of the new database that you want to create.

Visit the sub-directory /bin again and execute ./neo4j start
The web server has started again with the new empty database. You can visit http://localhost:7474/ again to work with the new database.
The created database is located in the sub-directory /data/databases, under a folder with the name specified in the parameter dbms.active_database.

Delete one of the databases

Make sure the Neo4j server is not running: go to the sub-directory /bin and execute ./neo4j status. If the output message shows that the server is running, also execute ./neo4j stop.
Then go to the sub-directory /data/databases and delete the folder of the database you want to remove.
Cypher Query Language

This is Cypher, Neo4j's query language. In many ways Cypher is similar to SQL, if you are familiar with it, except that SQL refers to items stored in a table while Cypher refers to items stored in a graph.

First, we should start out by learning how to create a graph and add relationships, since that is essentially what Neo4j is all about.
CREATE (ab:Object {age: 30, destination: "England", weight: 99})
You use CREATE to create data.
To indicate a node, you use parentheses: ()

The ab:Object part can be broken down as follows: a variable 'ab' and a label 'Object' for the new node. Note that the variable can be anything, but you have to be consistent within a Cypher query.
To add properties to the node, use curly braces: {}
Next, we will learn about finding MATCHes.
MATCH (abc:Object) WHERE abc.destination = "England" RETURN abc;

MATCH specifies that you want to search for a certain node/relationship pattern. (abc:Object) refers to a one-node pattern (with label Object) which stores the matches in the variable abc. You can think of this entire line as the following:

abc = find the matches that are Objects WHERE the destination is England.

In this case, WHERE adds a constraint, which is that the destination must be England. A read-only MATCH query must end with a RETURN (Neo4j will not accept just a MATCH). Whether a RETURN is required depends on the type of query you are writing; we will talk more about this later as we introduce the other types of queries you can make.

The next query will be explained later, after we go over some more elements of the Cypher Query Language. It is here to give you a taste of what we can do with this language! Below, you will find an example which gets the cast of movies whose title starts with 'T':

MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)
WHERE movie.title STARTS WITH "T"
RETURN movie.title AS title, collect(actor.name) AS cast
ORDER BY title ASC LIMIT 10;
A complete list of commands and their syntax can be found in the official Neo4j Cypher Reference Card.
RDBMS Vs Graph Database
RDBMS               Graph Database
Tables              Graphs
Rows                Nodes
Columns and Data    Properties and their values
Constraints         Relationships
Joins               Traversal

Cypher
Introduction
Cypher is the query language used by Neo4j. You use Cypher to perform tasks and matches against a Neo4j graph.
Cypher is "inspired by SQL" and is designed to be intuitive in the way you describe relationships, i.e. typically the drawing of the pattern will look similar to the Cypher representation of the pattern.

Examples
Creation

Create a node
CREATE (neo:Company) // create node with label 'Company'
CREATE (neo:Company {name: 'Neo4j', hq: 'San Mateo'}) // create node with properties

Create a relationship
CREATE (beginning_node)-[:edge_name {attribute_one: 1, attribute_two: 'two'}]->(ending_node)

Query Templates
When running Neo4j locally, in the browser GUI (default: http://localhost:7474/browser/), you can run the following command to get a palette of queries.
:play query template
This helps you get started creating and merging nodes and relationships by typing queries.
Create an Edge
CREATE (beginning_node)-[:edge_name {attribute_one: 1, attribute_two: 'two'}]->(ending_node)

Delete all nodes
MATCH (n)
DETACH DELETE n
DETACH doesn't work in older versions (earlier than 2.3); for previous versions use
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n, r

Delete all nodes of a specific label (use DETACH DELETE n if the nodes still have relationships)
MATCH (n:Book)
DELETE n
Match (capture group) and link matched nodes
MATCH (node_name:node_type {}), (node_name_two:node_type_two {})
CREATE (node_name)-[:edge_name {}]->(node_name_two)
Update a Node
MATCH (n)
WHERE n.some_attribute = "some identifier"
SET n.other_attribute = "a new value"
Delete All Orphan Nodes
Orphan nodes/vertices are those lacking all relationships/edges.
MATCH (n)
WHERE NOT (n)--()
DELETE n

Python & Neo4j
Examples

Install neo4jrestclient
pip install neo4jrestclient
Connect to Neo4j
from neo4jrestclient.client import GraphDatabase
db = GraphDatabase("http://localhost:7474", username="neo4j", password="mypass")
Create some nodes with labels
user = db.labels.create("User")
u1 = db.nodes.create(name="user1")
user.add(u1)
u2 = db.nodes.create(name="user2")
user.add(u2)

You can associate a label with many nodes in one go
language = db.labels.create("Language")
b1 = db.nodes.create(name="C++")
b2 = db.nodes.create(name="Python")
language.add(b1, b2)
Create relationships
u1.relationships.create("likes", b1)
u1.relationships.create("likes", b2)
u2.relationships.create("likes", b1)

Bi-directional relationships
u1.relationships.create("friends", u2)
Match using neo4jrestclient
from neo4jrestclient import client
q = 'MATCH (u:User)-[r:likes]->(m:Language) WHERE u.name="user1" RETURN u, type(r), m'

# "db" as defined above
results = db.query(q, returns=(client.Node, str, client.Node))

Print results
for r in results:
    print("(%s)-[%s]->(%s)" % (r[0]["name"], r[1], r[2]["name"]))

Output:
(user1)-[likes]->(C++)
(user1)-[likes]->(Python)

Chapter 12: Neo4j Application
Software
For the graph database, Neo4j Community Edition 3.2.5 has been used, and for the relational database, SQL Server 2017.
Use Case Selected

As proposed in graph database benchmark guidelines [4], the best tests to benchmark a graph database are: traversal (which includes the calculation of the shortest path), graph analysis, connected components, communities, centrality measures, pattern matching and graph anonymisation. The guidelines also note that shortest-path graph analysis and real-time analysis of traffic networks are among the domains where graph databases prove to be most beneficial. In our implementation, we are going to model flight routes, as they have the ideal properties to benchmark a graph database. Airports and airlines are elements whose information lies in their interconnections.

Data

The data set selected to perform the benchmark was a data set of flight routes provided by OpenFlights.org [13]. It provides three flat files: airlines.dat, airports.dat and routes.dat.

Because of size concerns, we created synthetic data in addition to our existing data tables. Before creating new data we had 67,663 different routes, and now we have 1,193,413 different routes. The rows we created hold dummy values; they have no connection with the existing data except their types, so our queries mostly returned results from the initial data. This data creation process was applied because the more data we have, the more accurate the benchmarking results we get. Also, unlike with traditional databases, adding more data to Neo4j does not noticeably affect its traversal performance.
Implementing Data

Figure 25: OpenFlights.org
Neo4j: To create the Neo4j database we developed Python code. This code uses the py2neo library to access the Neo4j database, and it reads our data (an external source) to create nodes, relationships, properties and indexes.
Figure 26: Structure of the Python code

The original airport data had latitude and longitude attributes. In order to present a better visualization, we created a function that calculates the distance between two connected airports. Route data has source_airport and destination_airport, so we created a Route node and assigned the distance between source_airport and destination_airport to it as a name attribute. In the end there are three types of nodes, Airlines, Airports and Routes, and they have the following relationships:

Route → TO → Airport
Route → FROM → Airport
Route → OF → Airline

Table 2: Graph database schema
We loaded our data into Neo4j with this schema:

Figure 27: Initial Schema
Figure 28: Example of a query in Neo4j
SQL: A relational database was created by importing each flat file as a table, and then we created foreign key references between the tables.
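The text above mentions a function that calculates the distance between two connected airports from their latitude and longitude, without saying which formula was used. A common choice is the haversine great-circle formula, sketched below as an assumption; the function name `haversine_km` and the mean Earth radius of 6371 km are not taken from the original code.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding slightly above 1.0 for antipodal points.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))

# One degree of longitude along the equator is roughly 111.19 km.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
```

The result of such a function is what would be stored on each Route node as its distance property.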
Export data

To export the Neo4j database, we chose to use the APOC library. Neo4j has to be authorized to run the export procedures. For that, this line has to be added to neo4j.conf:
apoc.export.file.enabled=true

Export to CSV

apoc.export.csv.query(query, file, config): exports results from the Cypher statement as CSV to the provided file
apoc.export.csv.all(file, config): exports the whole database as CSV to the provided file
apoc.export.csv.data(nodes, rels, file, config): exports the given nodes and relationships as CSV to the provided file
apoc.export.csv.graph(graph, file, config): exports the given graph object as CSV to the provided file
We exported the entire database by executing the following command in Cypher:
CALL apoc.export.csv.all("/temp/neo4j_database_csv_file.csv", {batchSize: 10})
YIELD file, source, format, nodes, relationships, properties, time, rows

Export to Cypher script

apoc.export.cypher.all(file, config): exports the whole database, incl. indexes, as Cypher statements to the provided file
apoc.export.cypher.data(nodes, rels, file, config): exports the given nodes and relationships, incl. indexes, as Cypher statements to the provided file
apoc.export.cypher.graph(graph, file, config): exports the given graph object, incl. indexes, as Cypher statements to the provided file
apoc.export.cypher.query(query, file, config): exports nodes and relationships from the Cypher statement, incl. indexes, as Cypher statements to the provided file
apoc.export.cypher.schema(file, config): exports all schema indexes and constraints to Cypher

The database was also exported to a Cypher script:

CALL apoc.export.cypher.all("/temp/neo4j_database_cypher_file.cypher", {batchSize: 10})
YIELD file, source, format, nodes, relationships, properties, time, rows
Figure 29: Exporting Neo4j database to Cypher script
Query Examples (Neo4j - SQL)
Figure 30: Algorithms for graph databases

Add libraries: As noted earlier, Neo4j includes graph algorithms that allow us to perform queries that would be impossible or impractical to perform in SQL. Libraries of algorithms can be downloaded and added to Neo4j as plugins.
Figure 31: Add jar files in plugin folder

Neo4j has to be authorized to run the plugins. For that, this line has to be added to neo4j.conf (e.g., for the APOC library):
dbms.security.procedures.unrestricted=apoc.*

After that, Neo4j needs to be restarted, and it can be verified that the plugin is working by running the following command in the Neo4j browser:
CALL dbms.procedures() YIELD name, signature, description
WHERE name STARTS WITH "apoc"
RETURN name, signature, description
Shortest Path

This algorithm is the one that best justifies the existence of graph databases. Its calculation is impractical with plain SQL, where the number of layers in the route has to be specified in advance.
First query example: find the shortest path from an airport in Madrid to an airport in Seoul.
MATCH p = shortestPath((src:Airport {city: 'Madrid'})-[r:FROM|TO*..15]-
(dest:Airport {city: 'Seoul'})) RETURN p

Figure 32: Shortest path query from Madrid to Seoul
Figure 33: Pipeline of the shortest path query
The nodes can be expanded, and we see the airline to which each route belongs.
Figure 34: Expanded shortest path query
Second query example: find the shortest path between an airport in Seoul and an airport in Antwerp.
MATCH p = shortestPath((src:Airport {city: 'Seoul'})-[r:FROM|TO*..15]-
(dest:Airport {city: 'Antwerp'})) RETURN p
Figure 35: Shortest path query from Seoul to Antwerp

Paying attention to the relationships, it can be seen that the query doesn't output a physically possible traveling route from the origin city to the destination city. In the first query, one of the paths ends up in Seoul, but the other has two sources, Madrid and Seoul, and they both end up in Beijing. The second query has three origin airports, one in Antwerp and two in Seoul, and all the routes finish in Geneva.

The purpose of the algorithm is to find the shortest path connecting two nodes, independently of the physical meaning, but real routes can be created with the following modification:

Persistent inferred relationships: For each route going from one airport to another, a relationship connecting both airports has been added. This way, the shortest path query can look for only one type of relationship. If the objective is to find physically possible paths between two airports (e.g., without stepping into an airline node), searching over that inferred relationship ensures that airports are only connected to airports.

Relationship CONNECTED. This relationship has the property weight, which holds the number of routes between the two airports. It is used in the shortest path queries and community detection queries.

Cypher code to create the relationship:
MATCH (ap1:Airport)<-[:FROM]-(r:Route)-[:TO]->(ap2:Airport)
WHERE id(ap1) <> id(ap2)
WITH ap1, ap2, COUNT(*) AS weight
CREATE (ap1)-[c:CONNECTED]->(ap2)
SET c.weight = weight
In the figure below the database schema after adding the inferred relationship is displayed:
Figure 36: Neo4j DB schema after adding Connected relationships
Cypher code to delete the relationship:
MATCH (ap1:Airport)-[r:CONNECTED]->(ap2:Airport) DELETE r
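The effect of the inferred CONNECTED relationship can be reproduced in plain Python: each route record is collapsed into a weighted airport-to-airport edge, and a breadth-first search over those edges then only ever steps from airport to airport. This is a sketch under simplifying assumptions (routes given as hypothetical `(source, destination)` pairs, fewest hops as the notion of "shortest"); it is not the algorithm Neo4j uses internally.

```python
from collections import Counter, deque

def connected_edges(routes):
    """Collapse (source, destination) route records into weighted airport pairs."""
    return Counter((src, dst) for src, dst in routes if src != dst)

def shortest_path(edges, start, goal):
    """Breadth-first search over directed airport edges; returns the fewest-hop path."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    previous = {start: None}
    queue = deque([start])
    while queue:
        airport = queue.popleft()
        if airport == goal:
            path = []
            while airport is not None:  # walk back to the start
                path.append(airport)
                airport = previous[airport]
            return path[::-1]
        for nxt in adjacency.get(airport, []):
            if nxt not in previous:
                previous[nxt] = airport
                queue.append(nxt)
    return None  # goal not reachable

# Hypothetical route records; the duplicate MAD -> FRA route yields weight 2.
routes = [("MAD", "FRA"), ("MAD", "FRA"), ("FRA", "ICN"), ("MAD", "LHR"),
          ("LHR", "PEK"), ("PEK", "ICN"), ("ICN", "FRA")]
edges = connected_edges(routes)

print(edges[("MAD", "FRA")])                # 2
print(shortest_path(edges, "MAD", "ICN"))   # ['MAD', 'FRA', 'ICN']
```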

Relationship GOINGTO. This relationship saves the route and airline information in its properties. It is used in the shortest path queries and community detection queries.
Cypher code to create the relationship:

MATCH (ap1:Airport)<-[:FROM]-(r:Route)-[:TO]->(ap2:Airport)
WHERE id(ap1) <> id(ap2)
WITH ap1, ap2, r
MATCH (r)-[:OF]->(al:Airline)
CREATE (ap1)-[g:GOINGTO]->(ap2)
SET g.distance = r.distance
SET g.route = id(r)
SET g.airline = al.name
In the figure below the database schema after adding the inferred relationship is displayed:
Figure 37: Neo4j DB schema after adding Goingto relationships
Cypher code to delete the relationship:
MATCH (:Airport)-[r:GOINGTO]->(:Airport) DELETE r
The first shortest path query is run again, now with the inferred relationships:
MATCH p = shortestPath((src:Airport {city: 'Madrid'})-[r:GOINGTO*..15]-
(dest:Airport {city: 'Seoul'})) RETURN p

Figure 38: Shortest path between Madrid and Seoul

Now the airports are directly connected to each other. The route node cannot be seen, but its identifier is saved as one of the relationship properties. With the following query it can be verified whether the route matches the requirements:

MATCH (r:Route) WHERE id(r) = 50276 RETURN r
It is verified that the relationship GOINGTO is equivalent to a real outbound route between Madrid and Seoul. The return route is also verified:
MATCH (r:Route) WHERE id(r) = 50205 RETURN r
Figure 39: Shortest path return route output

Shortest path in SQL Server: SQL Server has the limitation that the number of layers in the path needs to be specified. An alternative is to use a recursive query, but in our experience it was not effective.

When executing the query, we obtain the following message: "The statement terminated. The maximum recursion 100 has been exhausted before statement completion."
Figure 40: Pipeline of Neo4j query on Antwerp-Patna shortest path
Betweenness centrality:

The betweenness centrality of a node in a network is the number of shortest paths between two other members of the network on which the given node appears. Betweenness centrality is an important metric because it can be used to identify "brokers of information" in the network, or nodes that connect disparate clusters. [6]

This query shows the airports that are crossed most often by routes going from one airport to another. In other words, the airports where most transfers take place. As displayed in the figure below, the highlighted airports are like bottlenecks that connect clusters of airports.

MATCH (ap:Airport)
WITH collect(ap) AS airports
CALL apoc.algo.betweenness(['CONNECTED'], airports, 'OUTGOING')
YIELD node, score
SET node.betweenness = score
RETURN node AS Airport, score ORDER BY score DESC LIMIT 25
Figure 41: Betweenness centrality query result

The query outputs five big airports, which are commonly used for transfers during intercontinental journeys. It makes sense that they have the highest betweenness centrality.
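To make the definition concrete, betweenness can be computed on a toy graph with Brandes' algorithm. This is a sketch for unweighted, directed graphs and is not claimed to be what the APOC procedure does internally; when several shortest paths exist between a pair, each intermediate node receives a fractional share. The airport names are hypothetical.

```python
from collections import deque

def betweenness(adjacency):
    """Brandes' algorithm for unweighted, directed graphs (unnormalised scores)."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        stack = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}   # number of shortest paths from source
        dist = {v: -1 for v in adjacency}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                         # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        while stack:                         # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score

# Hypothetical star network: four regional airports connected only via HUB.
adjacency = {
    "HUB": ["A", "B", "C", "D"],
    "A": ["HUB"], "B": ["HUB"], "C": ["HUB"], "D": ["HUB"],
}
scores = betweenness(adjacency)
print(scores["HUB"], scores["A"])  # 12.0 0.0
```

Every one of the 12 ordered pairs of regional airports has to pass through HUB, which is why it scores 12 while the others score 0.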

Closeness centrality:

Closeness centrality is the inverse of the average distance to all other nodes in the network. Nodes with high closeness centrality are often highly connected within clusters in the graph, but not necessarily highly connected outside of the cluster. [6]

This query outputs the airports that have more connections to different airports. In other words, it shows the locations that are too geographically isolated to be reached easily by other means of transport (e.g. islands). It can output the airports with more direct flights from different locations, or the airlines that perform more routes.

Figure 42: Concept of closeness centrality
Query example: output the five airports with the highest closeness centrality:
MATCH (ap:Airport)
WITH collect(ap) AS airports
CALL apoc.algo.closeness(['CONNECTED'], airports, 'OUTGOING')
YIELD node, score
RETURN node AS Airport, score ORDER BY score DESC LIMIT 5
Figure 43: Closeness centrality query result

As predicted, the query outputs airports that are in highly touristic but geographically isolated locations: Lopez Island near Seattle, the river Araguaia in the middle of Brazil, the Grand Canyon of Colorado...
Figure 44: Location of the airports with highest closeness centrality
Query performance: Writing PROFILE before the Cypher query outputs the pipeline of the query execution.

Figure 45: Pipeline of the closeness centrality query
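The closeness definition given earlier (the inverse of the average distance to the other nodes) can be checked on a toy graph. The sketch below measures distance in hops and, as an assumption, averages only over the nodes that are actually reachable; other conventions for disconnected graphs exist, and this is not claimed to match the APOC implementation.

```python
from collections import deque

def closeness(adjacency, node):
    """Inverse of the average hop distance from node to every reachable node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    # (number of reachable nodes) / (sum of distances); 0.0 for isolated nodes.
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical chain of three airports: A <-> B <-> C.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(closeness(adjacency, "B"))            # 1.0
print(round(closeness(adjacency, "A"), 3))  # 0.667
```

B is one hop from everything, so its average distance is 1; A needs two hops to reach C, which lowers its score.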
PageRank:
The secret of Google's success was its search algorithm, PageRank. PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites [11]. This algorithm can output the most connected airport or the most powerful airline (the node connected to more routes).

First query: output the most important airports.
MATCH (ap:Airport) WITH collect(ap) AS airports
CALL apoc.algo.pageRank(airports) YIELD node, score
RETURN node, score ORDER BY score DESC
Figure 46: Pipeline of the airports pagerank query
The most important airports are from London, Paris, Frankfurt, Patna, Dubai, Beijing and the USA. The output is not surprising.
Second query: output the most popular airlines.
MATCH (node:Airline) WITH collect(node) AS airlines
CALL apoc.algo.pageRank(airlines) YIELD node, score
RETURN node, score ORDER BY score DESC
Figure 47: Pipeline of the airlines pagerank query
As a result we can see that Ryanair is the leading airline, followed by four companies from the USA and three from China.
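The PageRank idea (a node is important if important nodes link to it) can be sketched with a plain power iteration. The damping factor of 0.85, the fixed number of iterations and the handling of nodes without outgoing links are conventional assumptions, not details taken from the APOC procedure; the link data is a hypothetical toy example.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: node -> list of nodes it links to."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = adjacency[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for w in nodes:
                    new_rank[w] += damping * rank[v] / n
        rank = new_rank
    return rank

# Hypothetical toy network: most airports have a connection into LHR.
links = {
    "LHR": ["CDG"],
    "CDG": ["LHR"],
    "FRA": ["LHR"],
    "DXB": ["LHR", "FRA"],
}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # LHR
```

The scores always sum to 1, so they can be read as the share of "importance" each airport holds in the network.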
Community Detection:

There are many algorithms for community detection: triangle counting, strongly connected components, ... These algorithms cluster together the nodes that are more related to each other. We have chosen an algorithm from the APOC library, and what the code below does is classify the airport nodes into 40 partitions. The classification is determined by the weight of the CONNECTED relationships (the number of routes between each pair of airports).

Seeing as airports are geographical locations, and routes are physical journeys between them, it is expected that geographically neighbouring airports will be clustered together. That hypothesis is verified below.

CALL apoc.algo.community(40, ['Airport'], 'partition',
'CONNECTED', 'OUTGOING', 'weight', 10000)
MATCH (ap:Airport) WHERE exists(ap.partition) RETURN ap

Figure 48: Community detection graph

The figure above shows the shape of the graph after the nodes have been classified into partitions. To see which nodes belong to each partition, the partition number must be returned as output:

CALL apoc.algo.community(40, ['Airport'], 'partition',
'CONNECTED', 'OUTGOING', 'weight', 10000)

MATCH (ap:Airport) WHERE exists(ap.partition)
RETURN ap.partition, ap.country, COUNT(*) AS num
ORDER BY ap.partition, num DESC

Figure 49: Community detection table

Going back to the visualization of the community detection for airports, the partitions can be recognized and verified by looking at the table. The cluster of six nodes disconnected from the rest of the airports is comprised of Papua New Guinea airports (the country can be seen by hovering over the nodes). They belong to the first partition in the table, 6394.

The following part of the graph is a bit scattered, but it can be seen that the nodes are all connected to the central nodes. Hovering over them, we see that they all belong to Canada, and we can suppose that the more separated nodes are regional airports connected to bigger, more important airports. That part of the graph is equivalent to seven partitions in the table.

Next to Canada, a group of nodes is separated, and those airports are all from Algeria. They must belong to partition 6624.

The more centralized part of this subgraph consists of the airports from Finland. Some of those are connected with an airport in Greenland, which connects with other Greenland and Iceland airports.

The next subgraph shows airports from different African countries interconnected with each other. On the left side, there are airports, and airports from African countries highly connected to them, and on the right side there are mainly Nigerian airports, among other African airports too.

Going back to the center of the graph, it is hard to recognize more than one partition, as it shows the central European airports, which are highly interconnected.
Finally, a partition, 8355, was detected in the table. Checking whether those airports are geographically related, it has been determined that they are on islands between Polynesia, Micronesia and Melanesia.
(a) Partition table
(b) Geographical location
Figure 50: Australasia partition
Possible queries on SQL
The previous section showed operations that cannot practically be done with SQL. Now we will present operations applicable to both.
Finding flights between two airports that have no direct route between them:

Neo4j query:
MATCH p = allShortestPaths((ap1:Airport {city: 'Antwerp'})-[*]->(ap2:Airport {city: 'Patna'}))
WITH extract(node IN nodes(p) | node.name) AS cities,
extract(rel IN relationships(p) | rel.airline) AS airlines
RETURN cities, airlines

SQL query:
SELECT DISTINCT A1.Name AS [1st Airport],
airline1.name AS [1st Airline],
A2.Name AS [2nd Airport],
airline2.name AS [2nd Airline],
A3.Name [3rd Airport],
airline3.name [3rd Airline],
a4.name [4th Airport]
FROM routes r
INNER JOIN airports a1 ON r.source_airport_id = a1.ID
INNER JOIN airlines airline1 ON airline1.id = r.airline_id
INNER JOIN airports a2 ON r.destination_airport_id = a2.ID
INNER JOIN routes r2 ON a2.ID = r2.source_airport_id
INNER JOIN airlines airline2 ON airline2.id = r2.airline_id
INNER JOIN airports a3 ON r2.destination_airport_id = a3.ID
INNER JOIN routes r3 ON a3.id = r3.source_airport_id
INNER JOIN airlines airline3 ON airline3.id = r3.airline_id
INNER JOIN airports a4 ON a4.id = r3.destination_airport_id
WHERE a1.city = 'Antwerp' AND a4.city = 'Patna'
(a) Neo4j Result
(b) SQL Result
Figure 51: Comparison of Queries - first query
As can be seen here, finding all possible routes between two airports is easy in Neo4j. Besides that, Neo4j provides a visualization.

There is one important point here: in SQL we have to specify the level of depth to find results. For example, in this query we searched for 3-level flights between Antwerp and Patna. If we had searched for 1 or 2 levels, the query would have returned no result. In Neo4j we don't have to specify the level; it finds all routes between two airports and even calculates the shortest route. This is one of the drawbacks of using SQL on data that has levels.
Nearest airport to city by distance

Neo4j query:
MATCH (airport1:Airport {city: 'Bologna'})<-[:FROM]-(route:Route)-[:TO]->(airport2:Airport)
RETURN airport1, route, airport2
ORDER BY route.distance ASC LIMIT 1

SQL query:
SELECT TOP 1 A2.name, a2.city, a2.country,
dbo.DistanceKM(a.latitude, a2.latitude, A.longitude, A2.longitude) AS distance
FROM routes r
INNER JOIN airports a ON a.id = r.source_airport_id
INNER JOIN airports a2 ON a2.id = r.destination_airport_id
WHERE A.city = 'Bologna'
ORDER BY distance ASC

While we were uploading our data into Neo4j, we created a node called Route. This node has three relationships, TO, FROM and OF, and as a descriptive property we assigned the calculated distance to the Route node. To stay on the same page, we created a function in SQL that calculates the distance between airports from the latitude and longitude attributes that already exist in our data. Both approaches give the same result, but Neo4j also provides a visualization.
Most connected airports

Neo4j query:
MATCH (airport:Airport)<-[:FROM]-(r:Route)
WITH airport, count(r) AS departures
MATCH (r2:Route)-[:TO]->(airport)
RETURN airport.name AS airport_name, departures, count(r2) AS arrivals
ORDER BY departures + arrivals DESC

SQL query:
SELECT A.Name, A.City, A.Country, SUM(A.route_count) AS route_count
FROM (
(SELECT a.Name, a.City, a.Country, COUNT(*) AS route_count
FROM routes R
INNER JOIN airports A ON A.ID = source_airport_id
GROUP BY a.Name, a.City, a.Country)
UNION ALL
(SELECT a.Name, a.City, a.Country, COUNT(*) AS route_count
FROM routes R
INNER JOIN airports A ON A.ID = destination_airport_id
GROUP BY a.Name, a.City, a.Country)) A
GROUP BY A.Name, A.City, A.Country
ORDER BY route_count DESC
(a) Neo4j Query
(b) SQL query
Figure 52: Comparison of queries - third query
With these queries we found the most interconnected airport by counting the number of incoming and outgoing flights. As can be seen, the query is very easy to write in Neo4j.
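The counting logic behind both queries (departures plus arrivals per airport) can be expressed in a few lines of Python. The route pairs below are hypothetical and the function `most_connected` is only an illustration of the aggregation, not a replacement for either query.

```python
from collections import Counter

def most_connected(routes, top=3):
    """Rank airports by departures plus arrivals, given (source, destination) pairs."""
    departures = Counter(src for src, _ in routes)
    arrivals = Counter(dst for _, dst in routes)
    totals = departures + arrivals   # Counter addition sums the counts per airport
    return totals.most_common(top)

routes = [("MAD", "FRA"), ("FRA", "ICN"), ("MAD", "LHR"),
          ("LHR", "FRA"), ("FRA", "MAD")]
print(most_connected(routes, top=2))  # [('FRA', 4), ('MAD', 3)]
```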

Bibliography
Aleksa Vukotic, Nicki Watt, Tareq Abedrabbo, Dominic Fox, Jonas Partner. Neo4j in Action. Manning Publications, 2015.
Stephan C. Carlson. Graph theory. Encyclopedia Britannica. Available at https://www.britannica.com/topic/graph-theory, May 2013. Accessed: 2017-11-30.

DB-Engines. Knowledge base of relational and NoSQL database management systems. Available at https://db-engines.com/en/, 2017. Accessed: 2017-10-20.
D. Dominguez-Sal, N. Martinez-Bazan, V. Muntes-Mulero, P. Baleta, J. L. Larriba-Pey. A discussion on the design of graph database benchmarks. September 2010.

Stefan Edlich. NoSQL archive. Available at http://nosql-database.org/. Accessed: 2017-11-20.
Mathigon. Graphs and networks. Accessed: 2017-11-30.
Thomas Vial, Michel Domenjoud. Graph databases: an overview. OctoTalks, July 2012. Accessed: 2017-11-30.
Neo4j. Intro to Cypher.
Neo4j. Top ten reasons for choosing Neo4j. Available at https://neo4j.com/top-ten-reasons/.
Neo4j. Neo4j graph algorithms. GitHub, October 2017. Accessed: 2017-12-8.
University of Colorado. Database management essentials. Available at https://www.youtube.com/playlist?list=PL73oFZbnYuixa9w-dL-EsM7Vy5BQGBIeO. Accessed: 2017-10-21.
OpenFlights.org. Airport, airline and route data. Available at https://openflights.org/data.html. Accessed: 2017-11-3.

TutorialsPoint. Graph theory: Introduction. Available at https://www.tutorialspoint.com/graph_theory/graph_theory_introduction.htm. Accessed: 2017-11-30.
James Serra. Relational databases vs non-relational databases. Big Data and Data Warehousing. James Serra's Blog, August 2015. Accessed: 2017-11-29.

James Serra. Types of NoSQL databases. Big Data and Data Warehousing. James Serra's Blog, April 2015. Accessed: 2017-11-29.
Roopendra Vishwakarma. The different types of NoSQL databases. Open Source For U, May 2017. Accessed: 2017-11-29.
S.Abiteboul.QueryingSemiStructuredData.InProc.ofthe6thInt.Conf.onDatabase
Theory(ICDT),volume1186ofLNCS,pages1–18.Springer,Jan1997.
S.AbiteboulandR.Hull.IFO:AFormalSemanticDatabaseModel.InProc.ofthe3th
Symposium onPrinciplesofDatabaseSystems(PODS),pages119–132.ACM
Press, 1984.

S.Abiteboul,D.Quass,J.McHugh,J.Widom,andJ.L.Wiener.TheLorelquerylangu
age
forsemistructureddata.InternationalJournalonDigitalLibraries(JODL),1(1):68–
88, 1997.

S.AbiteboulandV.Vianu.QueriesandComputationontheWeb.InProc.ofthe6thInt.
Conf.onDatabaseTheory(ICDT),volume1186ofLNCS,pages262–
275.Springer,Jan 1997.

R.AgrawalandH.V.Jagadish.EfficientSearchinVeryLargeDatabases.InProc.ofth
e14th Int.Conf.onVeryLargeDataBases(VLDB),pages407–
418.MorganKaufmann,AugSept 1988.

R.AgrawalandH.V.Jagadish.MaterializationandIncrementalUpdateofPathInfor
mation. InProc.ofthe5thInt.Conf.onDataEngineering(ICDE),pages374–
383.IEEEComputer Society,Feb1989.

R.AgrawalandH.V.Jagadish.AlgorithmsforSearchingMassiveGraphs.IEEE
TransactionsonKnowledgeandDataEngineering(TKDE),6(2):225–238,1994.
R.AlbertandA.L.Barabasi.Statisticalmechanicsofcomplexnetworks.Reviewsof
ModernPhysics,74:47,Jan2002.
N.Alechina,S.Demri,andM.deRijke.AModalPerspectiveonPathConstraints.Jou
rnalof LogicandComputation,13(6):939–956,2003.
B.AmannandM.Scholl.Gram:AGraphDataModelandQueryLanguage.InEuropea
n ConferenceonHypertextTechnology(ECHT),pages201–
211.ACM,NovDec1992.

M.AndriesandG.Engels.AHybridQueryLanguageforanExtendedEntityRelations
hip
Model.TechnicalReportTR9315,InstituteofAdvancedComputerScience,Univer
siteit Leiden,May1993.

M.Andries,M.Gemis,J.Paredaens,I.Thyssens,andJ.V.denBussche.Conceptsfor
GraphOrientedObjectManipulation.InProc.ofthe3rdInt.Conf.onExtendingDatab
ase Technology(EDBT),volume580ofLNCS,pages21–
38.Springer,March1992.

R.AnglesandC.Gutierrez.QueryingRDFDatafromaGraphDatabasePerspective.I
nProc.
2ndEuropeanSemanticWebConference(ESWC),number3532inLNCS,pages346
–360,
2005.

M.A.AufaurePortierand C.Tr´epied.A SurveyofQueryLanguages


forGeographic
InformationSystems.InProc.ofthe3rdInt.WorkshoponInterfacestoDatabases,pag
es 431–438,July1976.

M.AzmoodehandH.Du.GQL,AGraphicalQueryLanguageforSemanticDatabases
.In Proc.ofthe4th Int.Conf.on Scientificand StatisticalDatabaseManagement
(SSDBM),volume339ofLNCS,pages259–277.Springer,June1988.

C.Beeri.DataModelsandLanguagesforDatabases.InProc.ofthe2ndInt.Conf.on
DatabaseTheory(ICDT),volume326ofLNCS,pages19–
40.Springer,AugSept1988.
G.Benk¨o,C.Flamm,andP.F.Stadler.AGraphBasedToyModelofChemistry.Journ
al ofChemicalInformationandComputerSciences(JCISD),43(1):1085–
1093,Jan2003.
C.Berge.GraphsandHypergraphs.NorthHolland,Amsterdam,1973.
U.Brandes.NetworkAnalysis.Number3418inLNCS.SpringerVerlag,2005.

T.Bray,J.Paoli,andC.M.SperbergMcQueen.ExtensibleMarkupLanguage(XML)
1.0, W3C Recommendation 10 February 1998.
http://www.w3.org/TR/1998/RECxml19980210.

A.Broder,R.Kumar,F.Maghoul,P.Raghavan,S.Rajagopalan,R.Stata,A.Tomkins,
and J.Wiener.GraphstructureintheWeb.In
Proc.ofthe9thInt.WorldWideWebconferenceonComputernetworks:theinternatio
naljournalof computerandtelecommunicationsnetworking,pages309–
320.NorthHollandPublishingCo.,2000.
P.Buneman.SemistructuredData.InProc.ofthe16thSymposium onPrinciplesof
DatabaseSystems(PODS),pages117–121.ACMPress,May1997.
