Graph Database Modeling With Neo4j
Copyrighted Material
Graph Database Modeling with Neo4j
Copyright © 2020-21 by Ajit Singh, All Rights Reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means (electronic, mechanical, photocopying, recording or otherwise) without prior written permission from the author, except for the inclusion of brief quotations in a review.
For information about this title or to order other books and/or electronic media, contact the publisher:
Ajit Singh & Anant Kumar
e:[email protected]
e:[email protected]
w:https://www.ajitvoice.wordpress.com
Preface
This book is designed to walk you through graph data modeling. You will be introduced to the basic process of designing a graph data model that can answer a wide range of business questions across a variety of domains.
Graph data modeling is the process in which a user describes an arbitrary domain as a connected graph of nodes and relationships with properties and labels. A graph data model is designed to answer questions in the form of Cypher queries and to solve business and technical problems by organizing a data structure for the graph database.
This book is simply an introduction to data modeling using a simple, straightforward scenario. There are plenty of opportunities throughout the upcoming guides to practice modeling domains and analyzing changes to the model that might need to be made.
Data is like water. It's probably useless if you don't put it in a helpful container. The shape, size and functionality of that container depend on your intended use, but in general, a container is necessary.
The same goes for data. When it comes to creating a new application or data solution, you need to provide a structure for that data. That structuring process is known as data modeling.
Often reserved solely for senior database administrators (DBAs) or principal developers, data modeling is sometimes presented as an esoteric art unknowable to mere mortals. You may worship the expert data modeler from afar.
While some data modeling scenarios really are best left up to the experts, it doesn't have to be difficult by default. In fact, data modeling is as much a business concern as a technological one. So if you don't know a single line of code, you're in luck.
Anyone can do basic data modeling, and with the advent of graph database technology, matching your data to a coherent model is easier than ever. Data modeling is an abstraction process. You start with your business and user needs (i.e., what you want your application to do). Then, in the modeling process, you map those needs into a structure for storing and organizing your data.
Every data model is unique, depending on the use case and the types of questions that users need to answer with the data. Because of this, there is no "one-size-fits-all" approach to data modeling. Using best practices and careful modeling will provide the most valuable result in producing an accurate data model that benefits your processes and use case.
Graph databases are best suited to a specific kind of data set: huge amounts of highly complex data in which entities are closely related to one another. This is because, in contrast to relational databases, they query efficiently through the relationships among entities.
Graph databases support algorithms for queries that are impractical in relational databases because of their tabular structure and static schema. Also, the larger the volume of data, the slower such queries tend to be in SQL, because they require looking up joined tables with a great number of tuples. Graph databases allow a query to traverse the graph and reach a high level of depth without having to read all the data stored.
Neo4j is an open source NoSQL graph database. It is a fully transactional (ACID) database that stores data structured as graphs consisting of nodes connected by relationships. Inspired by the structure of the real world, it allows for high query performance on complex data while remaining intuitive and simple for the developer.
Neo4j is widely regarded as the leading graph database technology. It analyzes and traverses data in real time and returns results quickly. It has a good user interface and support. But its greatest feature is that query performance stays largely stable even as the data size grows, because a traversal touches only the part of the graph it needs.
Using this book, you'll learn the theory of graph databases and how to use Neo4j to build recommendations, model relationships, and calculate the shortest route between two locations. With example data models, best practices, use cases, and an application putting everything together, this book will give you everything you need to really get started with Neo4j. Starting with a brief introduction to graph theory, this book will show you the advantages of using graph databases along with data modeling techniques for graph databases. You'll gain practical hands-on experience with commonly used and lesser-known features for updating a graph store with Neo4j's Cypher query language. This book includes a lot of background information, helps you grasp the fundamental concepts behind this radical new way of dealing with connected data, and gives you lots of examples of use cases and environments where a graph database would be of great interest.
Neo4j is being used by social media and e-commerce industry giants. You can take advantage of Neo4j's powerful features and benefits: add Beginning Neo4j to your library today.
Contents
1. Graph Data Model
Graph databases
2. Graph Schemas
Selecting vertex labels
Examples of label selection
Drawing a graph schema
Summary
3. Converting ER models to graph schemas
ER models and diagrams
Example
Procedure to convert an ER model to a graph schema
Rule #1: Entity types become vertex types
Rule #2: Binary relationship types become edge types
Rule #3: N-ary relationship types become vertex types
Conversion example
Vertices are vertices, and edges are edges
Summary
4. Normalizing Graph Schemas
Normalization of relational databases
Transformation rules that produce equivalent schemas
Rule A: Renaming properties and labels
Rule B: Reversing edge directions
Rule C: Property displacement
Rule D: Specialization and generalization
Rule E: Edge promotion
Rule F: Property promotion
Rule G: Property expansion
Summary
5. One meta rule for normalization
Schemas and constraints
Graph universes, transformations and equivalence
Derived types
Meta rule: Adding and removing derived types
Proving the meta rule
Proving the 7 rules: Renaming, Reversing, Property Displacement,
Beyond transformation rules
Summary
6. Validating graph schemas
7. Pixy: First order logic on graph databases
Background
On SQL
On first order logic
On Gremlin
Pixy: First order logic with Gremlin
ER models in Pixy
Query requirements don't usually matter while modeling
8. Introduction to Database
State of the art of Databases
Types of DBMS
NoSQL DBMS
Comparison of DBMS
Current trends
9. Graph Databases
Graph Theory and Its Applications
Concepts of Graph Databases
Query performance
10. Neo4j
Introduction of Neo4j
Advantages of Neo4j
Properties of Neo4j
Performance In Neo4j
How To Increase Performance Of Neo4j?
Cypher Query Language
Structure
Operations In Cypher
Loading Data With Cypher
Use Cases of Neo4j
11. Getting started with Neo4j
Installation or Setup
Installation & Starting a Neo4j server
Start Neo4j from console (headless, without web server)
Start Neo4j web server
Delete one of the databases
Cypher Query Language
RDBMS Vs Graph Database
Cypher - Implementation
Creation
Create a node
Create a relationship
Query Templates
Create an Edge
Deletion
Delete all nodes
Delete all nodes of a specific label
Match (capture group) and link matched nodes
Update a Node
Delete All Orphan Nodes
Python & Neo4j
12. Neo4j Application
Use Case Selected
Data
Implementing Data
Export data
Query Examples (Neo4j - SQL)
Shortest Path
Betweenness centrality
Closeness centrality
PageRank
Community Detection
Possible queries on SQL
Bibliography
Part I
Chapter 1
Graph Data Model
A relational database has a ledger-style structure. It can be queried through SQL, and it is what most people are familiar with. Each entry is a row in a table. Tables are related by foreign-key constraints, which connect a row in one table to the primary key of a row in another. Querying relational databases often involves slow multi-level joins.
To picture a graph, think of a scatter plot: the elements are nodes, or dots. In graph terminology they are also called vertices. Each node has key-value pairs and a label. Nodes are connected by relationships, or edges, which can be pictured as lines. Relationships have a type and a direction, and they can have properties. A graph database is simply composed of dots and lines. This type of database is simpler and more powerful when the meaning is in the relationships between the data. Relational databases can easily handle direct relationships, but indirect relationships are more difficult to deal with in relational databases.
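The description above (nodes with a label and key-value pairs, relationships with a type, a direction and optional properties) can be sketched in a few lines of Python. This is only an illustration of the data model, not how Neo4j stores data; the class names and the sample data (`alice`, `toaster`, `BOUGHT`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node: one label plus arbitrary key-value properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed relationship that may carry its own properties."""
    start: Node
    end: Node
    type: str
    properties: dict = field(default_factory=dict)


# Hypothetical example data, not taken from the book.
alice = Node("Person", {"name": "Alice"})
toaster = Node("Product", {"name": "Toaster"})
bought = Relationship(alice, toaster, "BOUGHT", {"coupon": True})

print(bought.start.properties["name"], bought.type, bought.end.properties["name"])
```

The direction is carried by the `start` and `end` fields, so the same two nodes could also be linked by a second relationship pointing the other way.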
Figure 1a
When a relational database is built, it is built with questions in mind: what kinds of questions will we want to answer? For example, suppose you want to know how many people bought a toaster, live in Kansas, have a criminal record, and used a coupon to buy that toaster. If the database administrator, or the person who created the database, did not anticipate a question like this, it may be very difficult to retrieve that information from a relational database. With graph databases, it is possible to answer unanticipated questions. With a graph, you can answer any question as long as the data exists and there is a path between the data points involved. A graph is designed to traverse indirect relationships. With graph databases you can even add more relationships and still maintain performance. A graph database goes beyond storing data points: it stores the relationships between them.
There are two properties of graph databases we should consider when investigating graph database technologies:
The underlying storage
Some graph databases use native graph storage that is optimized and designed for storing and managing graphs. Not all graph database technologies use native graph storage, however. Some serialize the graph data into a relational database, an object-oriented database, or some other general-purpose data store.
The processing engine
Some definitions require that a graph database use index-free adjacency, meaning that connected nodes physically "point" to each other in the database. Here we take a slightly broader view: any database that from the user's perspective behaves like a graph database (i.e., exposes a graph data model through CRUD operations) qualifies as a graph database. We do acknowledge, however, the significant performance advantages of index-free adjacency, and therefore use the term native graph processing to describe graph databases that leverage index-free adjacency.
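The difference between index-free adjacency and a lookup-based engine can be illustrated with a small Python sketch. It is a conceptual model only, assuming in-memory objects; real storage engines work with on-disk records, and all names here are hypothetical.

```python
class Node:
    """A node that stores direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct object references held by the node itself


# Index-free adjacency (illustrative): follow the references stored on the node.
a, b, c = Node("a"), Node("b"), Node("c")
a.neighbors.extend([b, c])


def neighbors_native(node):
    """Cost depends only on how many neighbours this node has."""
    return [n.name for n in node.neighbors]


# Non-native alternative (illustrative): a global edge table that has to be
# scanned or indexed to find the neighbours of a given node.
edge_table = [("a", "b"), ("a", "c"), ("b", "c")]


def neighbors_via_table(name):
    """Cost grows with the size of the whole edge table unless an index is added."""
    return [dst for src, dst in edge_table if src == name]


print(neighbors_native(a))
print(neighbors_via_table("a"))
```

Both functions return the same neighbours; what differs is how much data must be inspected to find them.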
From a database point of view, the conceptual tools defining a DB-Model should address at least the structuring and description of the data, its maintainability and the way to retrieve or query the data. According to these criteria, a DB-Model is defined as a combination of three components: first, a collection of data structure types; second, a collection of operators or inference rules; and third, a collection of general integrity rules. Note that several proposals of DB-Models define only the data structures, sometimes omitting operators and/or integrity rules.
Due to the importance of modeling conceptually, philosophically and in practice, DB-Models have become essential abstraction tools. Among the purposes of a DB-Model are: a tool for specifying the kinds of data permissible; a general design methodology for databases; coping with the evolution of databases; development of families of high-level languages for query and data manipulation; a focus in DBMS architecture; and a vehicle for research into the behavioral properties of alternative organizations of data.
Since the emergence of database management systems, there has been an ongoing debate about what the DB-Model for such a system should be. The evolution and diversity of existing DB-Models show that there is no silver bullet for data modeling. The parameters influencing their development are manifold; among the most important we can mention the characteristics or structure of the domain to be modeled, the type of intellectual tools that appeal to the user, and, of course, the hardware and software constraints imposed. Additionally, each DB-Model proposal is grounded in certain theoretical tools and serves as a base for the development of related models.
Figure 1b: Evolution of database models. Rectangles denote models, arrows indicate influences, and circles denote theoretical developments. On the left-hand side is a timeline in years.
Database Models Evolution: Brief Historical Overview
In the beginnings of the design of DB-Models, physical (hardware) constraints were one of the fundamental parameters to be considered. Before the advent of the relational model, most DB-Models focused essentially on the specification of the structure of data in actual file systems. Kerschberg et al. developed a taxonomy of DB-Models prior to 1976, comparing essentially their mathematical structures and foundations, and the levels of abstraction used.
Two representative DB-Models are the hierarchical and network models, which emphasize the physical level and offer the user the means to navigate the database at the record level, thus providing low-level operations to derive more abstract structures.
The relational DB-Model was introduced by Codd and highlights the concept of level of abstraction by introducing the idea of separation between physical and logical levels. It is based on the notions of sets and relations. Due to its simplicity of modeling, it gained wide popularity among business applications.
Semantic DB-Models allow database designers to represent objects and their relations in a natural and clear manner to the user (as opposed to previous models). They were intended to provide the user with tools that could faithfully capture the semantics of the information to be modeled. A well-known example is the entity-relationship model.
Object-oriented DB-Models appeared in the eighties, when most of the research was concerned with so-called "advanced systems for new types of applications". These DB-Models are based on the object-oriented paradigm, and their goal is representing data as a collection of objects that are organized in classes and have complex values associated with them.
Semistructured DB-Models are designed to model data with a flexible structure, e.g., documents and Web pages. Semistructured data (sometimes loosely called unstructured data) is neither raw nor strictly typed as in conventional database systems. Additionally, data is mixed with the schema, a feature which allows extensible exchange of data. These DB-Models appeared in the nineties and are currently in evolution.
The XML (eXtensible Markup Language) model did not originate in the database community. Although originally introduced as a standard to exchange and model documents, it soon became a general-purpose model, with a focus on information with a tree-like structure. Similar to the semistructured model, schema and data are mixed. See Section 2.3 for a more in-depth comparison among these models.
Other Models and Frameworks. There are other important DB-Models designed for particular applications, as well as modeling frameworks not directly focusing on database issues, which indirectly concern graph database modeling. Among the DB-Models are spatial databases, Geographical Information Systems (GIS), temporal DB-Models, and multidimensional DB-Models. Frameworks related to our topic, but not directly focusing on database issues, are Semantic Networks.
Graph Database Models: Brief Historical Overview
The notion of a graph DB-Model made its appearance almost in parallel with the object-oriented DB-Models, as an alternative to the limitations of traditional DB-Models for capturing the inherent graph structure of data appearing in applications such as hypertext or geographic database systems, where the interconnectivity of data is an important aspect.
Activity around graph databases flourished in the first half of the nineties, and then the topic almost disappeared. The reasons for this decline are manifold: the database community moved toward semistructured data (a research topic which did not have links to the graph database work in the nineties); the emergence of XML captured all the attention of the work on hypertext; people working on graph databases moved to particular applications like spatial data, the web, and documents; and the tree-like structure is enough for most applications. Figure 2 reflects this evolution by means of papers published in main conferences and journals.
Graph DB-Models emerged with the objective of modeling information whose structure is a graph. In an early approach, Roussopoulos and Mylopoulos, facing the failure of the systems of the time to take into account the semantics of the database, proposed a semantic network to store data about the database. An implicit structure of graphs for the data itself was presented in the Functional Data Model, whose goal was to provide a "conceptually natural" database interface. A different approach proposed the Logical Data Model, where an explicit graph DB-Model intended to generalize the relational, hierarchical and network models. Years later Kunii proposed a graph DB-Model for representing complex structures of knowledge called GBASE.
Graph Data Modeling
What is a Graph Data Model?
A graph DB-Model is conceptualized according to the three basic components of a DB-Model, namely data structures, transformation language, and integrity constraints. A graph DB-Model is characterized by:
The data and/or the schema are represented by graphs, or by data structures generalizing the notion of graph (hypergraphs, hypernodes, hygraphs, etc.). Almost all authors agree on this point, modulo slight variations.
Let us review different wordings of authors on this issue. The approach is to model the database directly and entirely as a graph [58]. A graph DB-Model is one whose single underlying data structure is a labeled directed graph; the database consists of a single digraph. A database schema in this model is a directed graph, where leaves represent data and internal nodes represent connections between the data. Directed labeled graphs are used as the formalism to specify and represent database schemes, instances, and rules. The model is basically defined as a labeled directed graph. In this model, a database is described in terms of a labeled directed graph called a schema graph. A graph DB-Model formalizes the representation of the data structures stored in the databases as a graph. The schema as well as the instance of an object database is represented by a graph. The nodes of the instance graph represent the objects of the database. Database instances and database schemes are described by certain types of labeled graphs [68]. The model for data is organized as graphs. Labeled graphs are used to represent schemes and instances.
On top of these descriptions, one could add the fact that sometimes the schema and the data (instances) are difficult to differentiate in these models, a fact that closely resembles semistructured models. But in most cases the schema and the instances are separated.
The existence of integrity constraints enforcing the consistency of the data, which are directly related to the graph data structure. Examples are labels with unique names, typing constraints on nodes, functional dependencies, and domain and range of properties.
Summarizing, a graph DB-Model is a model where the data structures for the schema and/or instances are modeled as a (labeled) (directed) graph, or generalizations of the graph data structure, where data manipulation is expressed by graph-oriented operations and type constructors, and which has integrity constraints appropriate for the graph structure.
Why a Graph Data Model?
The application areas of graph DB-Models are those where information about the interconnectivity or the topology of the data is more important than, or as important as, the data itself. This is usually accompanied by the fact that data and relations among data are at the same level. In fact, introducing graphs as a modeling tool has several advantages for this type of data.
First, it leads to a more natural modeling: graph structures are visible to the user. They allow a natural way of handling data appearing in applications (e.g. hypertext or geographic databases). Graphs have an important advantage: they can keep all the information about an entity in a single node and show related information by arcs connected to it. Graph objects (like paths, neighborhoods) may have first-order citizenship.
Table 1: A coarse-granularity comparative view among different general-purpose database models. The parameters are: abstraction level, base data structure used, the types of information objects the DB-Model focuses on, and the complexity and homogeneity of the data items modeled.
[Table body not recoverable. Column headers: Type of Model, Abstraction level, Base data structure, Main focus, Data complexity, Data homogeneity.]
Second, queries can refer directly to this graph structure. Associated with graphs are specific graph operations in the query language algebra, such as finding shortest paths, determining certain subgraphs, and so forth. Explicit graphs and graph operations allow a user to express a query at a very high level. To some extent, this is in contrast to graph manipulation in deductive databases, where often fairly complex rule programs need to be written. Last but not least, for purposes of browsing it may be convenient to forget the schema.
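As an illustration of a graph operation such as finding a shortest path, the following Python sketch uses breadth-first search on an unweighted directed graph. It is a minimal sketch under stated assumptions: the graph is a plain adjacency dictionary, every edge counts as one step, and the route network `routes` is hypothetical. A graph database would expose this kind of operation directly in its query language.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path (fewest edges) as a list of nodes, or None."""
    parents = {start: None}  # also serves as the set of visited nodes
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbor in graph.get(current, []):
            if neighbor not in parents:
                parents[neighbor] = current
                queue.append(neighbor)
    return None


# Hypothetical route network: each key lists the nodes it points to.
routes = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(shortest_path(routes, "A", "E"))  # ['A', 'B', 'D', 'E']
```

Because the edges are directed, asking for a path from "E" back to "A" returns `None`.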
Third, as far as implementation is concerned, graph databases may provide special graph storage structures for the representation of graphs and the most efficient graph algorithms available for realizing specific operations. Although the data may have some structure, the structure is not as rigid, regular or complete as in a traditional DBMS. It is not necessary to have full knowledge of the structure to express meaningful queries. The system can use efficient graph algorithms designed to utilize the special graph data structures [58].
Comparison with other Database Models
In this section we compare the most influential DB-Models with graph DB-Models. Table 1 presents a coarse-granularity overview of the most influential models. Below we present the details.
Physical DB-Models. They were the first ones to offer the possibility to organize large collections of data. Among the most important ones are the hierarchical and network models. These models lack a good abstraction level and are very close to physical implementations. The data structuring is not flexible and not apt to model non-traditional applications. For our discussion they do not have much relevance.
Relational DB-Model. It was introduced by Codd to highlight the concept of level of abstraction by introducing a clean separation between physical and logical levels. Gradually the focus shifted to modeling data as seen by applications and users. This is the emphasis and the achievement of the relational model, at a time when the domain of application was basically simple data (banks, payments, commercial and administrative applications).
The differences between graph DB-Models and the relational DB-Model are manifold. Among the most relevant ones are: the relational model was directed to simple record-type data with a structure known in advance (airline reservations, accounting, inventories, etc.). The schema is fixed and extensibility is a difficult task. Integration of different schemes is neither easy nor automatable. The query language does not support paths, neighborhoods and several other graph operations, like connectivity (an exception is transitivity). There are no object identifiers, only values.
Semantic DB-Models have their origin in the necessity to provide more expressiveness and incorporate a richer set of semantics into the database from the user's point of view. They allow database designers to represent objects and their relations in a natural and clear manner (similar to the way the user views an application) by using high-level abstraction concepts such as aggregation, classification and instantiation, sub- and super-classing, attribute inheritance and hierarchies. A well-known example is the entity-relationship model. It has become a basis for the early stages of database design, but due to lack of preciseness it cannot replace models like the relational or object-oriented ones. Other examples of semantic DB-Models are IFO and SDM. For graph DB-Models research, semantic DB-Models are relevant because they are based on a graph-like structure which highlights the relations between the entities to be modeled.
OO DB-Models have been related to graph DB-Models due to the explicit or implicit graph structure in their definitions. Nevertheless, there remain important differences rooted in the way each of them models the world. OO DB-Models view the world as a set of complex objects having certain state (data) and interacting among themselves by methods. In contrast, graph DB-Models view the world as a network of relations, emphasizing the interconnection of the data and the properties of these relations. The emphasis of OO DB-Models is on the dynamics of the objects, their values and methods. Graph DB-Models, in turn, emphasize the interconnection while maintaining the structural and semantic complexity of the data.
Semistructured DB-Models. The need for semistructured data (sometimes loosely called unstructured data) was motivated by the increased existence of unstructured data, data exchange, and data browsing. In semistructured data the structure is irregular, implicit and partial; the schema does not restrict the data, it only describes it, and it is very large and rapidly evolving; the information associated with a schema is contained within the data (data contains data and its description, so it is self-describing). Among the most representative models are OEM, Lorel, UnQL, ACeDB and Strudel. Generally, semistructured data is represented by a tree-like structure. Nevertheless, cycles between data are possible, establishing in this way a structural relation with graph DB-Models. Some authors characterize semistructured data as rooted directed connected graphs.
Graph Data Model Motivations and Applications
Graph DB-Models are motivated by real-life applications where information about the interconnectivity of its pieces is a salient feature. We will divide these application areas into Classical and Complex networks.
Along the same lines, the observation that graphs have been an integral part of the database design process in semantic and object-oriented DB-Models brought the idea of introducing a model in which both data manipulation and data representation were graph based.
Limitations (at the time) of knowledge representation systems, and the need for intricate but flexible knowledge representation and derivation techniques.
The need for improving functionalities of object-oriented DB-Models. In this direction the applications in mind were CASE, CAD, image processing, and scientific data analysis.
Graphical and visual interfaces, geographical, pictorial and multimedia systems.
Applications where data complexity exceeded the relational DB-Model capabilities also motivated graph databases. For instance, managing transport networks (train, plane, water, telecommunications) and spatially embedded networks like highways and public transport. Several of these applications are now in the field of Geographical Information Systems and spatial databases.
Complex Networks. Several areas have witnessed the emergence of huge networks of data which share some particular mathematical parameters, called complex networks. The need for database management for some classes of these networks has been recently highlighted. Although it is not yet evident whether, from the point of view of databases, one can treat them as a whole, we will describe them together for presentation purposes. Following the survey of Newman, we will group them into four categories: social networks, information networks, technological networks and biological networks. Below we describe specific examples for each of them.
In social networks, nodes are people and groups while links show relationships or flows between the nodes. Some examples are friendship, business relationships, patterns of sexual contacts, research networks (collaboration, co-authorship), communication records (mail, telephone calls, email), computer networks, and national security. There is growing activity in the area of social network analysis, visualization and data processing in such networks.
In information networks there occur relations such as citations between academic papers, the World Wide Web (hypertext, hypermedia), peer-to-peer networks, relations between word classes in a thesaurus, and preference networks.
In technological networks the structure is mainly governed by space and geography. Some examples are the Internet (as a network of computers), electric power grids, airline routes, telephone networks, and delivery networks (post office). The area of Geographic Information Systems (GIS) today covers a big part of this area (roads, railways, pedestrian traffic, rivers).
It is important to stress that classical query languages offer little help when dealing with the type of query needed in the above areas. As examples, data processing in GIS includes geometric operations (area or boundary, intersection, inclusions, etc.), topological operations (connectedness, paths, neighbors, etc.) and metric operations (distance between entities, diameter of the network, etc.). In genetic regulatory networks, examples of measures are connected components (interactions between proteins) and degrees of nearest neighbors (strong pair correlations). In social networks, relevant measures include distance, neighborhoods, the clustering coefficient of a vertex, the clustering coefficient of a network, betweenness, the size of giant connected components, and the size distribution of finite connected components. Similar problems arise in the Semantic Web, where querying RDF data increasingly needs graph features.
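To make one of these measures concrete, the sketch below computes the clustering coefficient of a vertex in Python, using the common definition: the fraction of pairs of the vertex's neighbors that are themselves connected. The friendship network `friends` is hypothetical, and the graph is assumed to be undirected and stored as adjacency sets.

```python
from itertools import combinations

# Hypothetical undirected friendship network as adjacency sets.
friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
}


def clustering_coefficient(graph, vertex):
    """Fraction of pairs of the vertex's neighbours that are themselves linked."""
    neighbors = graph[vertex]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to check
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return linked / (k * (k - 1) / 2)


for person in friends:
    print(person, round(clustering_coefficient(friends, person), 3))
```

Here "ann" has three neighbors, of which only the pair bob and cat know each other, so her coefficient is 1/3; "bob" and "cat" each score 1.0.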
Representative Graph Database Models
In this section we describe in some detail the most representative graph DB-Models, choosing those that define and use explicitly graph structures or generalizations of them. Additionally we describe other related models that use graphs but do not fit properly as graph DB-Models. In them, graphs are used, for example, for navigation, for defining views, or as language representation.
For each proposal, we present their data structures and, when available, their query languages and integrity constraint rules. In general, there are few implementations and no standard benchmarks, hence we avoid surveying this issue. To give a flavor of the modeling in each proposal, we will run the following example about a toy genealogy shown in Figure 2.
Figure 2: A genealogy diagram (right-hand side) represented as two tables (left-hand side), NAME-LASTNAME and PERSON-PARENT. (Children inherit the last name of the father just for modeling purposes.)
Figure 3: Logical Data Model. The schema (on the left) uses two basic type nodes for representing data values (N and L), and two product type nodes (NL and PP) to establish relations between data values in a relational style. The instance (on the right) is a collection of tables, one for each node of the schema. Note that internal nodes use pointers (names) to make reference to basic and set data values defined by other nodes.
Logical Data Model (LDM)
Motivated by the lack of semantics in the relational DB-Model, Kuper and Vardi proposed a DB-Model that generalizes the relational, hierarchical and network models. The model describes mechanisms to restructure data, a logical query language and an algebraic query language.
In LDM a schema is an arbitrary directed graph where each node has one of the following types: the Basic type describes a node that contains the data stored; the Composition type ⊗ describes a node that contains tuples whose components are taken from its children; the Collection type ⊛ describes a node that contains sets whose elements are taken from its children. Summarizing, internal nodes are of type ⊗ or ⊛, representing structured data; terminal nodes are of the Basic type and represent atomic data; and edges represent connections between data.
A second version of the model, besides renaming the node types ⊗ and ⊛ as product and power respectively, incorporates a new type, the Union type ∪, intended to represent a collection whose domain is the union of the domains of its children (see example in Figure 3).
An LDM database instance consists of an assignment of values to each node of the schema. In this sense, the instance of a node is a set of elements from the underlying domain (for basic type nodes) or tuples or sets taken from the instances of the node's children (for ⊗, ⊛ and ∪ types).
With the objective of avoiding cyclicity at the instance level, the model proposes to keep a distinction between memory locations and their content. Thus, instances consist of a set of l-values (the address space), plus an r-value (the data space) assigned to each of them. These features make it possible to model transitive relations like hierarchies and genealogies.
Over this structure a first-order many-sorted language is defined. With this language, a query language and integrity constraints are defined. Finally, an algebraic language, equivalent to the logical language, is proposed, providing operations for node and relation creation, transformation and reduction of instances, and other operations like union, difference and projection.
LDM is a complete DB-Model (i.e. data structures plus query languages and integrity constraints). The model supports modeling of complex relations (e.g. hierarchies, recursive relations). The notion of virtual records (pointers to physical records) proves useful to avoid redundancy of data by allowing cyclicity at the schema and instance level. Due to the fact that the model is a generalization of other models (like the relational model), their techniques or properties can be translated into the generalized model. A relevant example is the definition of integrity constraints.
Figure 4: Hypernode Model. The schema (left) defines a person as a complex object with the properties name and lastname of type string, and parent of type person (recursively defined). The instance (on the right) shows the relations in the genealogy among different instances of person.
Chapter 2
Graph Schemas
Selecting vertex labels
The Tinkerpop property graph model can be summarized as follows. A graph has a set of vertices and a set of edges. Each edge connects an out vertex to an in vertex. Vertices and edges can have properties, which are key-value pairs with String keys and pretty much any value that the underlying database supports.
So far, the model looks schema-less, since vertices and edges can't be distinguished from other vertices and edges without knowing what the properties mean. However, edges have always had labels. And with Tinkerpop 3, vertices will have labels as well. The same is true with Neo4j's latest major version.
If every vertex must be labeled, what is the correct method to select a label? What should a label say about a vertex or an edge, from the application's perspective?
We think a vertex label should represent the most granular type of the vertex, where each "vertex type" is associated with a unique combination of:
meaning (semantics),
set of property key names and value types, and
set of outgoing edge labels, where each label type is annotated with the possible directions of the edge (in/out/both) and cardinality.
Why so?
Because labels representing vertex types give the application the most detailed information about the behavior of that vertex, thereby ensuring that the application can process the vertex accordingly. In other words, one should not be able to subdivide a vertex type to get two vertex types that behave differently from the application's standpoint.
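The three ingredients of a vertex type listed above can be written down as a small Python structure. This is a sketch only: the class names, the cardinality notation ("0..*") and the `Customer` example are assumptions made for illustration, not part of Tinkerpop or Neo4j.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeRule:
    label: str
    direction: str    # "in", "out" or "both"
    cardinality: str  # e.g. "0..*"; this notation is an assumption


@dataclass(frozen=True)
class VertexType:
    label: str
    meaning: str       # the semantics, stated in words
    properties: tuple  # pairs of (key name, value type)
    edges: tuple       # EdgeRule entries


# Hypothetical vertex type used only for illustration.
customer = VertexType(
    label="Customer",
    meaning="a person or company that places orders",
    properties=(("name", str), ("since", int)),
    edges=(EdgeRule("placed", "out", "0..*"),),
)


def has_expected_properties(vertex_type, props):
    """True if props has exactly the declared keys with the declared value types."""
    declared = dict(vertex_type.properties)
    return props.keys() == declared.keys() and all(
        isinstance(props[key], value_type) for key, value_type in declared.items()
    )


print(has_expected_properties(customer, {"name": "Acme", "since": 2019}))
print(has_expected_properties(customer, {"name": "Acme", "lang": "java"}))
```

An application that reads a vertex labeled `Customer` can then rely on the declared keys, value types and edge labels without inspecting the vertex first.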
Examples of label selection
Let's go through the label selection exercise with the classic 6-vertex TinkerGraph shown in the property graph model page. Since this is a Tinkerpop 2 style graph, it doesn't have vertex labels. We'll now try to come up with the vertex labels by simply looking at the vertex behavior.
Figure 1: TinkerGraph example
If you look closely, there are two types of vertices: ones with 'name' and 'age', and ones with 'name' and 'lang'. Let us label the former vertex type as 'Person' and the latter vertex type as 'Software'. In other words, you have persons named 'marko', 'vadas', 'peter' and 'josh', and software named 'lop' and 'ripple'.
After analyzing the edge labels and directions, you could say that the 'Person' vertex type has:
Property keys 'name' and 'age'
Edges labeled 'knows' in the OUT direction
Edges labeled 'created' in the OUT direction
The 'Software' vertex type has:
Property keys 'name' and 'lang'
Edges labeled 'created' in the IN direction
Now, an application looking at this graph automatically knows what to expect when it reads a vertex labeled 'Person' or 'Software'. We can define two different indexes on 'name', one for Person and one for Software, to make sure that software searches don't pick up people, or vice versa.
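The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six vertices and six edges are typed in by hand from the classic TinkerGraph, and the function name infer_vertex_types is hypothetical. Vertices are grouped purely by their set of property keys, which is the mechanical part of label selection.

```python
from collections import defaultdict

# The classic six-vertex TinkerGraph, typed in by hand for this illustration.
VERTICES = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop", "lang": "java"},
    4: {"name": "josh", "age": 32},
    5: {"name": "ripple", "lang": "java"},
    6: {"name": "peter", "age": 35},
}
# Edges as (out vertex id, label, in vertex id).
EDGES = [
    (1, "knows", 2), (1, "knows", 4),
    (1, "created", 3), (4, "created", 5),
    (4, "created", 3), (6, "created", 3),
]

def infer_vertex_types(vertices, edges):
    """Group vertices by property-key set and collect (edge label, direction) pairs."""
    keys_of = {vid: frozenset(props) for vid, props in vertices.items()}
    names = defaultdict(list)
    adjacent = defaultdict(set)
    for vid, keys in keys_of.items():
        names[keys].append(vertices[vid]["name"])
    for out_v, label, in_v in edges:
        adjacent[keys_of[out_v]].add((label, "OUT"))
        adjacent[keys_of[in_v]].add((label, "IN"))
    return {
        keys: {"members": sorted(names[keys]), "edges": sorted(adjacent[keys])}
        for keys in names
    }

types = infer_vertex_types(VERTICES, EDGES)
person = types[frozenset({"name", "age"})]
software = types[frozenset({"name", "lang"})]
print(person)
print(software)
```

Because 'knows' connects a person to a person, the sketch reports it in both directions for the Person group; in the schema it is a single self-loop edge type drawn in the OUT direction.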
The label selection process can't be fully mechanical, though. For instance, a person with no friends can be thought of as a separate vertex type, because there are no adjacent 'knows' edges to such vertices. However, unless this makes sense in the context of the application or the data model, there is no point in subdividing the 'Person' vertex type into 'Loner' and 'Person with Friends'. The same argument goes for subdividing the person vertex type into 'Developer' and 'NonDeveloper' based on whether that person created a piece of software.
To recap, the right way to select vertex labels for a property graph is to first figure out the vertex types and the behaviors of each vertex type. The totality of these behaviors is the graph schema.
Drawing a graph schema
The best way to represent a graph schema is, of course, a graph. This is how the graph schema looks for the classic Tinkerpop graph.
Figure 2: Example graph schema shown as a property graph
The graph schema is pretty much a property graph. The vertices correspond to vertex types, and edges correspond to edge types. The property keys are named after the allowed property keys for that vertex type. Every property value in the schema graph contains the name of the most specific superclass representing the corresponding property values of the instance graph. Optional properties can have a '?' after the class name (not shown here).
Edge properties are like vertex properties, except that there is a special property named '#' that holds the cardinality from the out to the in vertex types. Common sense dictates that the cardinality is M:N, i.e., many-to-many, for both 'knows' and 'created'. One could be misled to think that some of these relationships are 1:N by looking at the 6-vertex graph. This is another reason for not fully relying on reverse engineering methods to derive schemas.
We have gone through a similar exercise for the Grateful Dead graph. As you can see, the graph schema is very simple, although the visualization of the graph shown in the link looks complicated.
Figure 3: Grateful Dead graph schema
Our third and final example is the schema for the Kennedy family tree graph. Again, the schema is extremely simple (simplistic given recent US Supreme Court rulings).
Figure 4: Family tree graph schema
Note that in the Pixy schema, the property lists are the same for 'Man' and 'Woman', but the direction of the 'wife' edge is functionally dependent on the value of the 'sex' property. This is very interesting because it means that graph schemas could be normalized using rules, like relational databases. We will discuss this in later sections.
Summary
This section introduced the idea of schemas for property graphs and described how the schema itself can be represented as a property graph. Furthermore, it described a method to derive the graph schema for an existing property graph by finding the most granular division of its vertices into vertex types.
Graph schemas (or schema graphs) help application developers better understand the graph's structure.
In the next section, we will look at the problem the other way around. Can we derive a graph schema from a higher level conceptual model such as an Entity Relationship model? Could this be a systematic method to select vertex and edge labels, and property keys, when designing a graph database application?
Chapter 3
Converting ER models to graph schemas
This section will describe a general method to convert an entity relationship model to a property graph schema. Using this method, a database designer can develop ER models using standard conceptual modeling practices, but store the data in a graph database instead of a relational database.
ER models and diagrams
The entity relationship model was proposed by Peter Chen in his 1976 paper titled "The Entity Relationship Model: Toward a Unified View of Data". The ideas in this paper are taught in most database courses. This course page gives a quick description of the ER model.
Conceptual modeling is a particularly useful exercise when embarking on a project that involves a new domain. The goal of this exercise is to identify key concepts in the domain that must be captured in the data model. One of the techniques in conceptual modeling is to look at the natural language description of an application's requirements.
These requirements can be analyzed to identify the entity and relationship types, using Chen's "rules of thumb" (quoted from Wikipedia):
Common noun          Entity type
Proper noun          Entity
Transitive verb      Relationship type
Intransitive verb    Attribute type
Adjective            Attribute for entity
Adverb               Attribute for relationship
Example
Let us consider the following requirements:
Model a system where users create pages, which they own. Users can invite other users to look at certain pages that they own. A page can specify one or more tags which are then used to recommend other sections to the authors and invited readers.
You could analyze this requirement and come up with three entity types, viz. User, Page and Tag. The relationship types Owns, Invites and TaggedAs capture the relationships. Note that not all verbs become relationships (create, for example, does not). Similarly, the fact that invitations only apply to pages that a user owns is lost in this model.
Figure 5: Example ER diagram
The square-shaped boxes show entity types, which represent sets of similar entities. The diamond-shaped boxes show relationship types, which represent sets of similar relationships. A relationship type relates two or more entity types to each other.
The diagram shows the cardinality of each entity's contribution to a relationship, such as 1:N (one-to-many) or N:N (many-to-many). The cardinality is specified using the 'look-across' method. For example, a User owns N pages, and a page is owned by 1 user. There are known limitations of look-across cardinality for ternary relationships like Invites.
The diagram also shows some oval-shaped attributes, like username. These attributes must be assigned to entity or relationship types. Attributes that serve as external identifiers must be underlined.
Now, it is arguable whether Tag must be an entity or not in the final data model. But from an ER perspective, it makes sense to model Tag as an entity, especially if tags are used to establish relationships across users for recommendations.
Procedure to convert an ER model to a graph schema
The procedure to convert an ER model to a relational model is well known and discussed in the same OSU course notes that we referenced earlier. We will now go through a similar procedure to convert the ER diagram from the above example to a graph schema.
Rule #1: Entity types become vertex types
Entity types such as User, Page and Tag become vertex types.
The name of the entity type becomes the label of the vertex type.
The associated attributes become the properties of the vertex type.
Note that we are drawing a graph schema, not a graph instance. So the User type refers to any number of users in both the ER and the graph schema representation. Hence we use the term "vertex type" and not vertex. The entity relationship model uses similar terms such as "entity types" (like User) and entities (like John Doe, the user).
Rule #2: Binary relationship types become edge types
All binary relationship types in the ER diagram can be converted to edge types in the graph schema.
The name of the relationship type becomes the label of the edge type.
The associated attributes become the properties of the edge type.
The endpoints of the edge type are the vertex types corresponding to the related entity types. The direction doesn't matter.
Here is an example showing the "Owns" relationship type translated to an "owns" edge type:
Figure 7: Owns relationship converted to an owns edge
Note that one-to-many and many-to-many binary relationships can be modeled as edges without introducing new vertices. With relational models, you would need an additional table to capture many-to-many relationships.
A minor point is that the cardinality is written as 1:N because the User (out vertex type) to Page (in vertex type) relationship is a 1:N relationship, using the look-across method. In other words, a user has N pages and a page has 1 user. If the direction of the edge were reversed, the cardinality would be N:1.
Rule #3: N-ary relationship types become vertex types
N-ary relationship types relate more than two entity types. Such relationship types become vertex types in the property graph model.
The name of the relationship type becomes the label of the vertex type.
The associated attributes become the properties of the vertex type.
The new vertex type includes edges to the vertex types corresponding to the related entity types (see example). These edge types are labeled after the role of the participating entity in the relationship. The direction doesn't matter for any of these edges.
Here is an example showing the ternary relationship Invites translated to the vertex type Invitation:
Figure 8: Invites relationship converted to an Invitation vertex type
The cardinality in the graph schema is N:1 because the Invitation to Page relationship is an N:1 relationship, using the look-across method. In other words, an invitation could be issued to 1 page, and a page (in vertex) could be part of N invitations. It is possible to reverse just some of the role types, like invitee, without affecting the overall model. In that case the cardinality will be 1:N.
We haven't shown the process for weak entity types and identifying relationship types, but these are handled exactly the same way as entity types and relationship types. Graph databases are more forgiving than relational databases in that they allow two vertices to have the same label and property key-value pairs. This simplifies the translation of weak entity types and identifying relationship types into the property graph model.
Conversion example
Here is the graph schema corresponding to the example ER diagram. As you can see, this diagram provides enough information for an application developer to work with the graph database.
Figure 9: Graph schema for User-Page-Tag ER diagram
This is the "logical model" for the example conceptual model introduced in the first figure. We can tweak this model further by renaming the labels, changing directions of the edges, and so on. This will be the topic of the next section.
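The three conversion rules can be applied mechanically. The Python sketch below is one possible way to do that, not a reference implementation: the dictionary layout, the role names ('owner', 'page', 'tag') for the binary relationships, and the function name er_to_graph_schema are assumptions made for this example, and only a few attributes are listed for each entity type.

```python
def er_to_graph_schema(entity_types, relationship_types):
    """Apply rules 1-3 to a small ER model held in plain dictionaries."""
    # Rule 1: every entity type becomes a vertex type with the same attributes.
    vertex_types = {name: list(attrs) for name, attrs in entity_types.items()}
    edge_types = []  # (out vertex type, edge label, in vertex type, properties)
    for rel in relationship_types:
        roles = list(rel["roles"].items())  # [(role name, entity type), ...]
        attrs = list(rel.get("attributes", []))
        if len(roles) == 2:
            # Rule 2: a binary relationship type becomes a single edge type.
            (_, out_type), (_, in_type) = roles
            edge_types.append((out_type, rel["name"].lower(), in_type, attrs))
        else:
            # Rule 3: an n-ary relationship type becomes a vertex type with
            # one outgoing edge type per role.
            label = rel.get("vertex_label", rel["name"])
            vertex_types[label] = attrs
            for role, entity in roles:
                edge_types.append((label, role, entity, []))
    return vertex_types, edge_types

entities = {"User": ["name", "login"], "Page": ["uri"], "Tag": ["hashtag"]}
relationships = [
    {"name": "Owns", "roles": {"owner": "User", "page": "Page"}},
    {"name": "TaggedAs", "roles": {"page": "Page", "tag": "Tag"}},
    {"name": "Invites", "vertex_label": "Invitation",
     "roles": {"inviter": "User", "invitee": "User", "page": "Page"}},
]
vertex_types, edge_types = er_to_graph_schema(entities, relationships)
for edge_type in edge_types:
    print(edge_type)
```

The sketch leaves out cardinalities; in a fuller version each edge type would also carry the look-across cardinality taken from the ER diagram.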
Vertices are vertices, and edges are... edges
N-ary relationships are very common in conceptual models. For example, "Joe bought a headphone at Target" is an example of a "Bought" relationship that relates a User to a Product to a Store. Such relationships must be modeled as vertices, not edges (unless you are using hypergraphs). Hence we think it is misleading to think of edges as relationships and vertices as entities.
Chapter 4
Normalizing Graph Schemas
This section looks at how graph schemas can be manipulated and transformed to equivalent graph schemas. This is similar to the splitting and merging of tables in relational data models, typically performed to normalize or denormalize a relational schema.
Normalization of relational databases
The goal of database normalization is to make sure that relational schemas are easy to modify, easy to extend, informative to users and supportive of various query patterns. The various normal forms, such as 1NF, 2NF, and so on, define constraints that a table must satisfy to be compliant with that normal form. Although the definitions of the normal forms can be mathematical, the basic idea is to break up tables with duplicate information. Here is an example from the Wikipedia page on 3NF:
The previous figure breaks up the tournament winners table into two tables, one with player details and one with the tournament details. The actual rules on "functional dependencies" and "non-prime attributes" are hard to remember, but the process of splitting and merging tables comes intuitively with experience. For example, if there was an existing table which had one row per player, we'd probably move the "date of birth" to that table.
Transformation rules that produce equivalent schemas
This section lists some transformation rules that produce equivalent graph schemas. A graph schema is equivalent to another graph schema if the data stored in one schema, along with the applications that access it, can be ported to the other schema, and vice versa. These rules are like splitting and merging tables in relational models.
The transformation rules in this section can be mechanically applied to any schema, and have nothing to do with its semantics. By applying a combination of these rules, you could simplify the semantics and improve the usability of your graph model.
Rule A: Renaming properties and labels
This rule consists of three transformations that result in equivalent schemas:
Any vertex label can be renamed, so long as the new name doesn't refer to an existing vertex label.
Any edge label can be renamed, so long as the new name doesn't refer to an existing edge label between the out and in vertex types.
Any vertex/edge property can be renamed, so long as the new name doesn't refer to an existing property of the vertex/edge type.
The following figure illustrates some example applications of this rule on vertex and edge labels:
Figure 10: Renaming properties and labels
The schema shown at the top is a simple graph schema showing family relationships. This schema is transformed to the schema shown at the bottom of the figure using the following transformations:
Vertex labels Man and Woman are renamed to Male and Female.
Edge labels mother (2 instances) and father (2 instances) are renamed to parent.
Even though it seems like some information is lost by renaming mother/father to parent, this isn't true because the vertex labels at the endpoints (Male/Female) have that information. This same transformation wouldn't be so obvious while looking at an instance of this graph, like the Kennedy family tree.
Note that you cannot rename 'wife' to 'parent' in the bottom schema. This is because there already exists a parent edge type from Male to Female.
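The precondition on edge renaming is easy to check in code. The following Python sketch assumes a schema represented as a set of (out vertex type, edge label, in vertex type) triples; that representation and the function name rename_edge_label are choices made for this example only.

```python
def rename_edge_label(edge_types, out_type, old, in_type, new):
    """Rule A for edge labels; edge_types is a set of (out type, label, in type)."""
    if (out_type, old, in_type) not in edge_types:
        raise KeyError(f"no edge type {old!r} from {out_type} to {in_type}")
    if (out_type, new, in_type) in edge_types:
        # The new label would collide with an existing edge type.
        raise ValueError(f"{new!r} already exists from {out_type} to {in_type}")
    return (edge_types - {(out_type, old, in_type)}) | {(out_type, new, in_type)}

# The family schema after Man/Woman have been renamed to Male/Female.
schema = {
    ("Male", "father", "Male"), ("Male", "mother", "Female"),
    ("Female", "father", "Male"), ("Female", "mother", "Female"),
    ("Male", "wife", "Female"),
}
for out_type, label, in_type in sorted(schema):
    if label in ("mother", "father"):
        schema = rename_edge_label(schema, out_type, label, in_type, "parent")

# Renaming 'wife' to 'parent' is rejected: Male -parent-> Female already exists.
try:
    rename_edge_label(schema, "Male", "wife", "Female", "parent")
    wife_renamed = True
except ValueError:
    wife_renamed = False
print(sorted(schema), wife_renamed)
```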
Rule B: Reversing edge directions
This rule states that an edge type can be reversed provided it is a self loop, or there is no edge type with the same label in the reverse direction. The cardinality of the edge type is reversed as well.
Figure 11: Reversing edge directions
The figure illustrates an example transformation using this rule and the previous one. The transformation involves the following steps:
The 'wife' edge is renamed to 'husband' (rule A) and then reversed.
Each parent edge is renamed to 'son' or 'daughter' and reversed.
Note that the reversal is done in the graph instance as well as the schema. In other words, JFK Jr. -parent-> JFK Sr. becomes JFK Sr. -son-> JFK Jr.
You could always rename the four 'son' and 'daughter' edge types to 'child' using rule A. Again, no information is lost since the vertex labels are still unique.
You would, however, not be able to rename 'husband' to 'son'. You could rename 'husband' to 'daughter' (though absurd). The application will have to interpret "male daughters" as husbands. But after you rename husband to daughter, you would not be able to reverse its direction.
As you can see already, some applications of these rules may be quite hard to derive if you are thinking in terms of graph instances, rather than graph schemas.
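A minimal Python sketch of the reversal check follows. It assumes the schema is a dictionary from (out vertex type, edge label, in vertex type) to a cardinality string; the function name reverse_edge_type and the cardinalities given for the family example are made up for illustration.

```python
def reverse_edge_type(schema, out_type, label, in_type):
    """Rule B; schema maps (out type, label, in type) to a cardinality like '1:N'."""
    is_self_loop = out_type == in_type
    if not is_self_loop and (in_type, label, out_type) in schema:
        raise ValueError(f"{label!r} already exists from {in_type} to {out_type}")
    result = dict(schema)
    left, right = result.pop((out_type, label, in_type)).split(":")
    # Reversing the edge type also swaps the two sides of the cardinality.
    result[(in_type, label, out_type)] = f"{right}:{left}"
    return result

# User -owns-> Page is 1:N, so the reversed edge type is N:1.
pages = reverse_edge_type({("User", "owns", "Page"): "1:N"}, "User", "owns", "Page")

# A self loop can always be reversed.
friends = reverse_edge_type(
    {("Person", "knows", "Person"): "M:N"}, "Person", "knows", "Person")

# After renaming 'husband' to 'daughter', the reversal is blocked because a
# 'daughter' edge type already exists in the opposite direction.
family = {("Male", "daughter", "Female"): "1:N",   # assumed cardinalities
          ("Female", "daughter", "Male"): "1:1"}
try:
    reverse_edge_type(family, "Female", "daughter", "Male")
    reversible = True
except ValueError:
    reversible = False
print(pages, friends, reversible)
```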
Rule C: Property displacement
Figure 12: Property displacement
This rule states that a property on an edge type can be moved to either adjacent vertex type, provided its look-across cardinality is 1. The reverse rule states that a property in a vertex type can be moved to an adjacent edge type with look-across cardinality of 1, provided the edge always exists when the property exists.
The adjoining figure clarifies the rule, where the 'dateOfBirth' property is moved to the 'mother' relationship because there is exactly one mother relationship per Man/Woman and it is defined when the dateOfBirth is defined. If you rename 'dateOfBirth' to 'deliveryDate', one could argue that the property belongs in the edge and not the vertex.
Note that a Man's dateOfBirth cannot be displaced to the wife relationship because that would mean that the dateOfBirth cannot be stored unless the person is married. Similarly, the dateOfBirth in the edge type labeled 'mother' from Man to Woman in the bottom schema cannot be moved to Woman because of the cardinality restrictions in the rule.
Using this rule, you can move the properties around the schema to come up with a better looking design. This rule is also useful in satisfying indexing requirements of various graph databases. For example, if a graph database only supports indexes on vertex properties, you could move searchable properties from the edges to vertices. Similarly, if a graph database supports vertex-centric indexes based on properties on adjacent edges/vertices, you can use this rule to bring the indexed property closer to the vertex type of interest.
Rule D: Specialization and generalization
This rule states that:
Any vertex type can be divided into two disjoint vertex types based on a Boolean test on the properties and adjacent edge labels of a vertex belonging to that type.
Any edge type can be divided into two disjoint edge types based on a Boolean test on the properties and adjacent vertex labels of an edge belonging to that type.
Figure 13: Generalization
In other words, if we provide a Boolean function that can give a T/F result given a vertex/edge, we can use that function to divide a vertex/edge type into two different types.
The reverse rule states that:
Any vertex/edge type can be merged into another vertex/edge type provided there is a Boolean test that can distinguish its vertices/edges from the merged vertices/edges.
The adjoining figure shows an example transformation involving the following steps:
Male and Female are generalized as Person, because the Boolean test, sex equals 'M', can distinguish Male from Female.
After that, son and daughter edge types are generalized as child because the Boolean test, sex of in vertex equals 'M', can distinguish son from daughter.
This rule is useful in increasing the specificity, or reducing the complexity, of the graph schema.
As a general principle, it is better to use this rule for specialization, i.e., increasing the specificity, because that allows the different vertex and edge types to embrace different behavior in terms of properties and adjacent edges. However, there are instances where the differences between the vertex types are so minor that specialization only results in application complexity. This argument could apply to the above generalization of Male and Female to Person.
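The role of the Boolean test can be seen in a small round trip. The Python sketch below assumes vertices are dictionaries with a 'label' key; the function names generalize and specialize and the two sample vertices are illustrative. The test sex == 'M' is what makes the generalization reversible.

```python
def generalize(vertices, labels, merged_label):
    """Merge the vertex types named in labels into one type called merged_label."""
    result = []
    for vertex in vertices:
        if vertex["label"] in labels:
            vertex = {**vertex, "label": merged_label}
        result.append(vertex)
    return result

def specialize(vertices, label, test, if_true, if_false):
    """Split one vertex type in two, using a Boolean test on each vertex."""
    result = []
    for vertex in vertices:
        if vertex["label"] == label:
            vertex = {**vertex, "label": if_true if test(vertex) else if_false}
        result.append(vertex)
    return result

people = [
    {"label": "Male", "name": "John", "sex": "M"},
    {"label": "Female", "name": "Jacqueline", "sex": "F"},
]
merged = generalize(people, {"Male", "Female"}, "Person")
restored = specialize(merged, "Person", lambda v: v["sex"] == "M", "Male", "Female")
print(merged)
print(restored == people)
```

If the vertices had no 'sex' property, no test could tell the merged vertices apart and the generalization would lose information.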
Rule E: Edge promotion
Figure 14: Edge promotion
This rule states that an edge type can be promoted to a vertex type by adding two "out" edge types to the endpoints. The properties of the edge type become properties of the vertex type. The cardinality of each new edge type is N:1 or 1:1, depending on the look-across cardinality of the original endpoint vertex's type.
Note that the direction of the new edge types can be changed using the rename and reverse rules, respectively. We only mention the "out" direction to simplify the way in which cardinality for the new edge types is derived.
The adjoining figure shows the husband edge promoted to a vertex type called 'Marriage'. The edge types 'husband' and 'wife' point to the two endpoints of the vertex type.
The edge promotion rule is useful in preparing a binary relationship to become an N-ary relationship.
The reverse rule states that any vertex type with two propertyless edge types, with same-side cardinality of exactly 1, can be demoted to an edge between the adjacent vertices. This process is useful to simplify schemas. You can use the property displacement rule (rule C) to move properties out of edges.
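On a graph instance, edge promotion replaces each edge of the chosen type with a new vertex and two outgoing edges. The Python sketch below assumes a simple in-memory representation; the function name promote_edges, the 'Marriage#1' id scheme and the 'year' property are hypothetical and only serve the example.

```python
def promote_edges(vertices, edges, label, vertex_label, out_role, in_role):
    """Rule E on an instance graph.

    vertices: id -> {'label': ..., 'props': {...}}
    edges: list of (out id, edge label, in id, props)
    """
    new_vertices = dict(vertices)
    new_edges = []
    count = 0
    for out_id, edge_label, in_id, props in edges:
        if edge_label != label:
            new_edges.append((out_id, edge_label, in_id, props))
            continue
        count += 1
        vid = f"{vertex_label}#{count}"  # hypothetical id scheme
        # The edge's properties move to the new vertex.
        new_vertices[vid] = {"label": vertex_label, "props": dict(props)}
        # Two propertyless "out" edges point at the original endpoints.
        new_edges.append((vid, out_role, out_id, {}))
        new_edges.append((vid, in_role, in_id, {}))
    return new_vertices, new_edges

vertices = {
    "jackie": {"label": "Female", "props": {"name": "Jacqueline"}},
    "jfk": {"label": "Male", "props": {"name": "John"}},
}
edges = [("jackie", "husband", "jfk", {"year": 1953})]
promoted_vertices, promoted_edges = promote_edges(
    vertices, edges, "husband", "Marriage", out_role="wife", in_role="husband")
print(promoted_vertices["Marriage#1"])
print(promoted_edges)
```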
Rule F: Property promotion
Figure 15: Property promotion
This rule states that any group of properties can be promoted to a new vertex type with those properties, provided the new vertex type has edges connecting it to all existing vertex types that include the property group. The same-side cardinality of the new edge type is 1.
The adjoining figure shows the 'sex' property converted to a new vertex type. This vertex type will have exactly two nodes corresponding to male and female. So in other words, every person in the new graph will have an outgoing 'isa' edge to one of the two new vertices.
This rule is equivalent to the splitting of a relation into two relations, as shown in the first figure of this section. Any group of properties, typically ones that repeat, can be promoted to a vertex.
While applying this rule, it is better to include all vertex types that have the same group of properties. For example, if there is a 'sex' property in a different Animal type, it is better to point that to the new Sex vertex type as well. If you have edge types with the property group, you can first promote those edge types to vertices.
The reverse of this rule is that a vertex type that has propertyless edge types with same-side cardinality of 1 can be demoted to the group of properties that it holds. These properties must be added to every adjacent vertex type. This is the equivalent of the denormalization of a table in the relational model, which is useful to reduce the number of joins (or traversals in the case of graph databases).
Rule G: Property expansion
Figure 16: Property expansion
This rule states that a property of a vertex type that represents a list of values can be moved to a separate vertex type which stores each value. The new vertex type must have an "in" edge type from the existing vertex type with cardinality 1:N. The adjoining figure shows this rule applied to the nickname property which holds a list of Strings.
The reverse rule states that any vertex type with exactly one propertyless edge type with look-across cardinality of exactly 1 can be removed after moving its properties to a list in the adjacent vertex type.
This is the equivalent of 1NF in the relational model. Unlike relational databases, however, many graph databases support lists as a valid type for property values. So the choice of storing nicknames as a List or a separate vertex type is up to the designer.
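Both directions of this rule can be sketched on an instance graph. The Python code below is an illustration under assumptions: the in-memory representation, the function names expand_property and collapse_property, the 'Nickname' label, the 'nickname' edge label and the id scheme are all invented for the example. It applies the expansion to every vertex that has the property, and it treats an empty list the same as a missing property.

```python
def expand_property(vertices, edges, key, value_label, edge_label):
    """Rule G: move each value of a list property into its own vertex."""
    new_vertices, new_edges = {}, list(edges)
    for vid, vertex in vertices.items():
        props = dict(vertex["props"])
        values = props.pop(key, [])
        new_vertices[vid] = {"label": vertex["label"], "props": props}
        for i, value in enumerate(values):
            value_id = f"{vid}/{key}/{i}"  # hypothetical id scheme
            new_vertices[value_id] = {"label": value_label, "props": {key: value}}
            new_edges.append((vid, edge_label, value_id))
    return new_vertices, new_edges

def collapse_property(vertices, edges, key, value_label, edge_label):
    """Reverse rule: fold the value vertices back into a list property."""
    new_vertices = {
        vid: {"label": v["label"], "props": dict(v["props"])}
        for vid, v in vertices.items() if v["label"] != value_label
    }
    new_edges = []
    for out_id, label, in_id in edges:
        if label == edge_label and vertices[in_id]["label"] == value_label:
            value = vertices[in_id]["props"][key]
            new_vertices[out_id]["props"].setdefault(key, []).append(value)
        else:
            new_edges.append((out_id, label, in_id))
    return new_vertices, new_edges

original = {"jfk": {"label": "Person",
                    "props": {"name": "John", "nickname": ["Jack", "JFK"]}}}
expanded_v, expanded_e = expand_property(original, [], "nickname", "Nickname", "nickname")
collapsed_v, collapsed_e = collapse_property(
    expanded_v, expanded_e, "nickname", "Nickname", "nickname")
print(expanded_e)
print(collapsed_v == original)
```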
Summary
Rule-based schema transformations are tools that a data model designer can use to rewrite a graph schema without losing any information in the process. In other words, a data model designer can use these rules to select the directions of edges, the names of different labels and keys, the locations of various properties, and so on. These changes don't matter from a pure information perspective, but could make a big difference in usability and efficiency.
In that sense, a data model designer can go back to Codd's original goals for normalization: designing schemas that are easy to modify, easy to extend, informative to users and supportive of various query patterns.
Chapter 5
The previous section listed seven rule-based schema transformations such as renaming labels, reversing edges, promoting edges and properties to vertices, and so on. Such rule-based transformations can be mechanically applied to any graph schema, without losing any information in the process. Using these rules, a graph database designer can start with a design generated from an entity relationship model and tweak it to get a final design.
Figure 17: Example graph schema
The above figure shows an example graph schema describing constraints on the graph data model such as:
What are the legal labels for vertices?
What are the legal edge labels between two vertex types?
What are the legal property keys and value types at each edge or vertex type?
The reality, however, is that a graph model could have other constraints that aren't expressed in the schema. For example, the 'inviter' edge in every Invitation must point to the User who has an 'owns' edge to the Page reached through the Invitation's 'page' edge. This constraint isn't captured in the above schema.
The question is: How can we model complex constraints in a graph model?
Graph universes, transformations and equivalence
A graph universe U is a set of graphs, typically an infinite set. A graph universe represents a data model in the sense that it captures every valid graph that belongs to the data model.
Redefining equivalence using transformation functions
Figure 18: An invertible function
A graph transformation T is a function that takes graphs from one universe U to another universe V. In short, T: U → V.
A universe U is equivalent to a universe V if there is a transformation function T: U → V, where T is invertible. Invertible and bijective are terms that characterize functions that establish a one-to-one correspondence between two sets, which in this case are graph universes.
In other words, given any graph G ∈ U, we can use T(G) to get a graph G' ∈ V. Then we can use the inverse function T⁻¹(G') to get back G, hence establishing the equivalence of the two universes.
A programming perspective
If we are upgrading from one graph model to another, the transformation function is the upgrade script that we would implement to move to the new model. If we can also write a downgrade script, then we have two equivalent models (or universes). In other words, two graph models, represented as universes or schemas, are equivalent if they are forward and backward compatible.
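The upgrade/downgrade view can be made concrete with a pair of small functions. In this Python sketch a graph is just a set of (out vertex, edge label, in vertex) triples; the names upgrade and downgrade and the sample graph are illustrative. The pair is only invertible under the assumption that the old universe contains no 'husband' edges and the new one contains no 'wife' edges.

```python
def upgrade(edges):
    """T: rename 'wife' to 'husband' and reverse it; leave other edges alone."""
    return {
        (in_v, "husband", out_v) if label == "wife" else (out_v, label, in_v)
        for out_v, label, in_v in edges
    }

def downgrade(edges):
    """Inverse of T: rename 'husband' back to 'wife' and reverse it again."""
    return {
        (in_v, "wife", out_v) if label == "husband" else (out_v, label, in_v)
        for out_v, label, in_v in edges
    }

old_graph = {
    ("John", "wife", "Jacqueline"),
    ("John Jr.", "parent", "John"),
}
new_graph = upgrade(old_graph)
print(sorted(new_graph))
print(downgrade(new_graph) == old_graph)
```

The round trip returning the original graph for every graph in the universe is what makes the two models equivalent.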
Derived types
Consider a graph universe U that is compatible with a schema S. A vertex type in S can be called a derived vertex type in U if every graph G ∈ U is such that its vertices (and adjacent edges) belonging to the vertex type can be calculated from the rest of the graph.
In other words, given any graph in the universe U, after we remove all vertices corresponding to the derived vertex type, there should be a way to calculate those vertices again. Derived edge and property types can be defined similarly. Note that all derived element types are defined in graph schemas, but are specific to graph universes that are compatible with that schema.
Meta-rule: Adding and removing derived types
Finally, here is the meta-rule behind all schema transformations:
Given any graph universe U compatible with a schema S, we can add a derived vertex/edge/property type to produce an equivalent graph universe V compatible with the schema S ∪ {derived type}.
The reverse rule states that:
Given any graph universe U compatible with a schema S, we can remove a derived vertex/edge/property type to produce an equivalent graph universe V compatible with the schema S \ {derived type}.
Figure 19: Modified graph schema
The 'inviter' edge type in the graph schema shown in the first figure is a derived edge type. This is because the 'inviter' edges can be calculated by going from the Invitation vertices to the Page and back to the User through the 'owns' edge (reverse direction). We can simplify the original schema to the version shown in the adjoining figure by applying these rules:
(Meta-rule) Remove the derived edge type 'inviter'.
(Edge promotion, reversed) Demote the binary relationship Invitation to an edge called 'invited'.
As you can see, the updated schema is simpler than the original schema derived from an ER diagram.
Proving the meta-rule
The meta-rule is easy to prove because of the way derived types are defined. The transformation function to remove a derived type simply removes all elements that belong to that type. The inverse function calculates the derived types from the remaining graph. Hence the universe with the derived type is equivalent to the universe without it.
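The proof can be acted out on a tiny graph. The Python sketch below uses the constraint stated at the start of this chapter, that the 'inviter' of an Invitation is the User who owns the invitation's page, and assumes every page has exactly one owner. The function names and sample ids are illustrative.

```python
def remove_inviter(edges):
    """Transformation: drop every edge of the derived type 'inviter'."""
    return {edge for edge in edges if edge[1] != "inviter"}

def derive_inviter(edges):
    """Inverse: recompute 'inviter' edges by walking page -> owner."""
    owner_of = {page: user for user, label, page in edges if label == "owns"}
    derived = {
        (invitation, "inviter", owner_of[page])
        for invitation, label, page in edges if label == "page"
    }
    return set(edges) | derived

graph = {
    ("alice", "owns", "p1"),
    ("inv1", "page", "p1"),
    ("inv1", "invitee", "bob"),
    ("inv1", "inviter", "alice"),
}
smaller = remove_inviter(graph)
print(sorted(smaller))
print(derive_inviter(smaller) == graph)
```

A graph in which the inviter is not the page owner lies outside this universe, which is why the round trip is only claimed for graphs that satisfy the constraint.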
Proving the 7 rules: Renaming, Reversing, Property Displacement, ...
Consider the example of renaming an edge type. This rule was stated in the last section as:
Any edge label can be renamed, so long as the new name doesn't refer to an existing edge label between the out and in vertex types.
We can prove this in two steps:
Add a derived edge type with the new name as a copy of the old edge type.
Remove the old edge type, which is now derivable from the new edge type.
Of course, step 1 requires that the edge type with the new name doesn't already exist in the schema. Otherwise, all edges of the edge type can't be derived. Hence the condition "so long as the new name doesn't refer to an existing edge label."
In this manner, we can prove each rule by performing some steps to first add new derived types and then remove the existing types which become derived types themselves.
Beyond transformation rules
Thinking in terms of graph universes, derived types and transformation functions lets us do more radical transformations to our graph model. The manner in which we apply these rules or transformations depends on our overall strategy for data modeling.
One strategy is to minimize the number of implicit constraints not captured by the schema. For instance, the schema shown in the second figure doesn't have the implicit constraint on the 'inviter' edge type shown in the first figure. Generally, fewer implicit constraints mean less duplication of data and less chance of bugs while updating the database. This is similar to normalization in relational databases.
A different strategy is to tune the graph for its specific querying needs. Such approaches have been popularized by "denormalization" techniques such as dimensional modeling. For instance, we could add a "shortcut" derived edge type called 'latest' from User to Page to show the last created page for each user. The important thing then is to ensure that any change to the rest of the graph is accurately reflected in the derived element types. The code that operates on the graph must be designed with these constraints in mind.
Summary
This section introduced set-theoretic representations of graph models called graph universes, which are more powerful than graph schemas. Secondly, this section showed that two graph universes are equivalent if there is an invertible graph transformation function between them. Finally, this section showed that all schema transformation rules presented in the earlier section can be derived from one meta-rule that deals with adding and removing derived types.
Validating graph schemas
The last few sections have discussed how property graph schemas can help design graph databases from ER models and refine the data model through schema manipulations. After reading this thread on the Gremlin users group, we realized that it is easy to validate graphs against schemas with Gremlin and Groovy.
Figure 20: TinkerGraph schema
This gist on Github shows how you can take an instance graph and check to see if it is compatible with a schema graph. The schema graph has vertices and edges corresponding to vertex and edge types. Here's the code to create a schema graph inside a Gremlin shell for the classic Tinkerpop schema shown here:
// Schema graph: one vertex per vertex type, one edge per edge type
sg = new TinkerGraph()
// Vertex type 'person': each property value names the expected Java class
person = sg.addVertex()
person.setProperty('_label', 'person')
person.setProperty('name', 'java.lang.String')
person.setProperty('age', 'java.lang.Integer')
// Vertex type 'software'
software = sg.addVertex()
software.setProperty('_label', 'software')
software.setProperty('name', 'java.lang.String')
software.setProperty('lang', 'java.lang.String')
// Edge types 'knows' (person to person) and 'created' (person to software)
knows = person.addEdge('knows', person)
knows.setProperty('weight', 'java.lang.Float')
created = person.addEdge('created', software)
created.setProperty('weight', 'java.lang.Float')
created.setProperty('_minIn', 1) // Someone must create the software
The properties have values corresponding to the Java Class of the property values in the instance graph. The property keys can end with '?' to indicate the property is optional. The edges in the schema graph can have 4 special properties, viz. _minIn, _maxIn, _minOut and _maxOut, to indicate cardinality restrictions for various edge types.
Any instance graph, g, can be validated against the schema stored in sg, using the Gremlin script:
g.V.filter({ checkVertex(it, sg) })
You can look at the full Github gist to see how the validation is done.
The current version of Tinkerpop doesn't support vertex labels. So the mapping from the vertex to the vertex type is specific to the graph, like this:
vertexType = { v, sg -> v.age ? sg.V('_label', 'person').next() : sg.V('_label', 'software').next() }
Most graph schemas typically have a property named 'type' that would make this mapping easier.
However, with Tinkerpop 3, this method can be standardized to use the label:
vertexType = { v, sg -> sg.V('label', v.label).next() }
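To show the idea behind such a check without depending on a particular Tinkerpop version, here is a minimal Python sketch. It is not the code from the gist: the schema is a plain dictionary, the function name check_vertex is hypothetical, and only property keys and value types are checked (the cardinality properties such as _minIn are left out). A key ending in '?' is treated as optional, as described above.

```python
SCHEMA = {
    "person": {"name": str, "age": int},
    "software": {"name": str, "lang": str},
}

def check_vertex(label, props, schema):
    """Return a list of problems; an empty list means the vertex fits its type."""
    expected = schema.get(label)
    if expected is None:
        return [f"unknown vertex type {label!r}"]
    problems = []
    # Keys ending in '?' are optional; all others are required.
    required = {key for key in expected if not key.endswith("?")}
    allowed = {key.rstrip("?"): value_type for key, value_type in expected.items()}
    for key in sorted(required - set(props)):
        problems.append(f"missing property {key!r}")
    for key, value in props.items():
        if key not in allowed:
            problems.append(f"unexpected property {key!r}")
        elif not isinstance(value, allowed[key]):
            problems.append(f"{key!r} should be {allowed[key].__name__}")
    return problems

ok = check_vertex("person", {"name": "marko", "age": 29}, SCHEMA)
missing = check_vertex("software", {"name": "lop"}, SCHEMA)
wrong_type = check_vertex("person", {"name": "marko", "age": "29"}, SCHEMA)
optional = check_vertex("person", {"name": "x"}, {"person": {"name": str, "nick?": str}})
print(ok, missing, wrong_type, optional)
```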
Pixy: First-order logic on graph databases
The previous sections have shown that any ER model can be converted to a property graph schema, and that the schema can be normalized using rules. However, one key question remains:
Do graph databases offer the same querying capabilities as relational databases?
In other words, any data that fits in an ER model can be stuffed into a graph database. But can data stuffed in this fashion be queried effectively?
This is the subject of this section.
Background
On SQL
SQL is the query standard for relational databases. It first appeared in the 1970s and was standardized in the 80s and 90s. The theoretical foundation of SQL is relational algebra. Codd showed that relational algebra is equivalent to relational calculus, a form of first-order logic. His theorem is the bedrock of SQL's expressive power.
On first-order logic
Using relational algebra, we can write any query of the form "Find all rows from tables A, B, C, ..., matching some predicate", as long as the predicate can be expressed in first-order logic. Specifically, the predicate is formed using:
various comparisons on rows and columns,
logical operations "and" (∧), "or" (∨) and "not" (¬), and
the universal "for every" (∀) and existential "there exists" (∃) quantifiers that operate on rows of a given table.
Let's consider tables named person, car and ticket. We could express a query like "find me people who own only BMW cars, but have at least one speeding ticket". The predicate can be written as:
my_query(person) =
(∀ car, person owns car → car.make = 'BMW') ∧ (∃ ticket, person has ticket)
On Gremlin
Gremlin is a standard graph traversal language. It is part of the Tinkerpop stack and works across all Blueprints-compatible databases. You can read more about Gremlin here:
Gremlin Wiki on Github
Gremlin Docs
The Pathological Gremlin (presentation)
Gremlin is great for step-based queries. For example, something like "find the friend of a friend of vertex v" can be written as v.out('friend').out('friend'). This style of traversal with vertices and edges isn't natural in SQL with tuples.
The declarative querying style of SQL is, however, different from Gremlin. The SQL2Gremlin tutorial goes through some examples. But you can see that the translation isn't obvious.
Pixy: First-order logic with Gremlin
Pixy is a bridge from first-order logic to Gremlin. The first-order logic of Pixy operates on vertices and edges. We can ask questions like "Find vertices and edges that match some predicate", where the predicate is formed by:
various comparisons on vertex and edge properties,
logical operations "and" (∧), "or" (∨) and "not" (¬), and
the universal "for every" (∀) and existential "there exists" (∃) quantifiers that operate on vertices and edges.
Pixy queries are expressed using Prolog rules, not SQL. Rules in Prolog are expressed as Horn clauses. Prolog, like SQL, has the full expressive power of first-order logic.
Let's take the predicate from the earlier discussion,
my_query(person) =
(∀ car, person owns car → car.make = 'BMW') ∧ (∃ ticket, person has ticket)
Let's say that we represent people as vertices with outgoing edge types named 'car' and 'ticket' to vertices representing cars and tickets. Now, we could express the above predicate using Horn clauses as follows:
my_query(Person, Ticket) :- out(Person, 'ticket', Ticket),
    not(not_all_bmw(Person)).
not_all_bmw(Person) :- out(Person, 'car', Car),
    property(Car, 'make', Make),
    Make <> 'BMW'.
Note that out and property are predefined predicates in Pixy. You can see that the ∃ part of the query is easy. This is a matter of finding a ticket. The ∀ part of the query is implemented using two nots. In other words, saying "every car is a BMW" is the same as saying "there is no car that isn't a BMW".
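The double-negation trick is not specific to Prolog. The short Python sketch below uses invented sample data to show that "not (some car is not a BMW)" selects the same people as "all cars are BMWs"; the dictionary layout and names are assumptions for the example.

```python
# Hypothetical sample data: each person has a list of car makes and a ticket count.
people = {
    "ann": {"cars": ["BMW", "BMW"], "tickets": 1},
    "bob": {"cars": ["BMW", "Audi"], "tickets": 2},
    "cat": {"cars": ["BMW"], "tickets": 0},
}

def not_all_bmw(person):
    """There exists a car of this person that is not a BMW."""
    return any(make != "BMW" for make in person["cars"])

def my_query(person):
    """At least one ticket, and no car that isn't a BMW (the two 'nots')."""
    return person["tickets"] >= 1 and not not_all_bmw(person)

matches = sorted(name for name, p in people.items() if my_query(p))
direct = sorted(
    name for name, p in people.items()
    if p["tickets"] >= 1 and all(make == "BMW" for make in p["cars"])
)
print(matches, direct)
```

As with the logical formula, a person with a ticket and no cars at all would also match, since "every car is a BMW" is vacuously true for them.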
ER models in Pixy
If you use an ER model as a starting point for your design, you can reconstitute the ER model from the final graph schema using Pixy. Consider the previously referenced ER model with entities named User, Page and Tag and relationships named Owns, Invites and TaggedAs.
Figure 21: ER model for User-Page-Tag application
This was translated to a graph schema with four types of vertices, viz. User, Page, Tag and Invitation.
Figure 2: Graph schema for User-Page-Tag application
Now, we can reconstitute the ER model from the graph schema using Pixy with the following clauses:
% Entities
user(User, Name, Login) :- property(User, 'name', Name), property(User, 'login', Login).
page(Page, Uri, Html, CreateTs) :- property(Page, 'uri', Uri), ...
tag(Tag, Hashtag, Description) :- property(Tag, 'hashtag', Hashtag), ...
% Relationships
owns(User, Page) :- out(User, 'owns', Page).
taggedAs(Page, Tag) :- out(Page, 'taggedas', Tag).
invites(Invitation, Inviter, Invitee, Page) :-
    out(Invitation, 'invitee', Invitee),
    out(Invitation, 'inviter', Inviter),
    out(Invitation, 'page', Page).
Every predicate corresponds to an entity or a relationship. The predicate operates on vertices, edges and properties that belong to the graph schema. Now, you get the full power of first-order logic on the ER model. In other words, any first-order predicate that applies to entities and relationships can be written as a Pixy query that uses the above clauses.
Let's take an example predicate that matches all users invited to pages tagged 'tinkerpop' created in 2014. You could express this as follows:
tinkerpop_invitee(User, Page) :- invites(_, _, User, Page),
    page(Page, _, _, CreateTs),
    CreateTs > 1388534400L, % Unix timestamp for 1/1/2014
    taggedAs(Page, Tag),
    tag(Tag, 'tinkerpop', _).
Note that '_' is used to represent anonymous variables.
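For readers who prefer to see the same query evaluated step by step, here is a rough Python sketch over hand-made sample data. All names and values below (pages, tags, tagged_as, invitations) are hypothetical; the sketch only illustrates how the clauses combine and also confirms the timestamp constant used in the rule.

```python
from datetime import datetime, timezone

# Hypothetical sample data standing in for the graph (all values are made up).
pages = {"p1": {"create_ts": 1400000000}, "p2": {"create_ts": 1300000000}}
tags = {"t1": "tinkerpop", "t2": "neo4j"}
tagged_as = [("p1", "t1"), ("p2", "t1")]  # (page, tag)
invitations = [("ann", "bob", "p1"), ("ann", "cy", "p2")]  # (inviter, invitee, page)

# Unix timestamp for 1 January 2014, 00:00 UTC.
START_2014 = int(datetime(2014, 1, 1, tzinfo=timezone.utc).timestamp())


def tinkerpop_invitees():
    """Invitees of pages created after START_2014 and tagged 'tinkerpop'."""
    return [
        (invitee, page)
        for _inviter, invitee, page in invitations
        if pages[page]["create_ts"] > START_2014
        for tagged_page, tag in tagged_as
        if tagged_page == page and tags[tag] == "tinkerpop"
    ]


print(START_2014)            # 1388534400
print(tinkerpop_invitees())  # [('bob', 'p1')]
```

The underscore-prefixed variable plays the same role as the anonymous variable '_' in the Prolog rule: a value that must exist but is not used.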
Query requirements don't usually matter while modeling
It isn't surprising that queries in first-order logic can be compiled to Gremlin, since Gremlin is Turing complete. The surprising thing is that Pixy converts any first-order logic query on an ER model to something that executes "efficiently" on the corresponding graph database.
By "efficiently", we mean that the Pixy/Gremlin query will always traverse edges to go from one entity/relationship to another. Edge traversal operations in graph databases are typically orders of magnitude faster than index-based joins in relational databases.
Queries on properties will, of course, need indexes for efficient querying. But as long as your starting ER model is accurate, your application will not have to simulate joins using these property indexes. In that sense, the graph schema design is independent of the query requirements.
Part II
Chapter 8: Introduction to Database
Database systems evolution: Databases and database technology are vital to modern organizations, supporting both daily operations and decision making. Database technology has undergone a remarkable evolution over 50 years. Despite Oracle's dominance of the enterprise DBMS marketplace, the industry remains highly competitive with a continued high level of innovation [12].
Figure 1: Evolution of database technology
Major periods of database technology evolution [12]:
1st Generation (1960s): File oriented. Supported sequential and random searching of files, but the user was required to write computer programs to access data. The database software industry had few or no standards during this period.
2nd Generation (1970s): Navigational. Could manage multiple entity types and relationships. Computer programs still had to be written. Progress on standards.
3rd Generation (1980s): Relational with non-procedural access. Foundation based on mathematical relations and associated operators. Optimization technology was developed. IBM performed pioneering research to enable commercialization of relational database technology.
4th Generation (1990s+): Object oriented. These systems are extending the boundaries of database technology: new kinds of distributed processing and data warehouse processing, the ability to store and manipulate unconventional data types, and convenient ways to publish static and dynamic Web data.
DBMS marketplace: Despite Oracle's dominance of the enterprise DBMS marketplace, with more than 40% overall market share, the industry remains highly competitive with a continued high level of innovation. In some environments, Oracle's competitors are Microsoft SQL Server, IBM DB2, Teradata and SAP Sybase. Open-source DBMS products have begun to challenge the commercial DBMS products at the low end of the enterprise DBMS marketplace. The open-source DBMS category is led by MySQL, followed by MongoDB, PostgreSQL and MariaDB. In the desktop DBMS market, Microsoft Access dominates because of the dominance of Microsoft Office.
Figure 2: DBMS marketplace
Innovation in the industry: The advances in DBMS in recent years support business intelligence processing for data integration and usage of summary data. NoSQL technology has been developed to support the needs of Big Data and to serve as modern web-scale databases. Since 2009, the most accepted definition of NoSQL is next-generation databases that are non-relational, distributed, open-source and horizontally scalable. Other characteristics that usually apply are: schema-free, scalability, global availability, easy replication support, simple API, eventually consistent/BASE (not ACID), and large-scale data. [5][19]
Types of DBMS
Ranking: In this section we look at rankings created by DB-Engines. DB-Engines is an initiative that provides information on the popularity of the DBMS available in the market. It publishes different rankings for every DBMS type, which are updated monthly.
Figure 3: DBMS developed by database model pie chart
The pie chart above shows the DBMS categories for which the most systems have been developed. The database model with the most implementations is the relational DBMS, with 137 systems in this category. It is followed by key-value stores, with 63 systems, document stores, with 43 systems, and graph DBMS, with 27 systems.
In the overall classification of database models, the following DBMS types are distinguished. Types of DBMS:
Relational DBMS                  Graph DBMS
Key-value stores                 Time Series DBMS
Document stores                  RDF stores
Object-oriented DBMS (Atkinson)  Native XML DBMS
Search engines                   Content stores
Multivalue DBMS                  Event Stores
Wide column stores               Navigational DBMS
The 14 most developed database models are listed above. If, instead of counting the systems developed, the database models are ranked by popularity, the list of models to be considered shrinks. Most users work on relational DBMS (79.5%), followed by document stores (7.3%), search engines (4.3%), key-value stores (3.5%), wide column stores (3.1%) and graph DBMS (1.1%). The pie chart below represents the most recent popularity ranking.
Figure 4: DBMS popularity by database model pie chart
In the pie chart above, it is clear that relational DBMS are the ones used by default. However, the state of the art is changing through innovations in database technology. Even though the popularity percentages of NoSQL databases are minimal compared to relational DBMS, the fact that they are recent and growing technologies is reason enough to evaluate them more deeply.
NoSQL DBMS
Document Stores: The records stored are called documents, which consist of groupings of key-value pairs. Values can be nested to arbitrary depths. [18] Examples: Elastic, MongoDB, Azure DocumentDB
Wide Column Stores: While RDBMS store all the data of a particular table's rows together on disk, which makes retrieving a particular row fast, column-family databases can retrieve a large amount of a specific attribute fast by serializing all the values of a particular column together on disk. This approach is useful for aggregate queries. [18] Examples: Hadoop/HBase, Cassandra, Amazon SimpleDB
Graph Databases: Ideal for dealing with interconnected data. Their structure consists of connections, or edges, between nodes. Both nodes and their edges can store additional properties such as key-value pairs. The strength of a graph database is in traversing the connections between the nodes. Their downside is that they generally require all data to fit on one machine, limiting their scalability. [18] Examples: Neo4j, InfiniteGraph, TITAN
Atomicity: All operations in a transaction succeed or every operation is rolled back.
Consistent: On the completion of a transaction, the database is structurally sound.
Isolated: Transactions do not contend with one another. Contentious access to data is moderated by the database so that transactions appear to run sequentially.
Durable: The results of applying a transaction are permanent, even in the presence of failures.
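Atomicity is the easiest of these properties to demonstrate in a few lines. The sketch below uses Python's built-in sqlite3 module purely as a convenient ACID-compliant store; the account table, its CHECK constraint and the transfer function are assumptions chosen for the example and have nothing to do with any particular product discussed in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('a', 100)")
conn.execute("INSERT INTO account VALUES ('b', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


failed = transfer(conn, "a", "b", 150)  # would overdraw 'a', so it is rolled back
after_failed = balances(conn)
succeeded = transfer(conn, "a", "b", 40)
after_success = balances(conn)
print(failed, after_failed)      # False {'a': 100, 'b': 0}
print(succeeded, after_success)  # True {'a': 60, 'b': 40}
```

In the failed transfer the credit to 'b' had already been applied when the debit violated the constraint; the rollback undoes it, so no partial result is ever visible.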
However, NoSQL databases break with the tradition of SQL models and their ACID properties. BASE properties seem to suit most NoSQL databases better, and they are as follows:
Basic Availability: The database appears to work most of the time.
Soft-state: Stores don't have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
Eventual consistency: Stores exhibit consistency at some later point (e.g., lazily at read time).
ACID transactions can be considered stricter than needed for many NoSQL cases, as they apply many constraints for safety's sake. On the other hand, BASE transactions favor scale and resilience. The BASE model is used by aggregate stores, such as column-family, key-value and document stores. In contrast, graph databases use the ACID model. BASE databases promise availability of the data at the expense of data consistency (the consistency of the data is only assured at concrete snapshots). [16] Graph databases differentiate themselves from other NoSQL databases by focusing more on data consistency.
The comparison made above is shown in the table below:
ACID
Relational DBMS are clearly the benchmark among database systems. The mass adoption of this DBMS type is an important factor in choosing it as the main system in many companies. However, current trends show that the four main types of NoSQL databases should also be taken into account before installing a DBMS. To give a more objective view of the benefits of each model, the use cases for which they perform better and the ones for which they perform worst are listed below.
Use cases for relational databases
Positive use cases: transaction-oriented databases (banking applications, online reservations), where the concurrency of many transactions must be supported and the integrity of the data must be protected.
Negative use cases: data warehouses, which are analytically-oriented databases with a large amount of data and infrequent updates. The constraints of the relational database wouldn't support the scalability.
Use cases for key-value stores
Positive use cases:
– Storing user session data
– Maintaining schema-less user profiles
– Storing user preferences
– Storing shopping cart data
Negative use cases:
– Querying the database by a specific data value
– Data with relationships between data values
– Operating on multiple unique keys
– When the business needs to update a part of the value frequently
Use cases for document stores
Positive use cases:
– E-commerce platforms
– Content management systems
– Analytics platforms
– Blogging platforms
Negative use cases:
– Running complex search queries
– Applications that require complex multi-operation transactions
Use cases for wide-column stores
Positive use cases:
– Content management systems
– Blogging platforms
– Systems that maintain counters
– Services that have expiring usage
– Systems that require heavy write requests (like log aggregators)
Negative use cases:
– Complex querying
– Query patterns that change frequently
– Projects without established database requirements
Use cases for graph databases [19]
Positive use cases:
– Fraud detection
– Graph-based search
– Network and IT operations
– Social networks
Negative use cases:
– Data warehouses so big that they require the BASE model
Figure 6: Positions of NoSQL databases (source: Neo4j)
In the figure above, the five types of DBMS being compared are displayed according to the size and complexity of their databases. It can be concluded that each of those DBMS works for some specific use cases, depending on the amount and complexity of the data that is going to be stored. Their use cases do not overlap, which is why all five of them must be considered before implementing a DBMS in a company.
Chapter 9: Graph Databases
Graph databases are databases whose specific purpose is the storage of graph-oriented data structures; an introduction to graph theory is therefore useful in order to be consistent when using its terminology.
Concepts of Graph Databases
Positioning: It has previously been explained that NoSQL databases address several issues that relational databases do not: availability for the processing of large datasets, partitioning, flexibility of the schema, and modelling and processing complex structures like trees and graphs. Graph databases are specialized in processing highly connected data, managing complex and flexible data models, and improving the performance of complex queries by traversing the graph.
Model: Another quality of graph databases is the simplicity of their model. The figures below show the difference between modeling the same use case in a relational database and in a graph database. The model of the graph database is more similar to the business model, which makes it more accessible to non-technical profiles. [8]
(a) Relational Database Model    (b) Graph Database Model
Figure 7: Model Comparison
A graph is a pictorial representation of a set of objects in which some pairs of objects are connected by links. A graph contains two kinds of elements: nodes (vertices) and relationships (edges).
What is a Graph Database
A graph database is a database which is used to model data in the form of a graph. It stores any kind of data using:
Nodes
Relationships
Properties
Nodes: Nodes are the records/data in graph databases. Data is stored as properties, and properties are simple name/value pairs.
Nodes can be grouped together by applying a label to each member. A node can have zero or more labels. Labels do not have any properties. Storing data in Neo4j is similar to adding more records in other databases.
Relationships: A relationship is used to connect nodes. It specifies how the nodes are related. Relationships always have a direction. Relationships always have a type. Relationships form patterns of data.
Properties: Properties are named data values.
Popular Graph Databases
Neo4j is the most popular graph database. Other graph databases are:
Oracle NoSQL Database
OrientDB
HyperGraphDB
GraphBase
InfiniteGraph
AllegroGraph etc.
Why Graph DB
Graph databases are very useful nowadays because in graph databases data exists in the form of relationships between different objects. The relationship between the data is often more valuable than the data itself.
Relational databases store highly structured data, with many records storing the same type of data, so they can be used to store structured data. They do not store the relationships between the data, while graph databases store relationships and connections as first-class entities.
The data model for graph databases is simple compared to other databases, and they can be used with OLTP systems. They provide features like transactional integrity and operational availability.
Graph DB vs NoSQL Database
The following points explain why a graph database can be a better fit than other NoSQL databases:
Most NoSQL databases store sets of disconnected aggregates. This makes it difficult to use them for connected data and graphs.
One well-known strategy for adding relationships to such stores is to embed an aggregate's identifier inside a field belonging to another aggregate, effectively introducing foreign keys.
But this requires joining aggregates at the application level, which quickly becomes prohibitively expensive.
See the use cases of the different types of databases:
Relational database: Data is represented in tabular form, so it is best for tasks such as calculating income.
Key-Value Store: It is best for building a shopping cart.
Document store: Data is stored as a document, so it is best for storing structured product information.
Graph DB: It follows a graph structure. It is best for describing how a user got from point A to point B.
Neo4j Data Model
Neo4j follows the Property Graph Model for storing and managing its data. Neo4j is a graph database which has the following features of the Property Graph Model.
The graph model contains nodes, relationships and properties, which specify the data and its operation.
Properties are key-value pairs.
Nodes are represented using circles and relationships are represented using arrows. A relationship specifies the relation between two nodes.
There are two types of relationships between nodes according to their directions: unidirectional and bidirectional.
Each relationship connects two nodes: the "Start Node" or "From Node" and the "To Node" or "End Node".
Both nodes and relationships contain properties.
Relationships should be directional in the Property Graph Data Model. If you create a relationship without a direction, it will throw an error message.
There are three main building blocks of a graph DB data model:
Nodes
Relationships
Properties
Following is a simple example of a Property Graph.
Figure 8: Simple Graph
Here, we have represented nodes using circles. Relationships are represented using arrows. Relationships are directional. We can represent a node's data in terms of properties (key-value pairs). In this example, we have represented each node's Id property within the node's circle.
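The building blocks listed above can be sketched as plain data structures. The following Python sketch is only a mental model of the property graph, not how Neo4j stores data internally; the class names, field names and sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    labels: set = field(default_factory=set)         # zero or more labels
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    type: str     # every relationship has exactly one type
    start: Node   # "Start Node" / "From Node"
    end: Node     # "End Node" / "To Node"
    properties: dict = field(default_factory=dict)


ajit = Node(1, {"Person"}, {"name": "Ajit"})
anna = Node(2, {"Person"}, {"name": "Anna"})
friendship = Relationship("FRIENDS_WITH", ajit, anna, {"since": 2017})


def outgoing(node, relationships):
    """Relationships that start at `node` (direction matters)."""
    return [r for r in relationships if r.start is node]


print(len(outgoing(ajit, [friendship])))  # 1
print(len(outgoing(anna, [friendship])))  # 0
```

Because a relationship always records a start node and an end node, direction is part of the data itself; a query can still choose to ignore it when traversing.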
Query performance
Graph databases' competitive advantage: It has been said that graph databases have a reason to exist because they outperform relational databases in complex queries. They are particularly good when the relationships between items are significant. The use case that is better suited for graph databases is "find all entities of a kind" (myEntity.findAll). The execution of such a query starts with an index lookup to find the starting node(s) for the traversal. Then the relationships in the graph are traversed simultaneously. Because of the concurrency of the traversal, the bigger the volume of data, the more it outperforms relational databases.
Figure 9: Query execution in graph databases
Relational databases are less suited to querying through relationships. It would mean querying through different tables, following foreign keys and other indexes, and it would considerably increase the execution time. Graph database traversals are performed by following physical pointers, while foreign keys are logical pointers. [8] The query in the figure includes the time of each index scan. The more tables are included in the query, the larger the execution time will become.
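The difference between logical and physical pointers can be illustrated with a toy Python sketch. It is not a benchmark and says nothing about real engines; the tables, the index and the Node class are assumptions for the example. The relational version resolves each hop through an index over a friendship table, while the graph version simply follows object references.

```python
# Relational style: rows plus a join table, resolved through an index per hop.
person_rows = {1: "Ann", 2: "Bob", 3: "Cy"}
friend_rows = [(1, 2), (2, 3)]  # (person_id, friend_id) foreign-key pairs


def build_index(rows):
    """Index the join table by its first column (a logical pointer lookup)."""
    index = {}
    for src, dst in rows:
        index.setdefault(src, []).append(dst)
    return index


def fof_relational(person_id, index):
    return [
        person_rows[fof]
        for friend in index.get(person_id, [])
        for fof in index.get(friend, [])
    ]


# Graph style: each node holds direct references to its neighbours.
class Node:
    def __init__(self, name):
        self.name = name
        self.friends = []


ann, bob, cy = Node("Ann"), Node("Bob"), Node("Cy")
ann.friends.append(bob)
bob.friends.append(cy)


def fof_graph(node):
    return [fof.name for friend in node.friends for fof in friend.friends]


print(fof_relational(1, build_index(friend_rows)))  # ['Cy']
print(fof_graph(ann))                               # ['Cy']
```

Both functions return the same answer; the point is only that the second one never consults a shared index once it has its starting node.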
Figure 10: Query execution in relational databases
Relational databases' competitive advantage: On the other hand, because of the internal structure of the tables, relational databases would outperform graph databases when the output requires all the attributes of a table (findAll-like queries). Their ideal use case is to aggregate over a complete dataset. [8]
Graph databases ranking: The figure below shows the DB-Engines Ranking of graph DBMS. Neo4j leads the ranking, and its score is triple that of the next DBMS, Microsoft Azure Cosmos DB. Neo4j has been leading the graph database sector for some years, as we can see in the trend scatter plot. It must be taken into account that the score is displayed on a logarithmic scale, therefore the difference in popularity is really significant.
It can also be seen in the trend scatter plot that Microsoft Azure Cosmos DB appeared in the graph database landscape in 2014, and since then its rise in popularity has been quite steep. One explanation is that Microsoft Azure is well integrated in the software marketplace.
Success factor: It has been stated, when comparing the NoSQL DBMS, that graph databases have a limitation in size. Therefore, it is a competitive advantage to work on facilitating the partitioning of a graph. While OrientDB and InfiniteGraph state that they have accomplished this, Neo4j seems to be the DBMS that is most successfully improving graph partitioning. [8]
Figure 11: Graph DBMS Ranking
Figure 12: Trend Graph DBMS popularity scatter plot
Chapter 10: Neo4j
Necessity of Neo4j
Why Neo4j? With a graph database like Neo4j, which focuses on data relationships, patterns and trends can be seen easily, unlike with relational databases. Given today's growing business demands and competitive atmosphere, using the right tool is very important, and when it comes to widely connected data Neo4j is a strong choice because it can be thousands of times faster than traditional databases. Neo4j analyzes and traverses all the data in real time and gives results very fast. Neo4j is widely used by many big companies like eBay, Walmart, Cisco, UBS and many more.
What is Neo4j? Neo4j is an open-source NoSQL graph database written in Java and Scala. According to db-engines.com, Neo4j is currently the world's leading graph database. There are many reasons for this. First of all, Neo4j provides ACID transaction compliance, cluster support, runtime failover, high availability and high-speed querying through traversals. It scales to billions of nodes and relationships. It has a great user interface and it is easy to learn because there are lots of free online resources on the web. It also has a great community that can help with any problems. In general terms, Neo4j is designed for linking relationships, and it handles these relationships with speed, ease and extreme flexibility. With Neo4j, models can easily be converted to a database schema. If the data is densely connected, or various conceptual models need to be tried for the data, then Neo4j is the solution.
Neo4j Versions
Graph databases use a relationship-first approach to storing and querying your data. They store data in a much more logical fashion, a way that represents the real world and prioritizes the representation, discoverability and maintainability of data relationships. But data integrity is important for many developers who care about data relationships, so the ACID properties were brought back to at least one NoSQL database, Neo4j. This allows us to use Neo4j as a transactional data store, storing your most critical business data.
Graph databases give developers a more intuitive data model, faster queries and better agility to adapt to changes in the business.
Figure 13: Neo4j As a Leading Graph Database
How is Neo4j Different From Traditional Databases? Graph databases are very different from traditional relational (SQL) databases. Instead of using tables with rows and columns, graph databases use a graph with nodes and relationships.
Both of these types of databases have their place. A relational database is great for tabular data that is not really closely related. If we have a lot of nested relationships in a relational database, it can get very complicated with join tables and join queries, and we need all kinds of primary and foreign keys. It can be really hard to deal with, and even worse, it can be really costly on the system. Graph databases are built to fix that problem and to work with data that is much more closely related and more dynamic.
Thus, because of the reasons stated above, we choose Neo4j as our database.
Figure 14: eBay's comment about Neo4j
Neo4j Working
Neo4j stores and displays data in the form of a graph. In Neo4j, data is represented by nodes and relationships between those nodes.
Neo4j databases (as with any graph database) are a lot different from relational databases such as MS Access, SQL Server, MySQL, etc. Relational databases use tables, rows, and columns to store data. They also present data in a tabular fashion.
Neo4j doesn't use tables, rows, or columns to store or present data.
Neo4j is best for storing data that has many interconnecting relationships. That is why graph databases like Neo4j have an advantage and are much better at dealing with highly related data than relational databases are.
The graph model doesn't usually require a predefined schema, so there is no need to create the database structure before you load the data (as you do in a relational database). In Neo4j, the data is the structure. Neo4j is a "schema-optional" DBMS.
In Neo4j, there is no need to set up primary key/foreign key constraints to predetermine which fields can have a relationship, and to which data. You just have to define the relationships between the nodes you need.
Features of Neo4j Graph Database
It has an SQL-like, simple query language: Neo4j CQL
It supports indexes by using Apache Lucene
It contains a UI to execute CQL commands, i.e. the Neo4j Data Browser
It supports UNIQUE constraints
It supports full ACID properties
It uses native graph storage with a native GPE (Graph Processing Engine)
It follows the Property Graph Data Model
It exposes a REST API that can be called from any programming language or framework, such as Java, Spring, Scala and so forth
It supports exporting query data to JSON and XLS formats
Advantages of Neo4j
Properties of Neo4j
Figure 15: General Look at Neo4j
Following are the properties of Neo4j:
Data model (flexible schema): Neo4j has a property graph model. A graph has nodes, and these nodes are connected with each other. Nodes and their relationships store data in key-value pairs known as properties. Neo4j also has a flexible schema, which means properties can be added or removed when necessary.
ACID properties: Neo4j supports full ACID (Atomicity, Consistency, Isolation, and Durability) rules.
Scalability and reliability: The database can be scaled by increasing the number of reads/writes and the volume without affecting the query processing speed and data integrity. Neo4j also provides support for replication for data safety and reliability.
The traversal of the graph: A traversal is the operation of visiting a set of nodes in the graph by moving between nodes connected with relationships. It is an operation for data retrieval that is unique to the graph model. Querying the data using a traversal only takes into account the data that is required, therefore there is no need to query the entire dataset in an expensive operation, as is the case with join operations on relational data. [1]
Cypher Query Language: Neo4j provides a powerful declarative query language known as Cypher. It uses ASCII art for depicting graphs. Cypher is easy to learn and can be used to create and retrieve relations between data without using complex queries like joins. [9]
Built-in web application: Neo4j provides a built-in Neo4j Browser web application, which can be used to create and query graph data.
Drivers: Neo4j can work with:
– A REST API, to work with programming languages and frameworks such as Java, Spring, Scala etc.
– JavaScript, to work with UI MVC frameworks such as NodeJS.
– Two kinds of Java API, the Cypher API and the native Java API, to develop Java applications.
Indexing: Neo4j supports indexes by using Apache Lucene.
Advantages of Neo4j Graph Database
Neo4j is very popular in many industries and it is the first choice of many companies. Neo4j gives an advantage in many respects. First of all, it is built for handling complex data connections; as the volume and interconnectedness of data increase, companies using it gain benefits over their competitors. Following are the advantages of Neo4j.
Easy to represent connected data: It makes it both easy and fast to traverse or navigate large amounts of data that has some sort of relationship.
Can represent semi-structured data easily: Data that does not fall into a natural structure can be easily represented in a graph database.
Cypher commands: Cypher commands are human readable and very easy to learn.
Simple and powerful data model: The property graph data model is simple yet still very powerful. The basic building blocks are nodes and relationships, and they can contain data in the form of key-value pairs or properties, unlike in the relational model.
Join aspect: There is no need for complex and costly joins to retrieve connected or related data. Instead the graph database uses the natural concept of relationships. Relationships in a graph form paths, so querying or traversing a graph involves following those paths, and because of that path-oriented nature of the graph data model, the majority of path-based operations are extremely efficient.
Neo4j is the only graph database that combines native graph storage, a scalable architecture optimized for speed, and ACID compliance to ensure predictability of relationship-based queries. [10]
Real-time insights: Neo4j provides results based on real-time data.
High availability: Neo4j is highly available for large enterprise real-time applications with transactional guarantees. [15]
Biggest graph community in the world: Neo4j has the largest and most active graph community.
Easy to learn: Mature UI with intuitive interaction and built-in learning. [10]
Performance In Neo4j
Neo4j provides a fast and efficient graph experience, and its strongest point is that it can traverse millions of nodes in milliseconds. Also, even exponentially increasing data size does not affect the performance of Neo4j, unlike relational databases.
Volker Pacher, eBay developer and Neo4j client: "Our Neo4j solution is literally a thousand times faster than the previous MySQL solution, with searches that require between 10 and 100 times less code".
Figure 16: Query times for Oracle Exadata vs Neo4j
Figure 17: TomTom's Comparison of Neo4j with MySQL
How To Increase Performance Of Neo4j?
Increase the size of available heap memory (between 8G and 16G).
Increase the open file limit from the default 1024 to at least 40000 to be sure.
In order to avoid costly disk access, make sure that relevant graph data is cached in memory.
Sufficient memory should be reserved for the non-Neo4j tasks running on the computer (at least 16G).
Simple algorithms lead to increased performance.
All related nodes and edges should be kept in server memory before giving results.
Traversals should be independent.
Indexes should be used.
What can Neo4j be used for?
The main reason Neo4j is better for highly related data is the way it allows you to create relationships. Neo4j is built around relationships. There is no need to set up primary key/foreign key constraints to predetermine which fields can have a relationship, and to which data. With Neo4j, you just add any relationship between any nodes whenever you need.
This makes Neo4j extremely well suited for social networking applications like Facebook, Twitter, etc. But there are many other areas where Neo4j excels. Here are some of the main areas that Neo4j can be used for:
● Social networks
● Real-time product recommendations
● Network diagrams
● Fraud detection
● Access management
● Graph-based search of digital assets
● Master data management
Cypher Query Language
Cypher is a declarative language for working with graphs and graph data, for both reading from and writing to the graph, and it is very expressive and powerful. Cypher also defines patterns in the given graph data.
Cypher is a declarative language: This means that we specify the data that we are interested in. We do not specify how to get that data from the database.
Cypher is a very human-readable language, and it is accessible not just to developers; everyone can easily learn and use it.
Cypher has expressions similar to SQL, like WHERE and ORDER BY, and simple comparison operators like <, =, >. Its difference from SQL is that Cypher is designed to represent graph data patterns. For example, it has the MATCH clause, which is built for specifying and finding patterns in the data.
Structure
Nodes: Nodes represent data entities and they can have labels; each node represents a single, distinct data entity. A node is equivalent to a record in a relational database. Nodes can also have properties, which are basically attributes. Nodes are shown with parentheses, like (p:Product).
Figure 18: Node Representation
Relationships: In Cypher, between the nodes we have lines which represent the relationships between them. Relationships can also have properties just like nodes, which is something that is much different from SQL. Relationships also have directions. A relationship is shown as an arrow, -->, between two nodes.
Operations In Cypher
Create: It is used to create nodes and relationships between them.
We created a node representing us with five properties:
Name: 'Ajit Singh'
Country: 'India'
City: 'Patna'
CREATE (n:Person {name: 'Ajit Singh', country: 'India', city: 'Patna',
DateOfBirth: '21.05.1984', School: 'PWC'}) RETURN n
Name: 'Anna Turu Pi'
Country: 'Spain'
City: 'Barcelona'
CREATE (n:Person {name: 'Anna Turu Pi', country: 'Spain', city: 'Barcelona',
DateOfBirth: '30.07.1995', School: 'PWC'}) RETURN n
We created a relationship called "FRIENDS_WITH" with the property "SINCE" with this Cypher code:
MATCH (a:Person), (b:Person) WHERE a.name = 'Ajit Singh' AND b.name = 'Anna Turu Pi'
CREATE (a)-[r:FRIENDS_WITH {SINCE: "17/09/2017"}]->(b)
RETURN r
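When such a statement is issued from application code, the values are usually passed as query parameters rather than pasted into the query text. The Python sketch below shows one way to do that. The function create_friendship and the recording stand-in fake_run are hypothetical; the only assumption about a real driver is that it offers a callable accepting a query string plus keyword parameters, which is the shape of session.run in the official Neo4j Python driver.

```python
# Cypher text with $-style parameters instead of literal values.
CREATE_FRIENDSHIP = (
    "MATCH (a:Person {name: $from_name}), (b:Person {name: $to_name}) "
    "CREATE (a)-[r:FRIENDS_WITH {SINCE: $since}]->(b) "
    "RETURN r"
)


def create_friendship(run, from_name, to_name, since):
    """Send the parameterized statement through any `run(query, **params)` callable."""
    return run(CREATE_FRIENDSHIP, from_name=from_name, to_name=to_name, since=since)


# A stand-in for a real session, so the sketch runs without a database.
calls = []


def fake_run(query, **params):
    calls.append((query, params))
    return None


create_friendship(fake_run, "Ajit Singh", "Anna Turu Pi", "17/09/2017")
print(calls[0][1]["since"])  # 17/09/2017
```

Keeping the query text constant and sending the values separately avoids quoting problems in names and lets the database reuse the query plan.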
(a) Result in Console    (b) After Creating Relationship
Figure 19: Create Relationship Between Two Nodes
Match: Match finds specified patterns in the data.
Figure 20: Relationships
With this Cypher code we showed all the people whom Esteban Zimányi teaches:
MATCH (a:Person)<-[:TEACHES_TO]-(b:Person {name: 'Esteban Zimányi'}) RETURN a.name
Set: This is used to update properties on nodes and relationships.
With this Cypher code we changed Esteban Zimányi's date of birth to '01.01.1966':
MATCH (n {name: 'Esteban Zimányi'}) SET n.DateOfBirth = '01.01.1966'
RETURN n
Delete: This operator deletes nodes or relationships in the data.
With this Cypher code we deleted Ajit Singh. A plain DELETE fails while a node still has relationships (such as FRIENDS_WITH), so DETACH DELETE is used to remove the node together with its relationships:
MATCH (n:Person {name: 'Ajit Singh'}) DETACH DELETE n
Loading Data With Cypher
There are lots of ways to import data into Neo4j, but the most common way is to upload it as a CSV file. The LOAD CSV operator is built into Neo4j, and this operator is used for small or medium-size datasets of up to 10 million records. If we want to upload data that has more than 10 million records, then we should use the USING PERIODIC COMMIT [n] clause. If we don't use this clause, the whole file is processed in one run and everything is created in one transaction.
Load CSV: This operator is used for importing CSV files into Neo4j.
Figure 21: Load CSV Operator Structure
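The idea behind committing periodically, instead of loading the whole file in one transaction, can be sketched in Python. The helper batches and the sample CSV content are assumptions for illustration; the sketch only shows how rows are grouped so that each group could be written in its own transaction, and it does not talk to a database.

```python
import csv
import io
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` rows; each list would be one transaction."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical CSV content with a header row and five records.
sample = io.StringIO(
    "name,city\n"
    "Ajit,Patna\n"
    "Anna,Barcelona\n"
    "Bob,Paris\n"
    "Cy,Rome\n"
    "Dee,Oslo\n"
)
reader = csv.DictReader(sample)

batch_sizes = [len(batch) for batch in batches(reader, 2)]
print(batch_sizes)  # [2, 2, 1]
```

Smaller batches keep the memory needed per transaction bounded, at the cost of more commits; the batch size plays the role of the n in the clause above.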
Use Cases of Neo4j
Figure 22: Use Cases Of Neo4j
The common use cases are:
Real-Time Recommendations: Recommendation algorithms find relationships between people, products and other services, based on the user's previous behavior. Neo4j is able to store interconnected data about customers and products, and since Neo4j doesn't need an index lookup for every suggestion, it deals with real-time data quickly and effectively. Walmart uses Neo4j for this purpose.
Master Data Management: In large organizations, different systems store information about customers, employees, titles and the supply chain. With the graph model it is easy to bring data from different systems together, to create views about customers, or to keep track of all the information about the organizational system itself. Cisco uses Neo4j for this purpose, and the company also uses Neo4j for its help desk solution.
Figure 23: Master Data Management Graph Design
Fraud Detection: Fraud detection is very important in the finance industry. Nowadays, in order not to be detected by a bank's fraud algorithms, people use different approaches, like opening several bank accounts with valid information and doing normal transactions without being an outlier. So people open false bank accounts with the same identity token and withdraw all the money in all the bank accounts. It is hard to detect that behavior with traditional approaches, but it is very easy to see with a graph, because people opening bank accounts using the same identity token show up as a pattern in the graph.
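The shared-identity pattern described above can be sketched in a few lines of Python. The account records and the function shared_identifier_rings are hypothetical; in a graph database the same grouping would fall out of account holders being connected to a common identity node.

```python
from collections import defaultdict

# Hypothetical records: (account id, account holder, identity token used).
accounts = [
    ("acc1", "alice", "phone:555-0100"),
    ("acc2", "bob", "phone:555-0100"),
    ("acc3", "carol", "phone:555-0199"),
    ("acc4", "dave", "phone:555-0100"),
]


def shared_identifier_rings(accounts):
    """Group holders by identity token and keep tokens shared by several holders."""
    holders = defaultdict(set)
    for _account_id, holder, token in accounts:
        holders[token].add(holder)
    return {
        token: sorted(names)
        for token, names in holders.items()
        if len(names) > 1
    }


rings = shared_identifier_rings(accounts)
print(rings)  # {'phone:555-0100': ['alice', 'bob', 'dave']}
```

Each account looks normal on its own; the suspicious signal only appears once the accounts are connected through the token they share.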
Graph-Based Search: Metadata is available for things like products, articles etc. Being able to model metadata as a graph allows search to be enhanced, meaning users are able to find things that are more relevant to them. Take LinkedIn as an example: when a search is executed, we don't see random or alphabetically sorted results; we first see the relevant ones. Lufthansa uses Neo4j for this purpose.
Network & IT Operations: If a data center is modelled as a graph, then dependency analysis can easily be applied to network systems to reach conclusions such as how many applications will be affected if one virtual machine goes down. HP uses Neo4j to model its network for some large telecommunication providers.
Figure 24: Network IT Operations Graph Design
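The dependency analysis mentioned above is a graph traversal in reverse: starting from the failed component, follow "is depended on by" links. The Python sketch below uses a made-up dependency map (depends_on and its entries are assumptions) and a breadth-first search.

```python
from collections import deque

# Hypothetical data center model: each component lists what it depends on.
depends_on = {
    "app1": ["vm1"],
    "app2": ["db1"],
    "db1": ["vm1"],
    "app3": ["vm2"],
}


def affected_by(failed, depends_on):
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the edges: component -> components that depend on it.
    dependents = {}
    for component, dependencies in depends_on.items():
        for dependency in dependencies:
            dependents.setdefault(dependency, []).append(component)

    affected, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for component in dependents.get(current, []):
            if component not in affected:
                affected.add(component)
                queue.append(component)
    return affected


print(sorted(affected_by("vm1", depends_on)))  # ['app1', 'app2', 'db1']
```

Note that app2 is affected even though it never mentions vm1 directly; it is reached through db1, which is the kind of indirect impact that is tedious to compute with joins.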
Identity & Access Management: Within large organizations there are hundreds of users, and controlling who can access which information is crucial for security reasons. Creating groups and roles for each user comes in handy in this situation. This kind of data is very rich and connected, and can be easily handled by Neo4j. UPC London uses Neo4j for this, and it received a 2014 Graphie Award for "Best identity and access management app".
Chapter 11: Getting started with Neo4j
Requirements
Spring Data Neo4j 5.1.x at minimum requires:
JDK Version 8 and above.
Neo4j Graph Database 3.1 and above.
Spring Framework {springVersion} and above.
If you plan on altering the version of Neo4j-OGM, make sure it is a 3.0.0+ release.
Download Neo4j
First download Neo4j from its official website: https://neo4j.com/download/
You can choose from either a free Enterprise Trial or the free Community Edition.
Here, we are using the Community Edition.
Run the downloaded file and follow the instructions given below:
Start Neo4j:
Start the Server
Click on the installed Neo4j Community Edition.
Initialization started:
Neo4j is started. It is ready to use.
Open a browser and go to http://localhost:7474/browser/
or http://127.0.0.1:7474/browser/
Start Neo4j web server
Visit the sub-directory /bin of the extracted folder and execute in a terminal: ./neo4j start
Visit http://localhost:7474/
Only the first time, you will have to sign in with the default account and change the
default password. As of community version 3.0.3, the default username and
password are neo4j and neo4j.
You can now enter Cypher queries in the console provided in your web browser and
visually investigate the results of each query.
Set up a new database
Each Neo4j server currently (in the community edition) can host a single Neo4j
database, so in order to set up a new database:
Visit the sub-directory /bin and execute ./neo4j stop to stop the server.
Edit neo4j.conf and set the parameter dbms.active_database to the name of the new database.
Visit the sub-directory /bin again and execute ./neo4j start
The web server has started again with the new empty database. You can
visit http://localhost:7474/ again to work with the new database.
The created database is located in the sub-directory /data/databases, under a folder
with the name specified in the parameter dbms.active_database.
Delete one of the databases
Make sure the Neo4j server is not running; go to the sub-directory /bin and execute
./neo4j status. If the output message shows that the server is running, also execute
./neo4j stop.
Then go to the sub-directory /data/databases and delete the folder of the database
you want to remove.
Cypher Query Language
This is Cypher, Neo4j's query language. In many ways, Cypher is similar to SQL
if you are familiar with it, except that SQL refers to items stored in tables while
Cypher refers to items stored in a graph.
First, we should start out by learning how to create a graph and add
relationships, since that is essentially what Neo4j is all about.
CREATE (ab:Object {age: 30, destination: "England", weight: 99})
You use CREATE to create data.
To indicate a node, you use parentheses: ()
The ab:Object part can be broken down as follows: a variable 'ab' and a
label 'Object' for the new node. Note that the variable can be anything, but
you have to be consistent within a single Cypher query.
To add properties to the node, use curly braces: {}
Next, we will learn about finding MATCHes.
MATCH (abc:Object) WHERE abc.destination = "England" RETURN abc;
MATCH specifies that you want to search for a certain node/relationship
pattern. (abc:Object) refers to one node pattern (with label Object), which
stores the matches in the variable abc. You can think of this entire line
as the following:
abc = find the matches that are an Object WHERE the destination is England.
In this case, WHERE adds a constraint, which is that the destination must be
England. You must include a RETURN at the end of a read-only MATCH query: Neo4j will
not accept a bare MATCH, so such a query must always return some value. (This also
depends on what type of query you are writing, since queries that modify data can end
with an updating clause instead; we will talk more about this later
as we introduce the other types of queries you can make.)
The next query will be explained later, after we go over some more
elements of the Cypher Query Language. It is here to give you a taste of what we
can do with this language. Below, you will find an example which gets the cast of
movies whose title starts with 'T':
MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)
WHERE movie.title STARTS WITH "T"
RETURN movie.title AS title, collect(actor.name) AS cast
ORDER BY title ASC LIMIT 10;
A complete list of commands and their syntax can be found in the official Neo4j
Cypher Reference Card.
RDBMS Vs Graph Database
RDBMS               Graph Database
Table               Graph
Rows                Nodes
Columns and Data    Properties and their values
Constraints         Relationships
Joins               Traversal
Cypher
Introduction
Cypher is the query language used by Neo4j. You use Cypher to perform tasks
and matches against a Neo4j graph.
Cypher is "inspired by SQL" and is designed to be intuitive in the way you describe
the relationships, i.e. typically the drawing of the pattern will look similar to the
Cypher representation of the pattern.
Examples
Creation
Create a node
CREATE (neo:Company) // create node with label 'Company'
CREATE (neo:Company {name: 'Neo4j', hq: 'San Mateo'}) // create node with properties
Create a relationship
CREATE (beginning_node)-[:edge_name {Attribute1: 1, Attribute2: 'two'}]->(ending_node)
Query Templates
Running neo4j locally, in the browser GUI (default: http://localhost:7474/browser/),
you can run the following command to get a palette of queries.
:play query template
This helps you get started creating and merging nodes and relationships by typing
queries.
Create an Edge
CREATE (beginning_node)-[:edge_name {Attribute1: 1, Attribute2: 'two'}]->(ending_node)
Delete all nodes
MATCH (n)
DETACH DELETE n
DETACH doesn't work in older versions (less than 2.3); for previous versions use
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n, r
Delete all nodes of a specific label
MATCH (n:Book)
DELETE n
Match (capture group) and link matched nodes
MATCH (node_name:node_type {}), (node_name_two:node_type_two {})
CREATE (node_name)-[:edge_name {}]->(node_name_two)
Update a Node
MATCH (n)
WHERE n.some_attribute = "some identifier"
SET n.other_attribute = "a new value"
Delete All Orphan Nodes
Orphan nodes/vertices are those lacking all relationships/edges.
MATCH (n)
WHERE NOT (n)--()
DELETE n
Python & Neo4j
Examples
Install neo4jrestclient
pip install neo4jrestclient
Connect to neo4j
from neo4jrestclient.client import GraphDatabase
db = GraphDatabase("http://localhost:7474", username="neo4j", password="mypass")
Create some nodes with labels
user = db.labels.create("User")
u1 = db.nodes.create(name="user1")
user.add(u1)
u2 = db.nodes.create(name="user2")
user.add(u2)
You can associate a label with many nodes in one go
language = db.labels.create("Language")
b1 = db.nodes.create(name="C++")
b2 = db.nodes.create(name="Python")
language.add(b1, b2)
Create relationships
u1.relationships.create("likes", b1)
u1.relationships.create("likes", b2)
u2.relationships.create("likes", b1)
Bi-directional relationships
u1.relationships.create("friends", u2)
Match using neo4jrestclient
from neo4jrestclient import client
q = 'MATCH (u:User)-[r:likes]->(m:Language) WHERE u.name="user1" RETURN u, type(r), m'
# "db" as defined above
results = db.query(q, returns=(client.Node, str, client.Node))
Print results
for r in results:
    print("(%s)-[%s]->(%s)" % (r[0]["name"], r[1], r[2]["name"]))
Output:
(user1)-[likes]->(C++)
(user1)-[likes]->(Python)
Chapter 12: Neo4j Application
Software
For the graph database, Neo4j Community Edition 3.2.5 has been used,
and for the relational database, SQL Server 2017.
Use Case Selected
As proposed in graph database benchmark guidelines [4], the best tests to
benchmark a graph database are: traversal (which includes the calculation of the
shortest path), graph analysis, connected components, communities, centrality
measures, pattern matching and graph anonymisation. It is also noted that
among the domains where graph databases prove to be more beneficial are
shortest path graph analysis and real-time analysis of traffic networks. In our
implementation, we are going to model flight routes, as they have the ideal
properties to benchmark a graph database. Airports and airlines are elements
whose information lies in their interconnections.
Data
Because of size concerns, we created synthetic data in addition to our existing data
tables. Before creating new data we had 67,663 different routes, and now we have
1,193,413 different routes. The rows we created hold dummy values; they do not have any
connection with the existing data except for their types, so our queries mostly returned
results from the initial data. This data creation process was applied because the more data we
have, the more accurate the benchmarking results we get. Also, unlike traditional databases,
adding more data to Neo4j does not affect its performance.
Implementing Data
Figure 25: OpenFlights.org
Neo4j: To create the Neo4j database we developed Python code. This code uses the
py2neo library to access the Neo4j database, and it reads our data (external source) to
create nodes, relationships, properties and indexes.
Figure 26: Structure of the python code
The original airport data had latitude and longitude attributes. In order to present a
better visualization, we created a function that calculates the distance between two
connected airports. Route data has source_airport and destination_airport, so we
created a route node and assigned the distance between source_airport and
destination_airport as an attribute of the route node. In the end, the three types of nodes
are Airlines, Airports and Routes, and they have the following connections:
Route -> TO -> Airport
Route -> FROM -> Airport
Route -> OF -> Airline
Table 2: Graph database schema
We implemented our data in Neo4j with this schema:
Figure 27: Initial Schema
Figure 28: Example of a query in Neo4j
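The text does not say which formula the distance function uses. A common choice for latitude/longitude pairs is the haversine great-circle formula, so the sketch below assumes that; the function name `distance_km` and the Earth-radius constant are illustrative choices, not the authors' code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # min() guards against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Two points on the equator a quarter of the way around the globe.
print(round(distance_km(0.0, 0.0, 0.0, 90.0)))
```

The value returned for each route would then be stored on the route node so that queries can order by it without recomputing it.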
SQL: A relational database was created by importing each flat file as a table, and then
we created foreign key references between the tables.
Export data
To export the Neo4j database, we chose to use the APOC library. Neo4j must be
authorized to let the plugin write files. For that, this line has to be added to neo4j.conf:
apoc.export.file.enabled=true
Export to CSV
Export to cypher script
apoc.export.cypher.all(file,config): exports whole database incl. indexes as Cypher
statements to the provided file
apoc.export.cypher.data(nodes,rels,file,config): exports given nodes and
relationships incl. indexes as Cypher statements to the provided file
apoc.export.cypher.graph(graph,file,config): exports given graph object incl.
The database was also exported to a Cypher script:
CALL apoc.export.cypher.all("/temp/neo4j_database_cypher_file.cypher",
{batchSize: 10}) YIELD file, source, format, nodes, relationships, properties, time,
rows
Figure 29: Exporting Neo4j database to cypher script
Query Examples (Neo4j-SQL)
Figure 30: Algorithms for graph databases
Add libraries: As noted earlier, Neo4j includes graph algorithms that
allow us to perform queries that would be impossible to perform in SQL.
Libraries of algorithms can be downloaded and added to Neo4j as plugins.
Figure 31: Add jar files in plugin folder
Neo4j must be authorized to run the plugins. For that, this line has
to be added to neo4j.conf: dbms.security.procedures.unrestricted=apoc.* (e.g., for the
APOC library).
After that, Neo4j needs to be restarted, and it can be verified that the plugin is
working by running the following command in the Neo4j browser:
CALL dbms.procedures() YIELD name, signature, description
WHERE name STARTS WITH "apoc"
RETURN name, signature, description
Shortest Path
This algorithm is the one that best justifies the existence of graph databases.
It is impractical to calculate in plain SQL, where the number of layers (hops) in
the route has to be specified in advance.
First query example: find the shortest path to go from an airport in Madrid to an
airport in Seoul.
MATCH p = shortestPath((src:Airport {city: 'Madrid'})-[r:FROM|TO*..15]-
(dest:Airport {city: 'Seoul'})) RETURN p
Figure 32: Shortest path query from Madrid to Seoul
Figure 33: Pipeline of the shortest path query
The nodes can be expanded, and we see the airline to which each route belongs.
Figure 34: Expanded shortest path query
Second query example: find the shortest path between an airport in Seoul and
an airport in Antwerp.
MATCH p = shortestPath((src:Airport {city: 'Seoul'})-[r:FROM|TO*..15]-
(dest:Airport {city: 'Antwerp'})) RETURN p
Figure 35: Shortest path query from Seoul to Antwerp
Paying attention to the relationships, it can be seen that the query does not output a
physically possible travelling route from the origin city to the destination city. In the first
query, one of the paths ends up in Seoul, but the other has two sources, Madrid and
Seoul, and they both end up in Beijing. The second query has three origin airports,
one in Antwerp and two in Seoul, and all the routes finish in Geneva.
The purpose of the algorithm is to find the shortest path that connects two nodes,
independently of its physical meaning, but real routes can be created with the
following modification:
Persistent inferred relationships: For each route going from
one airport to another, a relationship connecting both airports has been added. This way,
the shortest path query can look for only one type of relationship. If the objective is
to find physically possible paths between two airports (e.g., without stepping through
an airline node), searching only over that inferred relationship ensures that airports
are always connected directly to airports.
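Once airports are linked directly to airports, a fewest-hops search becomes an ordinary breadth-first traversal that follows edge direction. The sketch below shows that idea in plain Python; the adjacency data is invented for illustration (the three-letter codes are used only as labels) and this is not how Neo4j implements shortestPath internally.

```python
from collections import deque

# Hypothetical direct airport-to-airport connections (directed).
connected = {
    "MAD": ["FRA", "LHR"],
    "FRA": ["ICN"],
    "LHR": ["FRA"],
    "ICN": ["FRA"],
}


def shortest_path(graph, source, target):
    """Return one path with the fewest hops that respects edge direction, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for nxt in graph.get(current, []):
            if nxt not in parents:
                parents[nxt] = current
                queue.append(nxt)
    return None


route = shortest_path(connected, "MAD", "ICN")
print(route)
```

Because every edge here is a real flight in a real direction, any path the search returns is a sequence of legs that could actually be flown.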
Relationship CONNECTED. This relationship has the property weight, which is
proportional to the number of routes between two airports. It is used in the
shortest path queries and community detection queries.
Cypher code to create the relationship:
MATCH (ap1:Airport)<-[:FROM]-(r:Route)-[:TO]->(ap2:Airport)
WHERE id(ap1) <> id(ap2)
WITH ap1, ap2, COUNT(*) AS weight
CREATE (ap1)-[c:CONNECTED]->(ap2)
SET c.weight = weight
In the figure below the database schema after adding the
inferred relationship is displayed:
Figure 36: Neo4j DB schema after adding Connected relationships
Cypher code to delete the relationship:
MATCH (ap1:Airport)-[r:CONNECTED]->(ap2:Airport) DELETE r
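The aggregation performed by the CONNECTED creation query, one weighted edge per ordered airport pair with routes that start and end at the same airport skipped, can be mirrored in plain Python. The route list below is hypothetical and only illustrates the counting step.

```python
from collections import Counter

# Hypothetical (source_airport, destination_airport) pairs, one per route.
routes = [
    ("MAD", "FRA"),
    ("MAD", "FRA"),
    ("FRA", "MAD"),
    ("FRA", "ICN"),
    ("MAD", "MAD"),  # self-loop, excluded like WHERE id(ap1) <> id(ap2)
]


def connected_weights(route_pairs):
    """Count routes per ordered airport pair, ignoring self-loops."""
    return Counter((src, dst) for src, dst in route_pairs if src != dst)


weights = connected_weights(routes)
for (src, dst), weight in sorted(weights.items()):
    print(src, "-[CONNECTED weight=%d]->" % weight, dst)
```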
Relationship GOINGTO. This relationship saves the route and airline information
in its properties. It is used in the shortest path queries and community detection
queries.
Cypher code to create the relationship:
MATCH (ap1:Airport)<-[:FROM]-(r:Route)-[:TO]->(ap2:Airport)
WHERE id(ap1) <> id(ap2)
WITH ap1, ap2, r
MATCH (r)-[:OF]->(al:Airline)
CREATE (ap1)-[g:GOINGTO]->(ap2)
SET g.distance = r.distance
SET g.route = id(r)
SET g.airline = al.name
In the figure below the database schema after adding the inferred relationship is
displayed:
Figure 37: Neo4j DB schema after adding Goingto relationships
Cypher code to delete the relationship:
MATCH (:Airport)-[r:GOINGTO]->(:Airport) DELETE r
The first shortest path query is now run again with the inferred relationships:
MATCH p = shortestPath((src:Airport {city: 'Madrid'})-[r:GOINGTO]-
(dest:Airport {city: 'Seoul'})) RETURN p
Figure 38: Shortest path between Madrid and Seoul
Now the airports are directly connected to each other. The route node cannot be
seen, but its identifier is saved as one of the relationship properties. With the
following query it can be verified whether the route matches the requirements:
MATCH (r:Route) WHERE id(r) = 50276 RETURN r
It is verified that the relationship GOINGTO is equivalent to a real outbound
route between Madrid and Seoul. The return route is also verified:
MATCH (r:Route) WHERE id(r) = 50205 RETURN r
Figure 39: Shortest path return route output
Shortest path in SQL Server: SQL Server has the limitation that the number of
layers in the path needs to be specified. An alternative is to use a recursive
query, but in our experience it was not effective.
When executing the query, we obtained the following message: "The statement
terminated. The maximum recursion 100 has been exhausted before statement completion."
Figure 40: Pipeline of Neo4j query on Antwerp-Patna shortest path
Betweenness centrality:
The betweenness centrality of a node in a network is the number of shortest paths
between two other members of the network on which the given node appears.
Betweenness centrality is an important metric because it can be used to identify
"brokers of information" in the network, or nodes that connect disparate clusters. [6]
This query shows the airports that are crossed most often by routes going from
one airport to another. In other words, the airports where the most
transfers take place. As displayed in the figure below, the highlighted airports
act as bottlenecks that connect clusters of airports.
MATCH (ap:Airport)
WITH collect(ap) AS airports
CALL apoc.algo.betweenness(['CONNECTED'], airports, 'OUTGOING')
YIELD node, score
SET node.betweenness = score
RETURN node AS Airport, score ORDER BY score DESC LIMIT 25
Figure 41: Betweenness centrality query result
The query outputs five big airports, which are commonly used for transfers during
intercontinental journeys. It makes sense that they have the highest betweenness
centrality.
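To make the definition concrete, here is a small pure-Python sketch of betweenness centrality for an unweighted, directed graph using Brandes' algorithm. It is an assumption-laden illustration: the toy graph is invented, the adjacency lists are assumed to contain no duplicate edges, and the APOC procedure may scale or normalise its scores differently.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm on an unweighted, directed adjacency-list graph."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    score = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        stack = []
        predecessors = {n: [] for n in nodes}
        sigma = dict.fromkeys(nodes, 0)  # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, []):
                if w not in dist:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score


# Hypothetical network: every trip between the outer airports passes through HUB.
flights = {"A": ["HUB"], "B": ["HUB"], "HUB": ["C", "D"]}
scores = betweenness(flights)
print(scores)
```

In this toy network the four shortest paths A-C, A-D, B-C and B-D all cross HUB, which is why it is the only node with a non-zero score.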
Closeness centrality:
Closeness centrality is the inverse of the average distance to all other nodes in
the network. Nodes with high closeness centrality are often highly connected within
clusters in the graph, but not necessarily highly connected outside of the cluster. [6]
This query outputs the airports that have more connections to different airports. In
other words, it shows the locations that are more geographically isolated and harder to
reach by other means of transport (e.g. islands). It can output the airports with
more direct flights from different locations or the airlines that operate more routes.
Figure 42: Concept of closeness centrality
Query example: output the five airports with the highest closeness centrality:
MATCH (ap:Airport)
WITH collect(ap) AS airports
CALL apoc.algo.closeness(['CONNECTED'], airports, 'OUTGOING')
YIELD node, score
RETURN node AS Airport, score ORDER BY score DESC LIMIT 5
Figure 43: Closeness centrality query result
As predicted, the query outputs airports that are in highly touristic but geographically
isolated locations: Lopez Island near Seattle, the river Araguaia in the middle
of Brazil, the Grand Canyon of Colorado...
Figure 44: Location of the airports with highest closeness centrality
Query performance: Writing PROFILE before the Cypher query outputs the
pipeline of the query execution.
Figure 45: Pipeline of the closeness centrality query
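The definition used here, the inverse of the average distance to the other nodes, can be sketched in plain Python with a breadth-first search. The sketch averages only over the nodes that are reachable, which is one common convention and an assumption on our part; the APOC procedure may treat unreachable nodes differently, and the toy graph is invented.

```python
from collections import deque


def closeness(graph, node):
    """Inverse of the average hop distance from `node` to the nodes it can reach."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, []):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    reachable = len(dist) - 1
    if reachable == 0:
        return 0.0  # isolated node: no distances to average
    return reachable / sum(dist.values())


# Hypothetical star-shaped network with flights in both directions.
flights = {"HUB": ["A", "B", "C"], "A": ["HUB"], "B": ["HUB"], "C": ["HUB"]}
scores = {airport: closeness(flights, airport) for airport in flights}
print(scores)
```

Note that in a small, tightly knit cluster every node is only a hop or two from every other node it can reach, which is how small isolated groups of airports can end up with high scores under this convention.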
PageRank:
The secret of Google's success was its search algorithm, PageRank. PageRank
works by counting the number and quality of links to a page to determine a rough
estimate of how important the website is. The underlying assumption is that more
important websites are likely to receive more links from other websites [11]. This
algorithm can output the most connected airport or the most powerful airline (the
node connected to the most routes).
First query: output the most important airports.
MATCH (ap:Airport) WITH collect(ap) AS airports
CALL apoc.algo.pageRank(airports) YIELD node, score
RETURN node, score ORDER BY score DESC
Figure 46: Pipeline of the airports pagerank query
The most important airports are in London, Paris, Frankfurt, Patna, Dubai,
Beijing and the USA. The output is not surprising.
Second query: output the most popular airlines.
MATCH (node:Airline) WITH collect(node) AS airlines
CALL apoc.algo.pageRank(airlines) YIELD node, score
RETURN node, score ORDER BY score DESC
Figure 47: Pipeline of the airlines pagerank query
As a result we can see that Ryanair is the leading airline, followed by four
companies from the USA and three from China.
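The idea behind PageRank can be illustrated with a short power-iteration sketch in plain Python. The damping factor of 0.85, the number of iterations and the toy graph are all assumptions chosen for illustration; the APOC procedure has its own defaults and implementation.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed adjacency-list graph."""
    nodes = sorted(set(graph) | {n for targets in graph.values() for n in targets})
    count = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / count)
    for _ in range(iterations):
        new_rank = dict.fromkeys(nodes, (1.0 - damping) / count)
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / count
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical network in which three airports all fly into HUB.
flights = {"A": ["HUB"], "B": ["HUB"], "C": ["HUB"], "HUB": ["A"]}
ranks = pagerank(flights)
print(max(ranks, key=ranks.get))
```

A node scores highly either by receiving many links or by receiving a link from a node that is itself important, which is why A, the only destination of HUB, ends up close behind it.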
Community Detection:
There are many algorithms for community detection: triangle counting, strongly
connected components, ... These algorithms cluster together the nodes that are more
closely related to each other. We have chosen an algorithm from the APOC library,
and what the code below does is classify the airport nodes into 40 partitions. The
classification is determined by the weight of the CONNECTED relationships (the
number of routes between each pair of airports).
Seeing as airports are geographical locations, and routes are physical journeys
between them, it is expected that geographically neighbouring airports will be
clustered together. That hypothesis is verified below.
CALL apoc.algo.community(40, ['Airport'], 'partition',
'CONNECTED', 'OUTGOING', 'weight', 10000)
MATCH (ap:Airport) WHERE exists(ap.partition) RETURN ap
Figure 48: Community detection graph
The figure above shows the shape of the graph after the nodes have
been classified into partitions. To see which nodes belong to each partition, the
partition number must be returned as output:
CALL apoc.algo.community(40, ['Airport'], 'partition',
'CONNECTED', 'OUTGOING', 'weight', 10000)
MATCH (ap:Airport) WHERE exists(ap.partition)
RETURN ap.partition, ap.country, COUNT(*) AS num
ORDER BY ap.partition, num DESC
Figure 49: Community detection table
Going back to the visualization of the community detection for airports, the partitions
can be recognized and verified by looking at the table. The cluster of six
nodes disconnected from the rest of the airports is comprised of Papua New Guinea
airports (the country can be seen by hovering over the nodes). They belong to the
first partition in the table, 6394.
The following part of the graph is a bit scattered, but it can be seen that its nodes are all
connected to the central nodes. Hovering over them, we see that they all belong
to Canada, and we can suppose that the more separated nodes are regional airports
connected to bigger, more important airports. That part of the graph is equivalent to
seven partitions in the table.
Next to Canada, a group of nodes is separated, and those airports are all from
Algeria. They must belong to partition 6624.
The more centralized part of this subgraph consists of the airports from Finland. Some of
those are connected with an airport in Greenland, which connects with other
Greenland and Iceland airports.
The next subgraph shows airports from different African countries interconnected
with each other. On the left side, there are airports, and airports from African
countries highly connected to them, and on the right side there are mainly Nigerian
airports, among other African airports.
Going back to the center of the graph, it is hard to recognize more than one partition,
as it shows the central European airports, which are highly interconnected.
Finally, a partition was detected in the table, 8355. Checking whether those airports are
geographically related, it has been determined that they are on islands between
Polynesia, Micronesia and Melanesia.
(a) Partition table
(b) Geographical location
Figure 50: Australasia partition
Possible queries on SQL
The previous section showed operations that cannot be done with SQL. Now we
will present operations applicable to both.
Finding flights between two airports that have no direct route between them:
Neo4j:
MATCH p = allShortestPaths((ap1:Airport {city: 'Antwerp'})-[*]->(ap2:Airport {city: 'Patna'}))
WITH extract(node IN nodes(p) | node.name) AS cities,
extract(rel IN relationships(p) | rel.airline) AS airlines
RETURN cities, airlines
SQL:
SELECT DISTINCT A1.Name AS [1st Airport],
airline1.name AS [1st Airline],
A2.Name AS [2nd Airport],
airline2.name AS [2nd Airline],
A3.Name [3rd Airport],
airline3.name [3rd Airline],
a4.name [4th Airport]
FROM routes r
INNER JOIN airports a1 ON r.source_airport_id = a1.ID
INNER JOIN airlines airline1 ON airline1.id = r.airline_id
INNER JOIN airports a2 ON r.destination_airport_id = a2.ID
INNER JOIN routes r2 ON a2.ID = r2.source_airport_id
INNER JOIN airlines airline2 ON airline2.id = r2.airline_id
INNER JOIN airports a3 ON r2.destination_airport_id = a3.ID
INNER JOIN routes r3 ON a3.id = r3.source_airport_id
INNER JOIN airlines airline3 ON airline3.id = r3.airline_id
INNER JOIN airports a4 ON a4.id = r3.destination_airport_id
WHERE a1.city = 'Antwerp' AND a4.city = 'Patna'
(a) Neo4j Result
(b) SQL Result
Figure 51: Comparison of Queries - first query
As can be seen here, finding all possible routes between two airports is easy in
Neo4j. Besides that, Neo4j provides a visualization.
There is one important point here: in SQL we have to specify the level of depth to find
results. For example, in this query we searched for 3-level flights between Antwerp and
Patna. If we had searched for 1 or 2 levels, the query would have returned no result. But in
Neo4j we do not have to specify the level; it finds all routes between two airports and even
calculates the shortest route. This is therefore one of the drawbacks of using SQL on
data that has levels.
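The depth limitation can be reproduced on a tiny scale with Python's built-in sqlite3 module. The sketch below is an illustration only: it uses SQLite rather than SQL Server, a simplified one-table schema, and an invented three-leg itinerary. The helper has to generate different SQL text for every depth, which is exactly the drawback described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE routes (source TEXT, destination TEXT)")
# Hypothetical legs; the codes are labels for illustration only.
conn.executemany(
    "INSERT INTO routes VALUES (?, ?)",
    [("ANR", "LHR"), ("LHR", "DEL"), ("DEL", "PAT")],
)


def paths_with_legs(connection, origin, target, legs):
    """Build and run a query joining exactly `legs` copies of the routes table."""
    columns = ["r1.source"] + [f"r{i}.destination" for i in range(1, legs + 1)]
    joins = "routes r1"
    for i in range(2, legs + 1):
        joins += f" JOIN routes r{i} ON r{i - 1}.destination = r{i}.source"
    sql = (
        f"SELECT {', '.join(columns)} FROM {joins} "
        f"WHERE r1.source = ? AND r{legs}.destination = ?"
    )
    return connection.execute(sql, (origin, target)).fetchall()


for depth in (1, 2, 3):
    print(depth, paths_with_legs(conn, "ANR", "PAT", depth))
```

Only the query built for the right depth returns the itinerary; the caller has to know, or try, each depth in turn.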
Nearest airport to city by distance
Neo4j:
MATCH (airport1:Airport {city: 'Bologna'})<-[:FROM]-(route:Route)-[:TO]->(airport2:Airport)
RETURN airport1, route, airport2
ORDER BY route.distance ASC LIMIT 1
SQL:
SELECT TOP 1
A2.name, a2.city, a2.country,
dbo.DistanceKM(a.latitude, a2.latitude, A.longitude, A2.longitude) AS distance
FROM routes r
INNER JOIN airports a ON a.id = r.source_airport_id
INNER JOIN airports a2 ON a2.id = r.destination_airport_id
WHERE A.city = 'Bologna'
ORDER BY distance ASC
While we were uploading our data into Neo4j, we created a node called route.
This node has three relationships (TO, FROM, OF), and as a descriptive
property we stored the calculated distance on the route node. To be on the
same page, we created a function in SQL that calculates distances between
airports from the latitude and longitude attributes that already exist
in our data. Both approaches give the same result, but Neo4j also provides a
visualization.
Most connected airports
Neo4j:
MATCH (airport:Airport)<-[:FROM]-(r:Route)
WITH airport, count(r) AS departures
MATCH (r2:Route)-[:TO]->(airport)
RETURN airport.name AS airport_name, departures, count(r2) AS arrivals
ORDER BY departures + arrivals DESC
SQL:
SELECT A.Name, A.City, A.Country, SUM(A.route_count) AS route_count
FROM (
(SELECT a.Name, a.City, a.Country, COUNT(*) AS route_count
FROM routes R
INNER JOIN airports A ON A.ID = source_airport_id
GROUP BY a.Name, a.City, a.Country)
UNION ALL
(SELECT a.Name, a.City, a.Country, COUNT(*) AS route_count
FROM routes R
INNER JOIN airports A ON A.ID = destination_airport_id
GROUP BY a.Name, a.City, a.Country)) A
GROUP BY A.Name, A.City, A.Country ORDER
(a) Neo4j Query
(b) SQL query
Figure 52: Comparison of queries - third query
With these queries we found the most interconnected airport by counting the number of
incoming and outgoing flights. As can be seen, this is very easy to write in Neo4j.
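The counting behind this comparison is simple enough to sketch in plain Python: tally departures and arrivals per airport and add them up. The route list is hypothetical. One difference to be aware of is that this sketch also counts airports that only have departures or only arrivals, whereas a query with two chained MATCH clauses keeps only airports that have both.

```python
from collections import Counter

# Hypothetical (source_airport, destination_airport) pairs, one per route.
routes = [("MAD", "FRA"), ("FRA", "MAD"), ("FRA", "ICN"), ("LHR", "FRA")]


def most_connected(route_pairs, top=3):
    """Rank airports by departures plus arrivals."""
    departures = Counter(src for src, _ in route_pairs)
    arrivals = Counter(dst for _, dst in route_pairs)
    totals = departures + arrivals
    return totals.most_common(top)


ranking = most_connected(routes, top=1)
print(ranking)
```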
Bibliography
Aleksa Vukotic, Nicki Watt, Tareq Abedrabbo, Dominic Fox, and Jonas Partner. Neo4j in
Action. Manning Publications, 2015.
Stephan C. Carlson. Graph theory. Encyclopedia Britannica. Available at
https://www.britannica.com/topic/graph-theory, May 2013. Accessed: 2017-11-30.
DB-Engines. Knowledge base of relational and NoSQL database management systems.
Available at https://db-engines.com/en/, 2017. Accessed: 2017-10-20.
D. Dominguez-Sal, N. Martinez-Bazan, V. Muntes-Mulero, P. Baleta, and J. L. Larriba-Pey. A
discussion on the design of graph database benchmarks. September 2010.
Stefan Edlich. NoSQL archive. Available at http://nosql-database.org/.
Accessed: 2017-11-20.
Mathigon. Graphs and networks. Accessed: 2017-11-30.
Thomas Vial and Michel Domenjoud. Graph databases: an overview. OctoTalks,
July 2012. Accessed: 2017-11-30.
Neo4j. Intro to Cypher.
Neo4j. Top ten reasons for choosing Neo4j. Available at
https://neo4j.com/top-ten-reasons/.
Neo4j. Neo4j graph algorithms. GitHub, October 2017. Accessed: 2017-12-8.
University of Colorado. Database management essentials.
Available at
https://www.youtube.com/playlist?list=PL73oFZbnYuixa9w-dL-EsM7Vy5BQGBIeO. Accessed: 2017-10-21.
OpenFlights.org. Airport, airline and route data. Available at
https://openflights.org/data.html. Accessed: 2017-11-3.
Tutorials Point. Graph theory: Introduction. Available at
https://www.tutorialspoint.com/graph_theory/graph_theory_introduction.htm. Accessed: 2017-11-30.
James Serra. Relational databases vs non-relational databases. Big Data and Data
Warehousing. James Serra's Blog, August 2015. Accessed: 2017-11-29.
James Serra. Types of NoSQL databases. Big Data and Data Warehousing. James
Serra's Blog, April 2015. Accessed: 2017-11-29.
Roopendra Vishwakarma. The different types of NoSQL databases. Open Source For U,
May 2017. Accessed: 2017-11-29.
S. Abiteboul. Querying Semi-Structured Data. In Proc. of the 6th Int. Conf. on Database
Theory (ICDT), volume 1186 of LNCS, pages 1–18. Springer, Jan 1997.
S. Abiteboul and R. Hull. IFO: A Formal Semantic Database Model. In Proc. of the 3rd
Symposium on Principles of Database Systems (PODS), pages 119–132. ACM
Press, 1984.
S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. L. Wiener. The Lorel query language
for semistructured data. International Journal on Digital Libraries (JODL), 1(1):68–88, 1997.
S. Abiteboul and V. Vianu. Queries and Computation on the Web. In Proc. of the 6th Int.
Conf. on Database Theory (ICDT), volume 1186 of LNCS, pages 262–275. Springer, Jan 1997.
R. Agrawal and H. V. Jagadish. Efficient Search in Very Large Databases. In Proc. of the 14th
Int. Conf. on Very Large Data Bases (VLDB), pages 407–418. Morgan Kaufmann, Aug-Sept 1988.
R. Agrawal and H. V. Jagadish. Materialization and Incremental Update of Path Information.
In Proc. of the 5th Int. Conf. on Data Engineering (ICDE), pages 374–383. IEEE Computer
Society, Feb 1989.
R. Agrawal and H. V. Jagadish. Algorithms for Searching Massive Graphs. IEEE
Transactions on Knowledge and Data Engineering (TKDE), 6(2):225–238, 1994.
R. Albert and A. L. Barabasi. Statistical mechanics of complex networks. Reviews of
Modern Physics, 74:47, Jan 2002.
N. Alechina, S. Demri, and M. de Rijke. A Modal Perspective on Path Constraints. Journal of
Logic and Computation, 13(6):939–956, 2003.
B. Amann and M. Scholl. Gram: A Graph Data Model and Query Language. In European
Conference on Hypertext Technology (ECHT), pages 201–211. ACM, Nov-Dec 1992.
M. Andries and G. Engels. A Hybrid Query Language for an Extended Entity Relationship
Model. Technical Report TR9315, Institute of Advanced Computer Science, Universiteit
Leiden, May 1993.
M. Andries, M. Gemis, J. Paredaens, I. Thyssens, and J. V. den Bussche. Concepts for
Graph-Oriented Object Manipulation. In Proc. of the 3rd Int. Conf. on Extending Database
Technology (EDBT), volume 580 of LNCS, pages 21–38. Springer, March 1992.
R. Angles and C. Gutierrez. Querying RDF Data from a Graph Database Perspective. In Proc.
2nd European Semantic Web Conference (ESWC), number 3532 in LNCS, pages 346–360,
2005.
M. Azmoodeh and H. Du. GQL, A Graphical Query Language for Semantic Databases.
In Proc. of the 4th Int. Conf. on Scientific and Statistical Database Management
(SSDBM), volume 339 of LNCS, pages 259–277. Springer, June 1988.
C. Beeri. Data Models and Languages for Databases. In Proc. of the 2nd Int. Conf. on
Database Theory (ICDT), volume 326 of LNCS, pages 19–40. Springer, Aug-Sept 1988.
G. Benkö, C. Flamm, and P. F. Stadler. A Graph-Based Toy Model of Chemistry. Journal
of Chemical Information and Computer Sciences (JCISD), 43(1):1085–1093, Jan 2003.
C. Berge. Graphs and Hypergraphs. North Holland, Amsterdam, 1973.
U. Brandes. Network Analysis. Number 3418 in LNCS. Springer Verlag, 2005.
T. Bray, J. Paoli, and C. M. Sperberg-McQueen. Extensible Markup Language (XML)
1.0, W3C Recommendation 10 February 1998.
http://www.w3.org/TR/1998/REC-xml-19980210.
A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins,
and J. Wiener. Graph structure in the Web. In
Proc. of the 9th Int. World Wide Web conference on Computer networks: the international
journal of computer and telecommunications networking, pages 309–320.
North Holland Publishing Co., 2000.
P. Buneman. Semistructured Data. In Proc. of the 16th Symposium on Principles of
Database Systems (PODS), pages 117–121. ACM Press, May 1997.