Graph Data Model

Marko A. Rodriguez discusses how graphs can be used to model real-world scenarios and how graph traversals can be used to solve problems on graph data. He provides examples of modeling a software collaboration environment, discussion threads, and concept relationships as graphs. Traversals can identify circular dependencies in software, rank discussion contributors by how much their messages spur further discussion, and find related concepts to a topic by spreading activation across concept relationships. These domain-specific graphs and traversals can be integrated into a single graph to solve multi-domain problems.

Marko A. Rodriguez
Supporting the Graph Landscape

On Graph Computing
January 9, 2013

The concept of a graph has been around since the dawn of mechanical computing and for many decades
prior in the domain of pure mathematics. Due in large part to this golden age of databases, graphs
are becoming increasingly popular in software engineering. Graph databases provide a way to persist
and process graph data. However, the graph database is not the only way in which graphs can be
stored and analyzed. Graph computing has a history prior to the use of graph databases and has a
future that is not necessarily entangled with typical database concerns. There are numerous graph
technologies that each have their respective benefits and drawbacks. Leveraging the right
technology at the right time is required for effective graph computing.

Structure: Modeling Real-World Scenarios with Graphs

A graph (http://en.wikipedia.org/wiki/Graph_(abstract_data_type)) (or network
(http://en.wikipedia.org/wiki/Complex_network)) is a data structure. It is composed of vertices
(dots) and edges (lines). Many real-world scenarios can be modeled as a graph. This is not
necessarily inherent to some objective nature of reality, but primarily predicated on the fact
that humans subjectively interpret the world in terms of objects (vertices) and their respective
relationships to one another (edges) (an argument against
(http://www.amazon.com/Different-Universe-Reinventing-Physics-Bottom/dp/0465038298) this idea).
The popular data model used in graph computing is the property graph
(http://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model). The following examples
demonstrate graph modeling via three different scenarios.
A Software Graph
Stephen (https://github.com/spmallette) is a member of a graph-oriented engineering group called
TinkerPop (http://tinkerpop.com). Stephen contributes to Rexster (http://rexster.tinkerpop.com).
Rexster is related to other projects via software dependencies. When a user finds a bug in Rexster,
they issue a ticket (https://github.com/tinkerpop/rexster/issues). This description of a
collaborative coding environment can be conveniently captured by a graph. The vertices (or things)
are people, organizations, projects, and tickets. The edges (or relationships) are, for example,
memberships, dependencies, and issues. A graph can be visualized using dots and lines and the
scenario described above is diagrammed below.
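
As a rough illustration, the scenario above can be written down as a tiny property graph in plain
Python. This is only a sketch: the vertex ids, the `type` property, the edge labels (`memberOf`,
`contributesTo`, `dependsOn`, `issuedFor`), the Blueprints dependency, and the ticket id are all
naming assumptions made for the example, not a schema taken from the article.

```python
# Minimal property graph: vertices and edges both carry key/value properties.
# All ids, labels, and property names below are illustrative assumptions.
vertices = {
    "stephen":    {"type": "person"},
    "tinkerpop":  {"type": "organization"},
    "rexster":    {"type": "project"},
    "blueprints": {"type": "project"},   # assumed dependency, for illustration
    "ticket-1":   {"type": "ticket"},    # hypothetical ticket id
}

# Each edge is (source vertex, label, target vertex, properties).
edges = [
    ("stephen", "memberOf", "tinkerpop", {}),
    ("stephen", "contributesTo", "rexster", {}),
    ("rexster", "dependsOn", "blueprints", {}),
    ("ticket-1", "issuedFor", "rexster", {"status": "open"}),
]

def out_neighbors(vertex, label):
    """Vertices reachable from `vertex` over outgoing edges with `label`."""
    return [dst for src, lbl, dst, _ in edges if src == vertex and lbl == label]

def in_neighbors(vertex, label):
    """Vertices that point at `vertex` over edges with `label`."""
    return [src for src, lbl, dst, _ in edges if dst == vertex and lbl == label]

print(out_neighbors("stephen", "contributesTo"))  # projects Stephen works on
print(in_neighbors("rexster", "issuedFor"))       # tickets filed against Rexster
```

Keeping edges as labeled triples with properties is what lets people, organizations, projects,
and tickets live in one structure without a separate table per kind of thing.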

A Discussion Graph
Matthias (https://github.com/mbroecheler) is interested in graphs. He is the CTO of Aurelius
(http://thinkaurelius.com) and the project lead for the graph database Titan
(http://thinkaurelius.github.com/titan/). Aurelius has a mailing list
(https://groups.google.com/forum/#!forum/aureliusgraphs). On this mailing list, people discuss
graph theory and technology. Matthias contributes to a discussion. His contributions beget more
contributions. In a recursive manner, the mailing list manifests itself as a tree
(http://en.wikipedia.org/wiki/Tree_(graph_theory)). Moreover, the unstructured text of the messages
makes reference to shared concepts.

A Concept Graph
A graph can be used to denote the relationships between arbitrary concepts, even the concepts
related to graph. For example, note how concepts (in italics) are related in the sentences to
follow. A graph can be represented as an adjacency list
(http://en.wikipedia.org/wiki/Adjacency_list). The general way in which graphs are processed is via
graph traversals (http://en.wikipedia.org/wiki/Graph_traversal). There are two general types of
graph traversals: depth-first (http://en.wikipedia.org/wiki/Depth-first_search) and breadth-first
(http://en.wikipedia.org/wiki/Breadth-first_search). Graphs can be persisted in a software system
known as a graph database (http://en.wikipedia.org/wiki/Graph_database). Graph databases organize
information in a manner different from the relational databases
(http://en.wikipedia.org/wiki/Relational_databases) of common software knowledge. In the diagram
below, the concepts related to graph are linked to one another, demonstrating that concept
relationships form a graph.
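
The concepts named above (adjacency list, depth-first, breadth-first) can be made concrete with a
short sketch. The links between concepts below loosely follow the sentences in this section and
are an assumption for illustration; the function names are likewise hypothetical.

```python
from collections import deque

# Adjacency list: each concept maps to the concepts it links to.
# The specific links are illustrative assumptions.
concepts = {
    "graph": ["adjacency list", "graph traversal", "graph database"],
    "graph traversal": ["depth-first", "breadth-first"],
    "graph database": ["relational database"],
    "adjacency list": [],
    "depth-first": [],
    "breadth-first": [],
    "relational database": [],
}

def depth_first(graph, start):
    """Follow each branch to its end before backtracking."""
    order, stack, seen = [], [start], set()
    while stack:
        vertex = stack.pop()
        if vertex in seen:
            continue
        seen.add(vertex)
        order.append(vertex)
        # Reversed so the first listed neighbor is explored first.
        stack.extend(reversed(graph[vertex]))
    return order

def breadth_first(graph, start):
    """Visit vertices level by level, nearest neighbors first."""
    order, queue, seen = [], deque([start]), {start}
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for neighbor in graph[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order

print(depth_first(concepts, "graph"))
print(breadth_first(concepts, "graph"))
```

The two functions differ only in the container holding pending vertices: a stack yields
depth-first order, a queue yields breadth-first order.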

A Multi-Domain Graph

The three previous scenarios (software, discussion, and concept) are representations of real-world
systems (e.g. GitHub (http://github.com/), Google Groups (http://groups.google.com), and Wikipedia
(http://wikipedia.org/)). These seemingly disparate models can be seamlessly integrated into a
single atomic graph structure by means of shared vertices. For instance, in the associated diagram,
Gremlin (http://gremlin.tinkerpop.com) is a Titan dependency, Titan is developed by Matthias, and
Matthias writes messages on the Aurelius mailing list (software merges with discussion). Next,
Blueprints (http://blueprints.tinkerpop.com) is a Titan dependency and Titan is tagged graph
(software merges with concept). The dotted lines identify other such cross-domain linkages that
demonstrate how a universal model is created when vertices are shared across domains. The
integrated, universal model can be subjected to processes that provide richer (perhaps, more
intelligent) services than what any individual model could provide alone.
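
A minimal sketch of integration by shared vertices, assuming each domain is stored as a set of
(source, label, target) edges that use the same vertex ids. The edge labels, the message id, and
the mailing list id are hypothetical; only the Titan, Gremlin, Blueprints, and Matthias
relationships come from the text.

```python
# Each domain graph is a set of (source, label, target) edges.
software = {
    ("titan", "dependsOn", "gremlin"),
    ("titan", "dependsOn", "blueprints"),
    ("matthias", "develops", "titan"),
}
discussion = {
    ("matthias", "wrote", "message-1"),          # hypothetical message id
    ("message-1", "postedTo", "aurelius-list"),  # hypothetical list id
}
concept = {
    ("titan", "taggedWith", "graph"),
    ("graph", "relatedTo", "graph database"),    # illustrative concept link
}

def vertices_of(edges):
    """All vertex ids that appear as a source or target in `edges`."""
    return {v for src, _, dst in edges for v in (src, dst)}

# Because vertex ids are shared, integration is a plain set union of the edges.
universal = software | discussion | concept

# Vertices that appear in more than one domain join the models together.
shared = (
    (vertices_of(software) & vertices_of(discussion))
    | (vertices_of(software) & vertices_of(concept))
    | (vertices_of(discussion) & vertices_of(concept))
)
print(sorted(shared))
```

In this toy data the joints are Matthias (software meets discussion) and Titan (software meets
concept), matching the two merges described above.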

Process: Solving Real-World Problems with Traversals
What has been presented thus far is a single graph model of a set of interrelated domains. A model
is only useful if there are processes that can leverage it to solve problems. Much like data needs
algorithms, a graph needs a traversal (http://en.wikipedia.org/wiki/Graph_traversal). A traversal
is an algorithmic/directed walk over the graph such that paths are determined (called derivations)
or information is gleaned (called statistics). Even the human visual system viewing a graph
visualization (http://en.wikipedia.org/wiki/Graph_drawing) is a traversal engine leveraging
saccadic (http://en.wikipedia.org/wiki/Saccade) movements to identify patterns. However, as graphs
grow large and problems demand precise logic, visualizations and the human's internal calculator
break down. A collection of traversal examples is presented next that solve typical problems in
the previously discussed domains.
Determining Circular Dependencies
With the growth of open source software and the ease by which modules can be incorporated into
projects, circular dependencies (http://en.wikipedia.org/wiki/Circular_dependency) abound and can
lead to problems in software engineering. A circular dependency occurs when project A depends on
project B and, through some dependency path, project B depends on project A. When dependencies are
represented graphically, a traversal can easily identify such circularities (e.g. in the diagram
below, A -> B -> D -> G -> A is a cycle (http://en.wikipedia.org/wiki/Cycle_(graph_theory))).

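One way such a traversal could look is sketched below: a depth-first walk that starts at a project
and reports the first dependency path leading back to it. The `find_cycle` name and the extra
project C are assumptions for illustration; only the A, B, D, G cycle comes from the text.

```python
def find_cycle(graph, start):
    """Return a dependency path start -> ... -> start if one exists, else None."""
    stack = [(start, [start])]
    visited = set()
    while stack:
        vertex, path = stack.pop()
        for dependency in graph.get(vertex, []):
            if dependency == start:
                return path + [start]
            if dependency not in visited:
                visited.add(dependency)
                stack.append((dependency, path + [dependency]))
    return None

# project -> projects it depends on (C is a hypothetical extra project)
dependencies = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": ["G"],
    "G": ["A"],
}

print(find_cycle(dependencies, "A"))
print(find_cycle(dependencies, "C"))
```

Each project is expanded at most once, so the walk stays linear in the number of dependency edges
reachable from the starting project.
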
Ranking Discussion Contributors
Mailing lists are composed of individuals with varying levels of participation and competence.
When a mailing list is focused on learning through discussion, simply writing a message is not
necessarily a sign of positive contribution. If an author's messages spawn replies, then it can be
interpreted that the author is contributing discussion-worthy material. However, if an author's
messages end the conversation, then they may be contributing non-sequiturs or information that is
not allowing the discussion to flourish. In the associated diagram, the beige vertices are authors
and their respective number is a unique author id.

One way to rank contributors on a mailing list is to count the number of messages they have posted
(the author's out-degree (http://en.wikipedia.org/wiki/Centrality#Degree_centrality) to messages in
the mailing list). However, if the ranking must account for fruitful contributions, then authors
can be ranked by the depth of the discussion their messages spawn (the tree depth of the author's
messages). Finally, note that other techniques such as sentiment
(http://en.wikipedia.org/wiki/Sentiment_analysis) and concept
(http://en.wikipedia.org/wiki/Formal_concept_analysis) analysis can be included in order to
understand the intention and meaning of a message.
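
The two rankings can be compared on a small, made-up thread. In this sketch the message ids, author
ids, and reply structure are hypothetical, and "depth of discussion" is interpreted as the longest
reply chain under any one of an author's messages, which is only one reasonable reading of the
text.

```python
from collections import defaultdict

# message id -> (author id, parent message id or None); a hypothetical thread
messages = {
    "m1": (1, None),
    "m2": (2, "m1"),
    "m3": (3, "m1"),
    "m4": (1, "m2"),
    "m5": (3, "m4"),
    "m6": (3, "m2"),
}

# Invert the reply-to pointers into parent -> list of replies.
replies = defaultdict(list)
for message_id, (_, parent) in messages.items():
    if parent is not None:
        replies[parent].append(message_id)

def depth_below(message_id):
    """Length of the longest reply chain hanging off `message_id`."""
    return max((1 + depth_below(r) for r in replies[message_id]), default=0)

posted = defaultdict(int)    # out-degree: messages written per author
spawned = defaultdict(int)   # deepest discussion any of their messages spawned
for message_id, (author, _) in messages.items():
    posted[author] += 1
    spawned[author] = max(spawned[author], depth_below(message_id))

# Ties are broken by author id so the output is deterministic.
by_count = sorted(posted, key=lambda a: (-posted[a], a))
by_depth = sorted(spawned, key=lambda a: (-spawned[a], a))
print(by_count, by_depth)
```

Author 3 posts the most messages here but none of them draws a reply, so the two rankings
disagree, which is the distinction the section draws.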
Finding Related Concepts
Stephen's understanding of graphs was developed while working on TinkerPop's graph technology
stack. Nowadays he is interested in learning more about the theoretical aspects of graphs. Via his
web browser, he visits the graph (http://en.wikipedia.org/wiki/Graph_(mathematics)) Wikipedia page.
In a manual fashion, Stephen clicks links and reads articles: depth-first, graph traversals,
adjacency lists, etc. He realizes that pages reference each other and that some concepts are more
related to others due to Wikipedia's link structure. The manual process of walking links can be
automated using a graph traversal. Instead of clicking, a traversal can start at the graph vertex,
emanate outwards, and report which concepts have been touched the most. The concept that has seen
the most flow (http://en.wikipedia.org/wiki/Spreading_activation) is a concept that has many ties
(i.e. paths) to graph (see priors algorithms (http://dl.acm.org/citation.cfm?id=956782)). With such
a traversal, Stephen can be provided a ranked list of graph-related concepts. This traversal is
analogous to a wave diffusing over a body of water, albeit real-world graph topologies are rarely
as simple as a two-dimensional plane (see lattice (http://en.wikipedia.org/wiki/Lattice_graph)).
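
A simplified form of spreading activation can be sketched as follows. The page links, the even
split of energy over out-links, the `decay` factor, and the number of steps are all assumptions
chosen for illustration; they are not parameters given in the article or in the cited work.

```python
from collections import defaultdict

# page -> pages it links to (hypothetical link structure)
links = {
    "graph": ["graph traversal", "adjacency list", "graph database"],
    "graph traversal": ["depth-first", "breadth-first", "adjacency list"],
    "adjacency list": ["graph traversal"],
    "graph database": ["graph traversal"],
    "depth-first": [],
    "breadth-first": [],
}

def spread(links, source, steps=3, decay=0.5):
    """Push energy outward from `source`; rank vertices by total energy seen."""
    total = defaultdict(float)
    frontier = {source: 1.0}
    for _ in range(steps):
        incoming = defaultdict(float)
        for vertex, energy in frontier.items():
            out = links.get(vertex, [])
            for neighbor in out:
                # Energy is split evenly over out-links and decays at each hop.
                incoming[neighbor] += decay * energy / len(out)
        for vertex, energy in incoming.items():
            total[vertex] += energy
        frontier = incoming
    total.pop(source, None)  # the source itself is not a recommendation
    return sorted(total, key=lambda v: (-total[v], v))

ranked = spread(links, "graph")
print(ranked)
```

Concepts reachable from the source by many short paths accumulate the most energy, so in this toy
link structure "graph traversal" ends up ranked first.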
A Multi-Domain Traversal
The different graph models discussed previously (i.e. software, discussion, and concept) were
integrated into a single world model via shared vertices. Analogously, the aforementioned graph
traversals can be composed to yield a solution to a cross-domain problem. For example:
Recommend me projects to participate in that maintain a proper dependency structure, have engaging
contributors promoting the space, and are conceptually related to technologies I've worked on
previously.
This type of problem solving is possible when a heterogeneous network of things is linked together
and effectively moved within. The means of linking and moving is the graph and the traversal,
respectively. To conclude this section, other useful traversal examples are provided.
Compute a stability rank for a project based on the number of issues it has and the number of
issues its dependencies have, so forth and so on in a recursive manner.
Cluster projects according to shared (or similar) concepts between them.
Recommend a team of developers for an upcoming project that will use X dependencies and is related
to Y concepts.
Rank issues by the number of projects that each issue's submitter has contributed to.
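
The stability rank in the first of these examples could be sketched as a recursive score. Every
number here is hypothetical, the Pipes project and its dependencies are added only to give the
recursion some depth, and the damping factor of 0.5 is an assumed weighting; the sketch also
assumes the dependency graph has no cycles.

```python
from functools import lru_cache

# Hypothetical open-issue counts and dependency lists.
open_issues = {"titan": 4, "gremlin": 2, "blueprints": 1, "pipes": 0}
depends_on = {
    "titan": ["gremlin", "blueprints"],
    "gremlin": ["pipes", "blueprints"],   # illustrative
    "pipes": ["blueprints"],              # illustrative
    "blueprints": [],
}

DAMPING = 0.5  # assumed: a dependency's score counts half as much per hop

@lru_cache(maxsize=None)
def instability(project):
    """Own issues plus the damped instability of every dependency (no cycles)."""
    return open_issues[project] + DAMPING * sum(
        instability(dependency) for dependency in depends_on[project]
    )

ranking = sorted(depends_on, key=instability)  # most stable project first
print(ranking)
```

Memoizing the recursion means each project is scored once even when several projects share a
dependency.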

Graph Computing Technologies
The practice of computing is about riding the fine line between two entangled quantities: space and
time. In the world of graph computing, the same tradeoffs exist. This section will discuss various
graph technologies in order to identify what is gained and sacrificed with each choice. Moreover, a
few example technologies are presented. Note that many more technologies exist and the mentioned
examples are by no means exhaustive.
In-Memory Graph Toolkits
In-memory graph toolkits are single-user systems that are oriented towards graph analysis and
visualization. They usually provide implementations of the numerous graph algorithms defined in the
graph theory (http://en.wikipedia.org/wiki/Graph_theory) and network science
(http://en.wikipedia.org/wiki/Network_science) literature (see Wikipedia's list of graph algorithms
(http://en.wikipedia.org/wiki/Category:Graph_algorithms)). The limiting factor of these tools is
that they can only operate on graphs that can be stored in local, main memory. While this can be
large (millions of edges), it is not always sufficient. If the source graph data set is too large
to fit into main memory, then subsets are typically isolated and processed using such in-memory
graph toolkits.
Examples: JUNG (http://jung.sourceforge.net/), NetworkX (http://networkx.lanl.gov/), iGraph
(http://igraph.sourceforge.net/), Fulgora (coming soon)
[+] Rich graph algorithm libraries
[+] Rich graph visualization libraries
[+] Different memory representations for different space/time tradeoffs
[-] Constrained to graphs that can fit into main memory
[-] Interaction is normally very code-heavy
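
To make the space/time tradeoff between memory representations concrete, here is a small sketch
that stores the same directed graph two ways. It uses plain Python lists rather than the API of
any toolkit named above, and the graph itself is made up.

```python
# The same directed graph on vertices 0..3 in two in-memory layouts.
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency list: memory grows with the number of edges. Listing neighbors is
# direct, but an edge-existence check scans one neighbor list.
adj_list = [[] for _ in range(n)]
for src, dst in edge_list:
    adj_list[src].append(dst)

# Adjacency matrix: memory grows with n * n regardless of the edge count, but
# an edge-existence check is a single lookup.
adj_matrix = [[False] * n for _ in range(n)]
for src, dst in edge_list:
    adj_matrix[src][dst] = True

def has_edge_list(src, dst):
    return dst in adj_list[src]

def has_edge_matrix(src, dst):
    return adj_matrix[src][dst]

stored_list = sum(len(row) for row in adj_list)      # one entry per edge
stored_matrix = sum(len(row) for row in adj_matrix)  # one cell per vertex pair
print(stored_list, stored_matrix)
```

For sparse graphs the list layout is far smaller, while the matrix trades that space for
constant-time edge checks.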
Real-Time Graph Databases

Graph databases are perhaps the most popular incarnation of a graph computing technology. They
provide transactional semantics such as ACID (typical of local databases) and eventual consistency
(typical of distributed databases). Unlike in-memory graph toolkits, graph databases make use of
the disk to persist the graph. On reasonable machines, local graph databases can support a couple
billion edges while distributed systems can handle hundreds of billions of edges. At this scale and
with multi-user concurrency, where random access to disk and memory are at play, global graph
algorithms are not feasible. What is feasible is local graph algorithms/traversals. Instead of
traversing the entire graph, some set of vertices serve as the source (or root) of the traversal.
Examples: Neo4j (http://neo4j.org/), OrientDB (http://www.orientdb.org/), InfiniteGraph
(http://objectivity.com/INFINITEGRAPH), DEX (http://www.sparsity-technologies.com/dex.php),
Titan (http://thinkaurelius.github.com/titan/)
[+] Optimized for local neighborhood analyses (egocentric traversals)
[+] Optimized for handling numerous concurrent users
[+] Interactions are via graph-oriented query/traversal languages
[-] Global graph analytics are inefficient due to random disk interactions
[-] Large computational overhead due to database functionality (e.g. transactional semantics)
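
The idea of a local (egocentric) traversal can be sketched without any database at all. Below, a
dictionary stands in for the persisted graph and a hypothetical `get_neighbors` callback stands in
for a disk or index lookup; neither is the API of any product listed above.

```python
def neighborhood(get_neighbors, root, max_hops):
    """Collect vertices within `max_hops` of `root`, fetching adjacency lazily."""
    hops = {root: 0}
    frontier = [root]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for vertex in frontier:
            for neighbor in get_neighbors(vertex):  # one lookup per touched vertex
                if neighbor not in hops:
                    hops[neighbor] = hop
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return hops

# Stand-in for persisted adjacency data; vertex ids are illustrative.
store = {
    "stephen": ["rexster", "tinkerpop"],
    "rexster": ["blueprints"],
    "tinkerpop": [],
    "blueprints": ["far-away-project"],   # hypothetical
    "far-away-project": [],
}

lookups = []  # records which vertices were actually read

def get_neighbors(vertex):
    lookups.append(vertex)
    return store.get(vertex, [])

result = neighborhood(get_neighbors, "stephen", max_hops=2)
print(result, lookups)
```

Only the root and its one-hop neighbors are ever read here; the rest of the graph is never
touched, which is why this style of query suits a multi-user, disk-backed system.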
Batch Processing Graph Frameworks

Batch processing graph frameworks make use of a compute cluster. Most of the popular frameworks in
this space leverage Hadoop (http://hadoop.apache.org) for storage (HDFS) and processing
(MapReduce). These systems are oriented towards global analytics. That is, computations that touch
the entire graph data set and, in many instances, touch the entire graph many times over (iterative
algorithms). Such analyses do not run in real-time. However, because they perform global scans of
the data, they can leverage sequential reads from disk (see The Pathology of Big Data
(http://queue.acm.org/detail.cfm?id=1563874)). Finally, like the in-memory systems, they are
oriented towards the data scientist or, in a production setting, for feeding results back into a
real-time graph database.
Examples: Hama (http://hama.apache.org/), Giraph (http://incubator.apache.org/giraph/), GraphLab
(http://graphlab.org/), Faunus (http://thinkaurelius.github.com/faunus/)
[+] Optimized for global graph analytics
[+] Process graphs represented across a machine cluster
[+] Leverages sequential access to disk for fast read times
[-] Does not support multiple concurrent users
[-] Are not real-time graph computing systems
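
The shape of an iterative global computation can be illustrated on a single machine. The sketch
below labels connected components by repeatedly scanning the full edge set until nothing changes;
it is written in the spirit of such frameworks and is not the programming model or API of Hama,
Giraph, GraphLab, or Faunus. The edge data is made up.

```python
def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component."""
    label = {v: v for edge in edges for v in edge}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        # One full, sequential pass over the whole edge set per iteration.
        new_label = dict(label)
        for a, b in edges:
            low = min(label[a], label[b])
            if low < new_label[a]:
                new_label[a], changed = low, True
            if low < new_label[b]:
                new_label[b], changed = low, True
        label = new_label
    return label, passes

edges = [(1, 2), (2, 3), (3, 4), (5, 6)]
labels, passes = connected_components(edges)
print(labels, passes)
```

Every pass touches every edge, and several passes are needed before the labels settle, which is
the "touch the entire graph many times over" pattern described above.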
This section presented different graph computing solutions. It is important to note that there also
exist hardware solutions like Convey's MX Series
(http://www.conveycomputer.com/products/mxseries/) and Cray's YARC (http://www.yarcdata.com/) graph
engines. The technologies discussed all share one important theme: they are focused on processing
graph data. The tradeoffs of each category are determined by the limits set forth by modern
hardware/software and, ultimately, theoretical computer science.

Conclusion
To the adept, graph computing is not only a set of technologies, but a way of thinking about the
world in terms of graphs and the processes therein in terms of traversals. As data is becoming more
accessible, it is easier to build richer models of the environment. What is becoming more difficult
is storing that data in a form that can be conveniently and efficiently processed by different
computing systems. There are many situations in which graphs are a natural foundation for modeling.
When a model is a graph, then the numerous graph computing technologies can be applied to it.

Acknowledgement
Mike Loukides (http://radar.oreilly.com/mikel) of O'Reilly was kind enough to review multiple
versions of this article and in doing so, made the article all the better.