Practical Graph Structures in SQL Server and Azure SQL: Enabling Deeper Insights Using Highly Connected Data, 1st Edition
Louis Davidson
This work is subject to copyright. All rights are solely and exclusively
licensed by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in
any other physical way, and transmission or information storage and
retrieval, electronic adaptation, computer software, or by similar or
dissimilar methodology now known or hereafter developed.
The publisher, the authors, and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
1. Introduction to Graphs
Louis Davidson, Cleveland, TN, USA
Graph Fundamentals
While most readers will have the explicit goal of applying a graph to a
specific problem they are trying to solve, it is best to start at the
beginning and discuss what a graph is in the pure sense. (If you don’t
care, you can skip to Chapter 3, where the T-SQL code starts!) Going through the fundamentals will help free your mind from preconceived notions. Perhaps more importantly, you can ignore the limitations of the tools and of the specific problems you may wish to solve and just think about the whole problem set.
In math, there are the concepts of “pure” math and “applied” math.
Pure math is math for math’s sake. It is there to ask, “what can be done
with a construct?” without asking the limiting questions of, “is this
helpful code to solve my problem or, even more importantly, any
problem?” Applied math is more or less what we computer architects/programmers are typically interested in, since we have a problem and want to find something to help us with that problem (and immediately so). Most of the time, solutions are only interesting if they solve the specific problems we currently know about, the ones with a manager riding on our back.
However, I always prefer to start just trying to see what I can
accomplish with a new tool before getting my hands dirty
(metaphorically, of course; as a programmer, my hands never get dirty
at work unless the day’s snack includes chocolate).
Some of the things I discuss will become familiar as you look for patterns in the data and eventually translate those patterns into algorithms. Some designs may not be realistically possible due to current computing limitations in SQL Server (or any graph database platform) or reasonable hardware limitations at the time of writing in 2023. Having written my first book in 2000, I am astounded by how different this statement feels 23 years later, sitting here with a computer on my desk that has more power than medium to large corporations were running on when I first wrote T-SQL.
In one of my sample databases you can download, I have millions of
rows in just one table, and I can process reasonable queries on my
desktop computer in mere minutes. Limitations always exist, but there
are fewer and fewer limitations for every generation of computer
architecture that passes.
My goal here in this first chapter (and to a large extent, the second
chapter) is to simply introduce some of the terms and concepts around
graphs to help you understand how graphs are shaped and, eventually,
processed.
Definition
Graphs are based on two primary data structures: nodes (or in math terms, vertices) and edges. Nodes represent a thing that one might care about, much like a table in a relational database. An edge establishes a connection between two nodes, or between a node and itself (when only one node is involved, the node is related to itself). A graph is defined as a set of nodes and a set of edges.
In a graph database, a node is like most any table, with attributes describing what the node represents. The edge is analogous to a many-to-many relationship table with, at a minimum, attributes to represent the node the relationship is from and the node it is to. You will see that two major things set these new concepts apart from a relational implementation.
First, the from and the to in an edge generally can be from any node object. Whereas a relational column that references another table communicates that it holds a foreign key to Table X and nothing else, the from and to attributes of an edge can reference multiple different node types if you so desire.
Tip I am fully aware that you can put any value into a column, so every foreign key value in a column needn’t come from the same table. But that is not how it should be done, because a column that can mean multiple things is very confusing in a relational table. Graph structures work very similarly to relational tables. Still, they have special properties that allow the from and to values of edge rows to reference nodes from multiple tables without confusing the user or the engine.
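To make the idea concrete, here is a minimal sketch using the SQL Server 2017 and later graph syntax (AS NODE, AS EDGE, MATCH). The tables dbo.Person, dbo.Band, and dbo.Likes and their rows are hypothetical. It shows a single edge table holding both a Person-to-Band edge and a Person-to-Person edge, which one pair of foreign key columns could not express cleanly.

```sql
-- Hypothetical tables for illustration; requires SQL Server 2017+ or Azure SQL.
DROP TABLE IF EXISTS dbo.Likes;
DROP TABLE IF EXISTS dbo.Person;
DROP TABLE IF EXISTS dbo.Band;

-- Node tables: ordinary columns describe what each node represents.
CREATE TABLE dbo.Person (PersonName nvarchar(50) NOT NULL UNIQUE) AS NODE;
CREATE TABLE dbo.Band (BandName nvarchar(50) NOT NULL UNIQUE) AS NODE;

-- Edge table: SQL Server supplies the $from_id and $to_id columns itself.
CREATE TABLE dbo.Likes AS EDGE;

INSERT INTO dbo.Person (PersonName) VALUES (N'Larry'), (N'Fred');
INSERT INTO dbo.Band (BandName) VALUES (N'The Who');

-- A Person-to-Band edge...
INSERT INTO dbo.Likes ($from_id, $to_id)
VALUES ((SELECT $node_id FROM dbo.Person WHERE PersonName = N'Larry'),
        (SELECT $node_id FROM dbo.Band WHERE BandName = N'The Who'));

-- ...and a Person-to-Person edge in the same edge table.
INSERT INTO dbo.Likes ($from_id, $to_id)
VALUES ((SELECT $node_id FROM dbo.Person WHERE PersonName = N'Larry'),
        (SELECT $node_id FROM dbo.Person WHERE PersonName = N'Fred'));

-- MATCH returns only the edges whose endpoints are in the node tables named in the pattern.
SELECT p.PersonName, b.BandName
FROM dbo.Person AS p, dbo.Likes AS l, dbo.Band AS b
WHERE MATCH(p-(l)->b);
```

Because dbo.Likes has no edge constraint, SQL Server accepts edges between any node tables; the final query returns only the Larry/The Who row.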
The two graph diagrams in Figure 1-4 are copies of the same graph.
The next concept is related, in that we will look at graphs with the same shape but different nodes. When two graphs have the same shape (meaning the same number of nodes connected by the same pattern of edges, not the same diagram layout), the graphs are referred to as isomorphic. For example, the two graphs in Figure 1-5 are not equal, but they are isomorphic because they have the same shape in their set of data (regardless of whether you draw them as a square or not).
Figure 1-5 Two isomorphic graph structures
Isomorphism will occasionally be useful in your work with graphs, even if you never refer to it by that exact term.
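A full isomorphism test is expensive, but a quick necessary check is easy to write: isomorphic graphs must have the same degree sequence (the sorted list of how many edges touch each node). Matching sequences do not prove that two graphs are isomorphic, but different sequences prove that they are not. This sketch uses a hypothetical #IsoEdge table holding three small undirected graphs; STRING_AGG requires SQL Server 2017 or later.

```sql
DROP TABLE IF EXISTS #IsoEdge, #DegreeSeq;
CREATE TABLE #IsoEdge
(
    GraphName varchar(10) NOT NULL,
    NodeA     varchar(10) NOT NULL,
    NodeB     varchar(10) NOT NULL
);

-- Hypothetical data: A is a square, B is the same shape with other names, C is a triangle with a tail.
INSERT INTO #IsoEdge (GraphName, NodeA, NodeB)
VALUES ('A', 'N1', 'N2'), ('A', 'N2', 'N3'), ('A', 'N3', 'N4'), ('A', 'N4', 'N1'),
       ('B', 'W', 'Y'), ('B', 'Y', 'X'), ('B', 'X', 'Z'), ('B', 'Z', 'W'),
       ('C', 'P', 'Q'), ('C', 'Q', 'R'), ('C', 'R', 'P'), ('C', 'R', 'S');

WITH Ends AS (   -- each undirected edge adds 1 to the degree of both of its nodes
    SELECT GraphName, NodeA AS NodeName FROM #IsoEdge
    UNION ALL
    SELECT GraphName, NodeB FROM #IsoEdge
),
DegreeByNode AS (
    SELECT GraphName, NodeName, COUNT(*) AS Degree
    FROM Ends
    GROUP BY GraphName, NodeName
)
SELECT GraphName,
       STRING_AGG(CAST(Degree AS varchar(10)), ',')
           WITHIN GROUP (ORDER BY Degree DESC) AS DegreeSequence
INTO #DegreeSeq
FROM DegreeByNode
GROUP BY GraphName;

SELECT GraphName, DegreeSequence FROM #DegreeSeq ORDER BY GraphName;
```

Graphs A and B both produce 2,2,2,2, as isomorphic graphs must; C produces 3,2,2,1, so it cannot be isomorphic to either.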
Consider the set of nodes in Figure 1-6.
Now, consider that the nodes {Larry, The Who, Rolling Stones},
which form what is known as a subgraph (a set of nodes and edges
that is a part of a graph) of the more complete graph, are isomorphic to
{Fred, The Who, Rolling Stones}. It isn’t a difficult leap to see that, since Fred and Larry are both connected to the same two nodes, an additional edge {Fred, Beatles} might be a possibility for Fred. So, the
company may then wish to suggest “Have you heard of The Beatles (or
do you live under a rock)?” Of course, Fred may not be a fan of John,
Paul, George, and Ringo; but the goal of many graphs may be to look for
common traits and then suggest ways to make them more common.
Take this further, and you may see the same pattern occurring in completely different subgraphs, which may indicate that it is repeatable.
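The Fred and Larry reasoning can be expressed as a query. This is a minimal sketch, assuming hypothetical node tables dbo.Fan and dbo.Artist, a hypothetical edge table dbo.FanOf, and edges matching the example (Larry is a fan of all three bands, Fred of The Who and Rolling Stones). It collects the bands liked by anyone who shares at least one band with Fred and then removes the bands Fred already likes.

```sql
DROP TABLE IF EXISTS dbo.FanOf;
DROP TABLE IF EXISTS dbo.Fan;
DROP TABLE IF EXISTS dbo.Artist;
DROP TABLE IF EXISTS #Suggestion;

CREATE TABLE dbo.Fan (FanName nvarchar(50) NOT NULL UNIQUE) AS NODE;
CREATE TABLE dbo.Artist (ArtistName nvarchar(50) NOT NULL UNIQUE) AS NODE;
CREATE TABLE dbo.FanOf AS EDGE;

INSERT INTO dbo.Fan (FanName) VALUES (N'Larry'), (N'Fred');
INSERT INTO dbo.Artist (ArtistName)
VALUES (N'The Who'), (N'Rolling Stones'), (N'Beatles');

-- Larry is a fan of all three bands; Fred of The Who and Rolling Stones.
INSERT INTO dbo.FanOf ($from_id, $to_id)
SELECT f.$node_id, a.$node_id
FROM dbo.Fan AS f
CROSS JOIN dbo.Artist AS a
WHERE f.FanName = N'Larry'
   OR (f.FanName = N'Fred' AND a.ArtistName IN (N'The Who', N'Rolling Stones'));

-- Bands liked by anyone who shares at least one band with Fred...
SELECT a2.ArtistName
INTO #Suggestion
FROM dbo.Fan AS me, dbo.FanOf AS fo1, dbo.Artist AS sharedBand,
     dbo.FanOf AS fo2, dbo.Fan AS other,
     dbo.FanOf AS fo3, dbo.Artist AS a2
WHERE MATCH(me-(fo1)->sharedBand<-(fo2)-other AND other-(fo3)->a2)
  AND me.FanName = N'Fred'
  AND other.FanName <> N'Fred'
-- ...minus the bands Fred already likes.
EXCEPT
SELECT a.ArtistName
FROM dbo.Fan AS f, dbo.FanOf AS fo, dbo.Artist AS a
WHERE MATCH(f-(fo)->a)
  AND f.FanName = N'Fred';

SELECT ArtistName AS Suggestion FROM #Suggestion;
```

With this data the only suggestion is the Beatles. EXCEPT also removes duplicates, which matters because Larry is reached once through each band he shares with Fred.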
An important concept in subgraphs is a walk in a graph. This refers
to how you can traverse from node to node. For example, in the first
graph in Figure 1-6, starting at Larry you can find the walk Larry -> The Who -> Rolling Stones -> Beatles -> Larry. You can also find many
other walks from any node to any node in this sample graph, since
every node in the graph is connected to every other node. Another term
for a walk is a path, which may be a bit more common since the
operator you will use in SQL Server 2019 and later to find a walk is
named SHORTEST_PATH.
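Walks can also be enumerated with a recursive common table expression. This sketch assumes a hypothetical temporary edge table, #Edge, loaded with the first graph from Figure 1-6 (each of the four nodes connected to every other, stored in both directions). To keep the recursion finite, it only extends a walk to nodes it has not visited yet, so the walk that returns to Larry is not listed.

```sql
DROP TABLE IF EXISTS #Edge, #Walk;
CREATE TABLE #Edge (FromNode varchar(20) NOT NULL, ToNode varchar(20) NOT NULL);

-- Store each undirected edge in both directions so a walk can use it either way.
WITH Nodes AS (
    SELECT v.NodeName
    FROM (VALUES ('Larry'), ('The Who'), ('Rolling Stones'), ('Beatles')) AS v (NodeName)
)
INSERT INTO #Edge (FromNode, ToNode)
SELECT a.NodeName, b.NodeName
FROM Nodes AS a
CROSS JOIN Nodes AS b
WHERE a.NodeName <> b.NodeName;

-- Follow edges out of Larry one step at a time, never revisiting a node.
WITH Walk AS (
    SELECT CAST('Larry' AS varchar(20))   AS LastNode,
           CAST('Larry' AS varchar(400))  AS WalkText,
           CAST('|Larry|' AS varchar(400)) AS Visited,
           0 AS Steps
    UNION ALL
    SELECT e.ToNode,
           CAST(w.WalkText + ' -> ' + e.ToNode AS varchar(400)),
           CAST(w.Visited + e.ToNode + '|' AS varchar(400)),
           w.Steps + 1
    FROM Walk AS w
    JOIN #Edge AS e ON e.FromNode = w.LastNode
    WHERE CHARINDEX('|' + e.ToNode + '|', w.Visited) = 0
)
SELECT WalkText, Steps
INTO #Walk
FROM Walk;

SELECT WalkText, Steps FROM #Walk ORDER BY Steps, WalkText;
```

The output includes Larry -> The Who -> Rolling Stones -> Beatles among the six three-step walks, since every ordering of the other three nodes is possible in a graph where everything is connected.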
You will often use this concept of a walk to determine how close one node is to another. For example, in the second variant of the graph, the distance from Fred to The Who and Rolling Stones is 1 and to the Beatles is 2. If you are a user of LinkedIn, you have seen this concept when it shows whether other people are first-level, second-level, or more distant connections. A second-level connection means you are connected through one intermediate node.
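Distances like these are what SHORTEST_PATH computes. The following is a minimal sketch for SQL Server 2019 or later; dbo.GraphNode, dbo.ConnectedTo, and the data are hypothetical, and the edges are an assumed reading of the second variant of Figure 1-6 (the first graph plus Fred connected to The Who and Rolling Stones). Because SQL Server edges are directed, each undirected connection is stored in both directions.

```sql
DROP TABLE IF EXISTS dbo.ConnectedTo;
DROP TABLE IF EXISTS dbo.GraphNode;
DROP TABLE IF EXISTS #Distance;

CREATE TABLE dbo.GraphNode (NodeName nvarchar(50) NOT NULL UNIQUE) AS NODE;
CREATE TABLE dbo.ConnectedTo AS EDGE;

INSERT INTO dbo.GraphNode (NodeName)
VALUES (N'Larry'), (N'Fred'), (N'The Who'), (N'Rolling Stones'), (N'Beatles');

-- Everyone except Fred is connected to everyone else; Fred knows two bands.
INSERT INTO dbo.ConnectedTo ($from_id, $to_id)
SELECT a.$node_id, b.$node_id
FROM dbo.GraphNode AS a
JOIN dbo.GraphNode AS b ON a.NodeName <> b.NodeName
WHERE (a.NodeName <> N'Fred' AND b.NodeName <> N'Fred')
   OR (a.NodeName = N'Fred' AND b.NodeName IN (N'The Who', N'Rolling Stones'))
   OR (b.NodeName = N'Fred' AND a.NodeName IN (N'The Who', N'Rolling Stones'));

-- One shortest path from Fred to each reachable node; PathTaken omits the start node.
SELECT q.ReachedNode, q.Distance, q.PathTaken
INTO #Distance
FROM (
    SELECT LAST_VALUE(n2.NodeName) WITHIN GROUP (GRAPH PATH) AS ReachedNode,
           COUNT(n2.NodeName) WITHIN GROUP (GRAPH PATH) AS Distance,
           STRING_AGG(n2.NodeName, N' -> ') WITHIN GROUP (GRAPH PATH) AS PathTaken
    FROM dbo.GraphNode AS n1,
         dbo.ConnectedTo FOR PATH AS c,
         dbo.GraphNode FOR PATH AS n2
    WHERE MATCH(SHORTEST_PATH(n1(-(c)->n2)+))
      AND n1.NodeName = N'Fred'
) AS q
WHERE q.ReachedNode <> N'Fred';

SELECT ReachedNode, Distance, PathTaken FROM #Distance ORDER BY Distance, ReachedNode;
```

The Who and Rolling Stones come back at distance 1, and the Beatles (and Larry) at distance 2. Two routes to the Beatles tie here, so PathTaken may show either one.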
A concept that is interesting, if likely not particularly necessary in programming typical graph structures, is an Euler (pronounced “Oiler”) walk. An Euler walk touches every edge in the graph exactly once; when it also starts and ends at the same node (so the starting node is touched twice), it is called an Euler circuit. For example, consider the graphs in Figure 1-7.
Figure 1-7 Graph diagrams to demonstrate walks
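Figure 1-7 is not reproduced here, but the classic test for an Euler walk is easy to express in T-SQL. In a connected, undirected graph, an Euler walk that starts and ends at the same node exists when every node has an even number of edges, and an Euler walk between two different nodes exists when exactly two nodes have an odd number of edges. This sketch counts odd-degree nodes for a hypothetical square with one diagonal, stored in a hypothetical #UEdge table; it assumes the graph is connected and does not check that.

```sql
DROP TABLE IF EXISTS #UEdge, #NodeDegree;
CREATE TABLE #UEdge (NodeA varchar(10) NOT NULL, NodeB varchar(10) NOT NULL);

-- Hypothetical data: a square N1-N2-N3-N4-N1 plus the diagonal N1-N3.
INSERT INTO #UEdge (NodeA, NodeB)
VALUES ('N1', 'N2'), ('N2', 'N3'), ('N3', 'N4'), ('N4', 'N1'), ('N1', 'N3');

WITH Ends AS (  -- each undirected edge adds 1 to the degree of both of its nodes
    SELECT NodeA AS NodeName FROM #UEdge
    UNION ALL
    SELECT NodeB FROM #UEdge
)
SELECT NodeName, COUNT(*) AS Degree
INTO #NodeDegree
FROM Ends
GROUP BY NodeName;

SELECT COUNT(*) AS OddDegreeNodes,
       CASE COUNT(*)
            WHEN 0 THEN 'Euler walk exists that starts and ends at the same node'
            WHEN 2 THEN 'Euler walk exists, starting at one odd-degree node and ending at the other'
            ELSE 'No Euler walk exists'
       END AS Verdict
FROM #NodeDegree
WHERE Degree % 2 = 1;
```

Here N1 and N3 each touch three edges, so an Euler walk must run between them (for example N1->N2->N3->N4->N1->N3). Removing the diagonal would make every degree even.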
Realistically, the limitation will come down to the thing that all modeling decisions come down to: semantics. What does the relationship mean, and how will it be used to mean one or more relationships? For example, you might logically have only one edge between two persons indicating that one is a biological parent of the other; on the other hand, multiple edges between a person and a movie could make sense: one for actor, one for producer, one for director, and so on.
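One way to allow several meaningful edges between the same two nodes is a single edge table with an attribute describing the role. This is a minimal sketch for SQL Server 2019 or later (the edge constraint needs 2019); dbo.FilmPerson, dbo.Movie, dbo.WorkedOn, and the rows are hypothetical. Separate edge tables per role would be an equally reasonable design; which fits better depends on the semantics you need.

```sql
DROP TABLE IF EXISTS dbo.WorkedOn;
DROP TABLE IF EXISTS dbo.Movie;
DROP TABLE IF EXISTS dbo.FilmPerson;

CREATE TABLE dbo.FilmPerson (PersonName nvarchar(50) NOT NULL UNIQUE) AS NODE;
CREATE TABLE dbo.Movie (Title nvarchar(100) NOT NULL UNIQUE) AS NODE;

-- One edge table; RoleName states what each individual edge means.
CREATE TABLE dbo.WorkedOn
(
    RoleName varchar(20) NOT NULL
        CONSTRAINT CHK_WorkedOn_RoleName CHECK (RoleName IN ('Actor', 'Producer', 'Director')),
    CONSTRAINT EC_WorkedOn CONNECTION (FilmPerson TO Movie)  -- edge constraint, SQL Server 2019+
) AS EDGE;

-- Hypothetical rows: the same person both acted in and directed the same movie.
INSERT INTO dbo.FilmPerson (PersonName) VALUES (N'Pat Example');
INSERT INTO dbo.Movie (Title) VALUES (N'Example Movie');

INSERT INTO dbo.WorkedOn ($from_id, $to_id, RoleName)
SELECT p.$node_id, m.$node_id, r.RoleName
FROM dbo.FilmPerson AS p
CROSS JOIN dbo.Movie AS m
CROSS JOIN (VALUES ('Actor'), ('Director')) AS r (RoleName)
WHERE p.PersonName = N'Pat Example'
  AND m.Title = N'Example Movie';

-- Two edges connect the same pair of nodes, each with its own meaning.
SELECT p.PersonName, w.RoleName, m.Title
FROM dbo.FilmPerson AS p, dbo.WorkedOn AS w, dbo.Movie AS m
WHERE MATCH(p-(w)->m);
```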
One last bit of graph theory I want to cover is that of a connected graph. In a connected graph, there is a walk from every node in the graph to every other node. A disconnected graph is in two or more pieces. In Figure 1-8, we have the simplest connected graph with more than one node in 1-8A and the simplest disconnected graph in 1-8B. The graph in 1-8A is said to be in one piece, and the graph in 1-8B is in two pieces.
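Counting pieces can be done by finding, for every node, every node it can reach, and then labeling each node with the smallest node name it reaches; nodes with the same label are in the same piece. This is a sketch with hypothetical tables #GNode and #GEdge (a node table is needed so that isolated nodes still count as pieces). Tracking the visited nodes of each walk keeps the recursion finite but grows quickly, so treat it as an illustration for small graphs.

```sql
DROP TABLE IF EXISTS #GNode, #GEdge, #Adj, #Piece;
CREATE TABLE #GNode (NodeName varchar(10) NOT NULL PRIMARY KEY);
CREATE TABLE #GEdge (NodeA varchar(10) NOT NULL, NodeB varchar(10) NOT NULL);

-- Hypothetical data: N1-N2-N3 form one piece, N4-N5 another.
INSERT INTO #GNode (NodeName) VALUES ('N1'), ('N2'), ('N3'), ('N4'), ('N5');
INSERT INTO #GEdge (NodeA, NodeB) VALUES ('N1', 'N2'), ('N2', 'N3'), ('N4', 'N5');

-- Undirected: every edge can be followed in both directions.
SELECT NodeA AS FromNode, NodeB AS ToNode INTO #Adj FROM #GEdge
UNION ALL
SELECT NodeB, NodeA FROM #GEdge;

WITH Reach AS (
    SELECT NodeName AS StartNode, NodeName AS ReachedNode,
           CAST('|' + NodeName + '|' AS varchar(400)) AS Visited
    FROM #GNode
    UNION ALL
    SELECT r.StartNode, a.ToNode,
           CAST(r.Visited + a.ToNode + '|' AS varchar(400))
    FROM Reach AS r
    JOIN #Adj AS a ON a.FromNode = r.ReachedNode
    WHERE CHARINDEX('|' + a.ToNode + '|', r.Visited) = 0
)
SELECT StartNode, MIN(ReachedNode) AS PieceId   -- same PieceId means same piece
INTO #Piece
FROM Reach
GROUP BY StartNode;

SELECT COUNT(DISTINCT PieceId) AS Pieces FROM #Piece;
```

With this data the result is 2; a connected graph would return 1.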
When you have a connected graph and removing an edge would break it into more pieces, that edge is referred to as a bridge edge. Consider the graph in Figure 1-10.
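A direct, if brute-force, way to find bridge edges: for each edge, walk the graph from one of its ends without using that edge; if the other end cannot be reached, the edge is a bridge. Figure 1-10 is not reproduced here, so the data is a hypothetical triangle N1-N2-N3 with N4 hanging off N3, and the table names #BEdge, #BAdj, and #Bridge are also hypothetical.

```sql
DROP TABLE IF EXISTS #BEdge, #BAdj, #Bridge;
CREATE TABLE #BEdge
(
    EdgeId int NOT NULL PRIMARY KEY,
    NodeA  varchar(10) NOT NULL,
    NodeB  varchar(10) NOT NULL
);
INSERT INTO #BEdge (EdgeId, NodeA, NodeB)
VALUES (1, 'N1', 'N2'), (2, 'N2', 'N3'), (3, 'N3', 'N1'), (4, 'N3', 'N4');

-- Each undirected edge usable in either direction, remembering which edge it came from.
SELECT EdgeId, NodeA AS FromNode, NodeB AS ToNode INTO #BAdj FROM #BEdge
UNION ALL
SELECT EdgeId, NodeB, NodeA FROM #BEdge;

WITH ReachWithout AS (
    -- For each edge, start at one of its ends...
    SELECT e.EdgeId AS RemovedEdge, e.NodeA AS ReachedNode,
           CAST('|' + e.NodeA + '|' AS varchar(400)) AS Visited
    FROM #BEdge AS e
    UNION ALL
    -- ...and walk using every edge except the removed one.
    SELECT r.RemovedEdge, a.ToNode,
           CAST(r.Visited + a.ToNode + '|' AS varchar(400))
    FROM ReachWithout AS r
    JOIN #BAdj AS a ON a.FromNode = r.ReachedNode
    WHERE a.EdgeId <> r.RemovedEdge
      AND CHARINDEX('|' + a.ToNode + '|', r.Visited) = 0
)
SELECT e.EdgeId, e.NodeA, e.NodeB
INTO #Bridge
FROM #BEdge AS e
WHERE NOT EXISTS (SELECT 1
                  FROM ReachWithout AS r
                  WHERE r.RemovedEdge = e.EdgeId
                    AND r.ReachedNode = e.NodeB);

SELECT EdgeId, NodeA, NodeB FROM #Bridge;
```

Only edge 4 (N3-N4) is reported: each triangle edge can be bypassed through the other two, but nothing else reaches N4.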
Figure 1-14 Graph examples to demonstrate processing acyclic and cyclic directed graphs
In Figure 1-14A, an acyclic graph, you can walk the graph N1->N2->N3->N4 directly using a simple algorithm of going node to node, touching all the nodes with no problem. But in Figure 1-14B, the algorithm of going from node to node is complicated by N1->N2->N3->N1, because what do you do next? Cycle through the nodes again? Or stop processing? All the results from starting at N1 would be repeated, but what does this mean for the graph that you are modeling? You can keep track of this mentally when manually tracing through, but the programming gets more complex when you look at millions or billions of nodes (the programming is not impossible, just more complicated).
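One common way to cope with cycles in a recursive query is to carry the list of visited nodes along with each walk and stop when the next node is already in that list. This sketch uses a hypothetical #DEdge table shaped like Figure 1-14B plus an extra N3->N4 edge, and it reports every walk that returns to its own starting node.

```sql
DROP TABLE IF EXISTS #DEdge, #Cycle;
CREATE TABLE #DEdge (FromNode varchar(10) NOT NULL, ToNode varchar(10) NOT NULL);

-- Hypothetical data shaped like Figure 1-14B, plus an exit edge N3->N4.
INSERT INTO #DEdge (FromNode, ToNode)
VALUES ('N1', 'N2'), ('N2', 'N3'), ('N3', 'N1'), ('N3', 'N4');

WITH Walk AS (
    -- Every edge starts a walk.
    SELECT FromNode AS StartNode,
           ToNode   AS CurrentNode,
           CAST(FromNode + '->' + ToNode AS varchar(400)) AS WalkText,
           CAST('|' + FromNode + '|' + ToNode + '|' AS varchar(400)) AS Visited,
           CASE WHEN FromNode = ToNode THEN 1 ELSE 0 END AS ClosedCycle
    FROM #DEdge
    UNION ALL
    -- Extend open walks to unvisited nodes, or back to the start (which closes a cycle).
    SELECT w.StartNode,
           e.ToNode,
           CAST(w.WalkText + '->' + e.ToNode AS varchar(400)),
           CAST(w.Visited + e.ToNode + '|' AS varchar(400)),
           CASE WHEN e.ToNode = w.StartNode THEN 1 ELSE 0 END
    FROM Walk AS w
    JOIN #DEdge AS e ON e.FromNode = w.CurrentNode
    WHERE w.ClosedCycle = 0
      AND (e.ToNode = w.StartNode OR CHARINDEX('|' + e.ToNode + '|', w.Visited) = 0)
)
SELECT StartNode, WalkText
INTO #Cycle
FROM Walk
WHERE ClosedCycle = 1;

SELECT StartNode, WalkText FROM #Cycle ORDER BY StartNode;
```

The single cycle is reported once per node on it (N1->N2->N3->N1, N2->N3->N1->N2, and N3->N1->N2->N3), which is exactly the kind of repeated result the paragraph above warns about.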
Determining, for every pair of nodes in a graph, whether some walk connects them is referred to as computing the transitive closure of the graph. Once you have hold of this power, you can do all forms of interesting things with graphs, such as determining whether you are connected to Kevin Bacon and what the shortest path is through the people you are connected to that you will need to bug to get tickets to see The Bacon Brothers in concert when they come to your town.
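A transitive closure can be computed with a recursive CTE that keeps extending reachable pairs until no new ones appear, then keeps the fewest hops for each pair. The people in the hypothetical #Knows table are made up for illustration (apart from the Kevin Bacon reference), and the visited-node list again guards against cycles.

```sql
DROP TABLE IF EXISTS #Knows, #Closure;
CREATE TABLE #Knows (FromNode varchar(20) NOT NULL, ToNode varchar(20) NOT NULL);

-- Hypothetical directed "knows" edges.
INSERT INTO #Knows (FromNode, ToNode)
VALUES ('You', 'Ann'), ('Ann', 'Bob'), ('Bob', 'Kevin Bacon'),
       ('You', 'Cal'), ('Cal', 'Kevin Bacon');

WITH Reach AS (
    SELECT FromNode, ToNode, 1 AS Hops,
           CAST('|' + FromNode + '|' + ToNode + '|' AS varchar(400)) AS Visited
    FROM #Knows
    UNION ALL
    SELECT r.FromNode, k.ToNode, r.Hops + 1,
           CAST(r.Visited + k.ToNode + '|' AS varchar(400))
    FROM Reach AS r
    JOIN #Knows AS k ON k.FromNode = r.ToNode
    WHERE CHARINDEX('|' + k.ToNode + '|', r.Visited) = 0
)
SELECT FromNode, ToNode, MIN(Hops) AS FewestHops   -- one row per reachable pair
INTO #Closure
FROM Reach
GROUP BY FromNode, ToNode;

SELECT FromNode, ToNode, FewestHops FROM #Closure ORDER BY FromNode, FewestHops, ToNode;
```

The closure has eight pairs; You reaches Kevin Bacon in two hops (through Cal), even though a three-hop route through Ann and Bob also exists.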
From a more database design-oriented perspective, consider what can be modeled with an acyclic graph versus a cyclic one. More will be covered
in Chapter 2, but if you are modeling containership, like a bill of
materials (a data structure used in packaging/assembling items, where
products that are a part of other products are modeled using a graph), a
cycle in the graph could give you odd results. Consider the graph in
Figure 1-15.
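A common defensive step for containership data is to check a new edge before inserting it: adding Parent -> Child closes a cycle exactly when Parent is already contained, at any depth, in Child (or is Child itself). This sketch assumes a hypothetical #PartOf table whose existing rows are already acyclic; otherwise the recursion would run until it hits the default limit of 100 levels and raises an error.

```sql
DROP TABLE IF EXISTS #PartOf;
CREATE TABLE #PartOf (ParentPart varchar(20) NOT NULL, ChildPart varchar(20) NOT NULL);

-- Hypothetical bill of materials: a Bike contains a Wheel, which contains a Spoke.
INSERT INTO #PartOf (ParentPart, ChildPart) VALUES ('Bike', 'Wheel'), ('Wheel', 'Spoke');

-- Proposed new edge: Spoke contains Bike, which would close a cycle.
DECLARE @NewParent  varchar(20) = 'Spoke',
        @NewChild   varchar(20) = 'Bike',
        @WouldCycle bit;

-- Everything already contained in @NewChild, at any depth, including itself.
WITH Contained AS (
    SELECT @NewChild AS Part
    UNION ALL
    SELECT p.ChildPart
    FROM Contained AS c
    JOIN #PartOf AS p ON p.ParentPart = c.Part
)
SELECT @WouldCycle = CASE WHEN EXISTS (SELECT 1 FROM Contained WHERE Part = @NewParent)
                          THEN 1 ELSE 0 END;

IF @WouldCycle = 1
    PRINT 'Rejected: ' + @NewParent + ' -> ' + @NewChild + ' would create a cycle.';
ELSE
    INSERT INTO #PartOf (ParentPart, ChildPart) VALUES (@NewParent, @NewChild);
```

Changing @NewParent to a part that is not inside the Bike, for example a hypothetical 'Garage', would let the insert through.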
Summary
In this chapter, the goal was to introduce some of the core concepts that
are used when discussing graph topics. Many of these concepts will show up throughout the book, though some may not. The goal of this chapter
was to briefly introduce concepts that will help you to envision what a
graph is and how a graph might be used as you start to solve complex
problems with graphs.
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
L. Davidson, Practical Graph Structures in SQL Server and Azure SQL
https://doi.org/10.1007/978-1-4842-9459-8_2
Basic Implementation
As discussed in the previous chapter, the basic building blocks of a
graph are nodes and edges. A node is basically the same thing as any
relational database table representing some specific concept. An edge
represents a link between two rows in these tables, much like a typical
many-to-many resolution table does in a relational database.
So, say you have the graph shown in Figure 2-1.
Figure 2-1 Simple graph
You need a table to hold the N1 and N2 nodes. These nodes could be
of the same type or different types. For example, they could be one
person being the friend or family member of the other, or a person
being a fan of a certain football team. Let’s assume for this example that
they are the same type (and in the future, if I want them to be two
different types of objects, I will label the nodes as such).
For most examples, I will use data that would be at home in many relational tables, simply because that is how you will typically think of the data. In the following two chapters, I will establish how SQL Server implements graph structures, but for now, just take the data structures to be basically like the SQL Server tables you have built before. To implement the structure in Figure 2-1, say you have the following rows in a table to represent the two nodes:
Node
------
N1
N2
Now you need a data structure that represents the edge, like this:
FromNode ToNode
-------- -------
N1 N2
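In ordinary SQL Server tables, the structure for Figure 2-1 might look like the following sketch; dbo.SimpleNode, dbo.SimpleEdge, and the constraint names are hypothetical. The edge table is a classic many-to-many resolution table: two foreign keys back to the node table and a composite primary key so the same edge cannot be stored twice.

```sql
DROP TABLE IF EXISTS dbo.SimpleEdge;
DROP TABLE IF EXISTS dbo.SimpleNode;

CREATE TABLE dbo.SimpleNode
(
    NodeName varchar(10) NOT NULL CONSTRAINT PK_SimpleNode PRIMARY KEY
);

-- Both columns reference the same node table, like a many-to-many resolution table.
CREATE TABLE dbo.SimpleEdge
(
    FromNode varchar(10) NOT NULL
        CONSTRAINT FK_SimpleEdge_FromNode REFERENCES dbo.SimpleNode (NodeName),
    ToNode   varchar(10) NOT NULL
        CONSTRAINT FK_SimpleEdge_ToNode REFERENCES dbo.SimpleNode (NodeName),
    CONSTRAINT PK_SimpleEdge PRIMARY KEY (FromNode, ToNode)
);

INSERT INTO dbo.SimpleNode (NodeName) VALUES ('N1'), ('N2');
INSERT INTO dbo.SimpleEdge (FromNode, ToNode) VALUES ('N1', 'N2');
```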
Acyclic Graphs
The easiest graphs to work with in a relational database are acyclic. The reason comes down to the method used to process graphs in a relational setting: the breadth-first algorithm (or relational recursion). This approach is used for relational processing because the typical recursion used to process these data structures does not fit well with the set-based nature of a relational database.
For example, consider the graph in Figure 2-2.
Figure 2-2 Sample graph structure
Processing this in a typical recursive manner, you choose your starting point as N1 and then see if this node has an unprocessed child. It does, so fetch N11. See if N11 has unprocessed children. Yes,
N111. N111 has no unprocessed child nodes, so whatever you are doing
with the data of the node, you add that to an output data structure.
Then step back up to N11 and get the next child node. And keep going.
This is referred to as a depth-first algorithm.
Common operations are counting nodes, summing sales from that
node, and so on. For your example, let’s just say you are counting child
nodes. So, you recurse back to N11 and add 1 to the child count. Then
you check for more child nodes, and you have more. Over and over. You
stop when every node has been processed in the subgraph that started
with your starting point.
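To see why depth-first processing fits poorly, here is a row-at-a-time sketch in T-SQL that keeps an explicit stack instead of recursing. It uses a hypothetical #DfsEdge table holding only the rows of this tree that appear in the adjacency list later in this chapter (N1 -> N11, N12; N11 -> N111, N112, N113), and it processes exactly one node per loop iteration.

```sql
DROP TABLE IF EXISTS #DfsEdge, #VisitOrder;
CREATE TABLE #DfsEdge (FromNode varchar(10) NOT NULL, ToNode varchar(10) NOT NULL);
INSERT INTO #DfsEdge (FromNode, ToNode)
VALUES ('N1', 'N11'), ('N11', 'N111'), ('N11', 'N112'), ('N11', 'N113'), ('N1', 'N12');

CREATE TABLE #VisitOrder
(
    VisitNumber int IDENTITY(1, 1) PRIMARY KEY,
    NodeName    varchar(10) NOT NULL
);

DECLARE @Stack TABLE
(
    StackPosition int IDENTITY(1, 1) PRIMARY KEY,
    NodeName      varchar(10) NOT NULL
);
DECLARE @Current varchar(10), @Position int;

INSERT INTO @Stack (NodeName) VALUES ('N1');  -- the starting point

WHILE EXISTS (SELECT 1 FROM @Stack)
BEGIN
    -- Pop the most recently pushed node (last in, first out).
    SELECT TOP (1) @Position = StackPosition, @Current = NodeName
    FROM @Stack
    ORDER BY StackPosition DESC;

    DELETE FROM @Stack WHERE StackPosition = @Position;

    -- "Process" the node: here, just record the order in which it was reached.
    INSERT INTO #VisitOrder (NodeName) VALUES (@Current);

    -- Push the children in descending order, so the lowest-named child is popped next.
    INSERT INTO @Stack (NodeName)
    SELECT ToNode
    FROM #DfsEdge
    WHERE FromNode = @Current
    ORDER BY ToNode DESC;
END;

SELECT VisitNumber, NodeName FROM #VisitOrder ORDER BY VisitNumber;
SELECT COUNT(*) - 1 AS ChildNodesBelowN1 FROM #VisitOrder;
```

The visit order is N1, N11, N111, N112, N113, N12, and the child count is 5. Every node costs its own trip through the loop, which is exactly the row-by-row pattern a set-based engine handles poorly.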
This works great for certain kinds of programming languages but terribly for relational ones, where set-based processing is clearly the desired method of programming. A relational engine works with sets of data, so a set-based kind of processing has been devised for working with graphs in the manner that relational engines work.
For a breadth-first algorithm, instead of digging down in the
structure, you take a starting point and then get all of the children of
that node. Then the children of those nodes, all at once. Taking that
same diagram, let’s break this down into a series of three queries on the
data, as seen in Figure 2-3.
Figure 2-3 Sample graph indicating the different levels that will be fetched in a
breadth-first query
You query for the starting point. In this case, it’s one node, but it
could be any number of nodes (in fact, that is the basis of some of the
code in this book to do things like starting at every node
simultaneously!). Your breadth-first algorithm is to query:
SELECT GraphId
FROM GraphObject
WHERE GraphId = @startingPoint --starting point = N1
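Continuing the breadth-first idea, each later query fetches all the children of the previous level in one set-based statement. This sketch runs those queries in a loop; #GraphEdge, #Level, and the loaded rows (the same partial tree as the adjacency list later in this chapter) are hypothetical, and @startingPoint plays the same role as in the query above. It assumes an acyclic graph, because a cycle would keep producing new rows forever.

```sql
DROP TABLE IF EXISTS #GraphEdge, #Level;
CREATE TABLE #GraphEdge (FromGraphId varchar(10) NOT NULL, ToGraphId varchar(10) NOT NULL);
INSERT INTO #GraphEdge (FromGraphId, ToGraphId)
VALUES ('N1', 'N11'), ('N11', 'N111'), ('N11', 'N112'), ('N11', 'N113'), ('N1', 'N12');

CREATE TABLE #Level (GraphId varchar(10) NOT NULL, LevelNumber int NOT NULL);

DECLARE @startingPoint varchar(10) = 'N1',
        @LevelNumber   int = 1,
        @RowsAdded     int;

-- Query 1: the starting point (it could just as well be a set of nodes).
INSERT INTO #Level (GraphId, LevelNumber) VALUES (@startingPoint, 1);
SET @RowsAdded = @@ROWCOUNT;

-- Queries 2, 3, ...: all children of the previous level, fetched as one set.
WHILE @RowsAdded > 0
BEGIN
    INSERT INTO #Level (GraphId, LevelNumber)
    SELECT e.ToGraphId, @LevelNumber + 1
    FROM #Level AS l
    JOIN #GraphEdge AS e ON e.FromGraphId = l.GraphId
    WHERE l.LevelNumber = @LevelNumber;

    SET @RowsAdded = @@ROWCOUNT;  -- capture before another statement resets it
    SET @LevelNumber += 1;
END;

SELECT LevelNumber, GraphId FROM #Level ORDER BY LevelNumber, GraphId;
```

The loop produces the three levels from Figure 2-3 ({N1}, {N11, N12}, {N111, N112, N113}) and stops when a pass adds no rows. A recursive CTE (WITH ... UNION ALL) expresses the same level-by-level expansion in a single statement.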
Trees
By far the most common graph that has been implemented in relational
databases for many years is a tree. A tree is a structure that requires
that every node have either zero or one parent, and no more. Consider a
real tree (or in the case of Figure 2-5, a glorious reproduction of a tree
from Disney’s Animal Kingdom theme park).
It has one trunk that goes into the ground. Either direction out
(down to the roots or up to the branches), you can see the analogy. Any
branch can be from the trunk or another branch, but it can only be one
of these. Branches don’t grow together and reform as one. (At least not
typically, and this isn’t botany class!) Nodes that do not have any child
nodes are referred to as leaf nodes, much like the leaves on the
branches stand alone.
My breadth-first example structure was a tree, repeated here as Figure 2-6.
Figure 2-6 Example tree repeated
To represent this in an adjacency list structure, you have rows like
From       To
---------- ----------
N1         N11
N11        N111
N11        N112
N11        N113
N1         N12
And so on. In order to make sure that the tree is always a tree, you need to protect one main condition: unique To values. Since a child node can only have one parent in a tree, a uniqueness constraint on that column of the adjacency list ensures that each node has at most one parent.
The other thing you typically need to do is include a constraint of some sort to make sure that the From value does not equal the To value. This is the simplest cycle that the uniqueness constraint will not stop (though a row with the same From and To value basically makes the row a root and a leaf and would likely be discovered quickly…but one of my mottos is that bad data doesn’t happen if you don’t let it occur at all).
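Both protections can be declared directly on the adjacency list. This is a minimal sketch with a hypothetical dbo.TreeEdge table and hypothetical constraint names, loaded with the rows shown above; the two TRY blocks show each constraint rejecting a bad row.

```sql
DROP TABLE IF EXISTS dbo.TreeEdge;
CREATE TABLE dbo.TreeEdge
(
    FromNode varchar(10) NOT NULL,
    ToNode   varchar(10) NOT NULL,
    -- Each node can be a child only once, so it has at most one parent.
    CONSTRAINT AK_TreeEdge_ToNode UNIQUE (ToNode),
    -- Stops the one-row cycle: a node as its own parent.
    CONSTRAINT CHK_TreeEdge_FromNotTo CHECK (FromNode <> ToNode)
);

INSERT INTO dbo.TreeEdge (FromNode, ToNode)
VALUES ('N1', 'N11'), ('N11', 'N111'), ('N11', 'N112'), ('N11', 'N113'), ('N1', 'N12');

BEGIN TRY  -- a second parent for N111 violates the UNIQUE constraint
    INSERT INTO dbo.TreeEdge (FromNode, ToNode) VALUES ('N12', 'N111');
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;

BEGIN TRY  -- a node as its own parent violates the CHECK constraint
    INSERT INTO dbo.TreeEdge (FromNode, ToNode) VALUES ('N2', 'N2');
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```

Note that a detached loop such as N7 -> N8 plus N8 -> N7 satisfies both constraints, so catching longer cycles still needs a recursive check.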
While every tree structure has a single root, a table structure such as this could contain multiple tree structures. In some cases, a row such as (NULL, N1) could be included in the structure as the starting point when you create the tree. You can make sure there is only one such row using a unique index, which allows only one NULL value. (Unlike in comparisons, SQL Server treats NULL values as equal to one another in a unique index, so that would make sure you had only one root node.) You could allow more than one root node by using a filtered UNIQUE index that ignores NULL values. If you want to make sure the NULL row is never deleted, a trigger object can be used, which is especially useful if you have users who can delete rows in an ad hoc manner.
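One possible way to implement the single-root rules, sketched under assumptions: dbo.RootedTreeEdge and the object names are hypothetical, a filtered unique index over the root rows allows only one row with a NULL FromNode, and an AFTER DELETE trigger refuses to delete it. The GO separators are needed because CREATE TRIGGER must be the first statement in its batch.

```sql
DROP TABLE IF EXISTS dbo.RootedTreeEdge;
CREATE TABLE dbo.RootedTreeEdge
(
    FromNode varchar(10) NULL,   -- NULL marks the root row, such as (NULL, N1)
    ToNode   varchar(10) NOT NULL
        CONSTRAINT AK_RootedTreeEdge_ToNode UNIQUE,
    CONSTRAINT CHK_RootedTreeEdge_FromNotTo CHECK (FromNode <> ToNode)
);

-- Only root rows are in this index, and SQL Server treats NULLs as equal
-- in a unique index, so a second root row is rejected.
CREATE UNIQUE INDEX UX_RootedTreeEdge_OneRoot
    ON dbo.RootedTreeEdge (FromNode)
    WHERE FromNode IS NULL;
GO

-- Refuse any delete that includes the root row.
CREATE TRIGGER dbo.RootedTreeEdge_BlockRootDelete
ON dbo.RootedTreeEdge
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;
    IF EXISTS (SELECT 1 FROM deleted WHERE FromNode IS NULL)
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50000, 'The root row cannot be deleted.', 1;
    END;
END;
GO

INSERT INTO dbo.RootedTreeEdge (FromNode, ToNode)
VALUES (NULL, 'N1'), ('N1', 'N11'), ('N1', 'N12');
```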
If you need to model multiple tree structures, it is possible to just
create multiple edge objects, each with a distinct purpose. For example,
consider a company reporting structure. There are multiple projects
going on where a person is in the project management hierarchy, and
typically there is a hierarchy for dealing with HR type things. So, you
could create
And you could also create the same table again for every project.
Alternatively, you could model this as one structure that allows many trees to coexist in the same structure. All the indexes discussed for the general tree structures would still make sense, but you would include the HierarchyName in them, because the uniqueness of a To value holds only within one hierarchy if an employee can be on multiple projects.
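A sketch of the single-structure alternative, with hypothetical table, constraint, and index names: HierarchyName is part of the uniqueness constraint, so a person can have one parent per hierarchy, and a filtered unique index allows one root row per hierarchy.

```sql
DROP TABLE IF EXISTS dbo.HierarchyEdge;
CREATE TABLE dbo.HierarchyEdge
(
    HierarchyName varchar(30) NOT NULL,   -- e.g., 'HR' or a project name
    FromNode      varchar(10) NULL,       -- NULL marks the root of one hierarchy
    ToNode        varchar(10) NOT NULL,
    -- A person has at most one parent per hierarchy, not one parent overall.
    CONSTRAINT AK_HierarchyEdge_ToNode UNIQUE (HierarchyName, ToNode),
    CONSTRAINT CHK_HierarchyEdge_FromNotTo CHECK (FromNode <> ToNode)
);

-- At most one root row per hierarchy.
CREATE UNIQUE INDEX UX_HierarchyEdge_OneRoot
    ON dbo.HierarchyEdge (HierarchyName)
    WHERE FromNode IS NULL;

-- Hypothetical rows: Ann manages Bob for HR, while Bob leads Ann on ProjectX.
INSERT INTO dbo.HierarchyEdge (HierarchyName, FromNode, ToNode)
VALUES ('HR', NULL, 'Ann'), ('HR', 'Ann', 'Bob'),
       ('ProjectX', NULL, 'Bob'), ('ProjectX', 'Bob', 'Ann');
```

Each row is legal because Ann and Bob each appear only once as a child within any single hierarchy.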
It might just make sense to have the ManagementEdge as
modeled, but then have the project hierarchy be in its own