5 Graph Data Science Basics Everyone Should Know
1. What is a graph?
Before you can understand graph data science, you need to understand graphs. At its most
fundamental, a graph is simply a different way of structuring data. Instead of rows and columns,
like in a traditional, relational database table or dataframe, graphs use nodes and relationships as
their primary structure.
[Figure: a property graph. Relationships (edges/links) connect nodes to each other. Properties describe a node or relationship, for example, name, age, height, and so on. Example node: Car, with properties Brand: Volvo and Model: V70.]
In a graph, nouns – people, places, things, organizations – are nodes. The relationships between
them are verbs: friends, works for, likes, and so on.
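As a rough illustration of the node, relationship, and property structure described above, here is a minimal Python sketch. The class names `Node` and `Relationship`, the person "Ann", the `DRIVES` relationship type, and the `since` property are all hypothetical and chosen only for illustration; the Car node reuses the Brand and Model values from the figure. This is not how any particular graph database stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # the noun: Person, Car, ...
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    rel_type: str                                # the verb: DRIVES, WORKS_FOR, ...
    end: Node
    properties: dict = field(default_factory=dict)


# Hypothetical example data: a person node, a car node, and one relationship.
ann = Node("Person", {"name": "Ann", "age": 34})
car = Node("Car", {"brand": "Volvo", "model": "V70"})
drives = Relationship(ann, "DRIVES", car, {"since": 2019})


def describe(rel: Relationship) -> str:
    """Render a relationship as a readable noun-verb-noun pattern."""
    return f"({rel.start.label})-[{rel.rel_type}]->({rel.end.label})"


print(describe(drives))
```

Both nodes and relationships carry their own properties, which is the main difference from a plain edge list.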
Graph data science brings together graph statistics, analytics, and ML to put data in context and
answer pressing questions.
Graph statistics, queries, and visualization drive exploration and insights. Graph statistics
provide basic measures about a graph, such as the number of nodes and the distribution of
relationships. Graph queries follow relationships to any depth, whether 6 or 600
degrees of separation. Graph visualization lets data experts see their data and
explore patterns that merit further investigation.
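The two ideas above, basic graph statistics and degrees of separation, can be sketched in a few lines of plain Python. The friendship graph, the names in it, and the helper functions `graph_statistics` and `degrees_of_separation` are assumptions made for this example; a graph platform would provide its own query language and procedures for the same measures.

```python
from collections import Counter, deque

# Hypothetical undirected friendship graph stored as an adjacency list.
graph = {
    "Ann": {"Bob", "Cara"},
    "Bob": {"Ann", "Dan"},
    "Cara": {"Ann"},
    "Dan": {"Bob", "Eve"},
    "Eve": {"Dan"},
}


def graph_statistics(adj):
    """Return node count, relationship count, and degree distribution."""
    node_count = len(adj)
    # Each undirected relationship appears in two adjacency sets.
    relationship_count = sum(len(nbrs) for nbrs in adj.values()) // 2
    degree_distribution = dict(Counter(len(nbrs) for nbrs in adj.values()))
    return node_count, relationship_count, degree_distribution


def degrees_of_separation(adj, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adj[node]:
            if neighbor == goal:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


print(graph_statistics(graph))
print(degrees_of_separation(graph, "Cara", "Eve"))
```

Breadth-first search is used here because it finds the shortest path in an unweighted graph; the degree distribution maps each degree to the number of nodes that have it.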
Graph analytics builds on graph statistics by answering specific questions and gaining
insights from connections in existing or historical data. Graph queries and algorithms are
typically applied together in “recipes” during graph analytics, and the results are used
directly for analysis.
Graph-enhanced ML is the application of graph data and analytics results to train ML models
or support probabilistic decisions within an AI system. Graph statistics and analytics are
often used in conjunction to answer certain types of questions about complex systems, and
the subsequent insights are applied to improve ML.
Link prediction fills in the blanks in your data and predicts changes in your graph’s structure. Link prediction is a common machine learning task applied to graphs: training a model to learn where relationships should exist between pairs of nodes in a graph. You can think of link prediction as building a model to pinpoint missing relationships in your dataset or predict relationships that are likely to form in the future. With graph data science, you can train supervised ML models based on the relationships and node properties in your graph to predict the existence – and probability – of relationships.
Node embedding transforms the topology and features of your graph into a low-dimensional vector representation of each node. These vectors, also called embeddings, can be used for exploratory data analysis, similarity measurements, and ML. Node embeddings can aggregate information about a node’s position in the graph, its local neighbors, its centrality and influence, and in some cases, other numeric node properties.
Node classification models predict the class of nodes in your graph. A class can be a binary indicator, like whether a user account is engaged in fraud, or a multivalued indicator, such as which market segment a customer belongs to. You can train node classification models to predict which class new and existing nodes belong to with a broad range of input features, using the network structure of your graph and properties from your source data.
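To make the link prediction idea more concrete, the sketch below scores every pair of nodes that is not yet connected by how much their neighborhoods overlap (the Jaccard coefficient). This is a simple unsupervised heuristic, not the supervised ML model the text describes; in practice such scores might serve as input features for a trained model. The graph data and the function names `jaccard` and `rank_candidate_links` are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected graph stored as an adjacency list.
graph = {
    "Ann": {"Bob", "Cara", "Dan"},
    "Bob": {"Ann", "Cara"},
    "Cara": {"Ann", "Bob", "Dan"},
    "Dan": {"Ann", "Cara", "Eve"},
    "Eve": {"Dan"},
}


def jaccard(adj, a, b):
    """Shared neighbors divided by all neighbors of either node."""
    union = adj[a] | adj[b]
    if not union:
        return 0.0
    return len(adj[a] & adj[b]) / len(union)


def rank_candidate_links(adj):
    """Score each unconnected pair; a higher score suggests a likelier link."""
    candidates = [
        ((a, b), jaccard(adj, a, b))
        for a, b in combinations(sorted(adj), 2)
        if b not in adj[a]  # skip pairs that are already connected
    ]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


for pair, score in rank_candidate_links(graph):
    print(pair, round(score, 3))
```

In this toy graph, Bob and Dan share two of their three combined neighbors, so that pair ranks first as the most plausible missing relationship.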
[Figure: questions graph data science helps answer]
• Which parts of my graph are connected to each other?
• Which nodes are most similar?
• Where will connections form next?
• What’s the label for this node?
Techniques named in the figure:
• Centrality
• Embeddings
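The first question, which parts of a graph are connected to each other, can be illustrated with a connected-components search. This is a deliberately simple stand-in: it only groups nodes that can reach one another, whereas community detection algorithms look for more nuanced groupings. The graph data and the function name `connected_components` are assumptions for this example.

```python
# Hypothetical undirected graph with three separate parts.
graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
    "D": {"E"},
    "E": {"D"},
    "F": set(),
}


def connected_components(adj):
    """Group nodes into sets in which every node can reach every other."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


for component in connected_components(graph):
    print(sorted(component))
```

Each node ends up in exactly one component, so the result partitions the graph into its disconnected parts.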
Data scientists are typically the primary users of graph data science
tools because they bring deep knowledge of algorithms and models.
Organizations of all sizes, across industries and departments, use graph data science
to make recommendations, identify anomalies and find fraudsters, improve customer knowledge,
and optimize supply chains.
Neo4j is the world’s leading graph data platform. We help organizations – including Comcast, ICIJ, NASA, UBS, and
Volvo Cars – capture the rich context of the real world that exists in their data to solve challenges of any size
and scale. Our customers transform their industries by curbing financial fraud and cybercrime, optimizing global
networks, accelerating breakthrough research, and providing better recommendations. Neo4j delivers real-time
transaction processing, advanced AI/ML, intuitive data visualization, and more. Find us at neo4j.com and follow us
at @Neo4j.
Questions about Neo4j? Contact us around the globe:
[email protected]
neo4j.com/contact-us
© 2022 Neo4j, Inc.