Graph Data Science Basics: Everyone Should Know


5 Graph Data Science Basics Everyone Should Know

Why Graph Data Science and Why Now?


Commercial applications of graph data science are new, and most data experts are still coming
up to speed on how best to use it in their organizations. While some data experts studied graph
theory in college, others have not had much exposure to it. Graph data science brings together
graph analytics, statistics, and AI and ML techniques so that data experts can improve their
predictive and prescriptive models. This paper walks you through the basics of graph data science
so you will feel confident knowing when to use it in your daily work.

1. What is a graph?
Before you can understand graph data science, you need to understand graphs. At its most
fundamental, a graph is simply a different way of structuring data. Instead of rows and columns,
like in a traditional, relational database table or dataframe, graphs use nodes and relationships as
their primary structure.

[Figure: a property graph. Two person nodes, Del (Name: Del; Born: May 29, 1970; Twitter: @del)
and Ash (Name: Ash; Born: Dec 5, 1975), are joined by LOVES and LIVES WITH relationships. DRIVES
and OWNS relationships connect to a Car node (Brand: Volvo; Model: V70), and one relationship
carries the property "since Jan 10, 2011".]

Graphs represent data via relationships.
• Nodes represent an entity in the graph.
• Relationships (edges/links) connect nodes to each other.
• Properties describe a node or relationship, for example, name, age, height, and so on.

In a graph, nouns – people, places, things, organizations – are nodes. The relationships between
them are verbs: friends, works for, likes, and so on.
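
As a rough illustration of this structure, here is a minimal Python sketch of nodes, relationships, and properties. The class names, the `neighbors` helper, and the relationship directions other than "Del LOVES Ash" are assumptions made for the example; a graph database stores and queries this structure natively rather than through classes like these.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                    # kind of entity, e.g. "Person"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                                   # node the relationship leaves
    type: str                                     # the verb, e.g. "LOVES"
    end: Node                                     # node the relationship points to
    properties: dict = field(default_factory=dict)


# Nodes are the nouns; properties describe them.
del_ = Node("Person", {"name": "Del", "born": "May 29, 1970", "twitter": "@del"})
ash = Node("Person", {"name": "Ash", "born": "Dec 5, 1975"})

# Relationships are the verbs; each one connects exactly two nodes.
# Directions other than Del LOVES Ash are assumed for illustration.
relationships = [
    Relationship(del_, "LOVES", ash),
    Relationship(ash, "LOVES", del_),
    Relationship(del_, "LIVES_WITH", ash),
]


def neighbors(node, rel_type):
    """Names of the nodes that `node` points to via relationships of `rel_type`."""
    return [r.end.properties["name"] for r in relationships
            if r.start is node and r.type == rel_type]


print(neighbors(del_, "LOVES"))  # ['Ash']
```

The point of the sketch is that a relationship is a first-class record with its own type and properties, not a foreign key to be joined at query time.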


2. How is graph data science different from traditional analytics?
When you analyze data in tabular form, like in a relational database, you try to make sense of
data points without a coherent way to analyze their connections. Ben Squire, a data scientist,
compared traditional methods to “trying to solve a Rubik’s Cube by only looking at one side.” A
graph gives data experts the ability to look at, understand, and analyze the connections between
each data point. This gives data context that is impossible to get from a tabular data model. You
can see how strong connections are, where groups of connections form, how important each
connection is, and how connections influence one another.

Graph data science brings together graph statistics, analytics, and ML to put data in context and
answer pressing questions.

Graph statistics, queries, and visualization drive exploration and insights. Graph statistics
provide basic measures about a graph, such as the number of nodes and the distribution of
relationships. Graph queries can follow connections to any depth, whether 6 or 600
degrees of separation. Graph visualization lets data experts see their data and
explore patterns that bear further investigation.
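
To make these measures concrete, the following sketch computes a node count, a relationship count, a degree distribution, and the degrees of separation between two nodes. The graph, the names in it, and the `degrees_of_separation` helper are hypothetical; the traversal is an ordinary breadth-first search, not any particular product's query engine.

```python
from collections import Counter, deque

# Hypothetical undirected friendship graph stored as an adjacency list.
graph = {
    "Ann": ["Bob", "Cara"],
    "Bob": ["Ann", "Cara", "Dev"],
    "Cara": ["Ann", "Bob"],
    "Dev": ["Bob", "Eli"],
    "Eli": ["Dev"],
}

# Basic graph statistics.
node_count = len(graph)
# Each undirected relationship appears in two adjacency lists, so halve the total.
relationship_count = sum(len(nbrs) for nbrs in graph.values()) // 2
# Maps a degree (number of relationships) to how many nodes have that degree.
degree_distribution = Counter(len(nbrs) for nbrs in graph.values())


def degrees_of_separation(graph, source, target):
    """Fewest hops between two nodes (breadth-first search), or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


print(node_count, relationship_count)             # 5 5
print(degrees_of_separation(graph, "Ann", "Eli"))  # 3
```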

Graph analytics builds on graph statistics to answer specific questions and draw
insights from connections in existing or historical data. Graph queries and algorithms are
typically applied together in “recipes” during graph analytics, and the results are used
directly for analysis.

Graph-enhanced ML is the application of graph data and analytics results to train ML models
or support probabilistic decisions within an AI system. Graph statistics and analytics are
often used in conjunction to answer certain types of questions about complex systems, and
the subsequent insights are applied to improve ML.


Three in-graph ML techniques you should know about

Link prediction fills in the blanks in your data and predicts changes in your graph’s structure.
Link prediction is a common machine learning task applied to graphs: training a model to learn
where relationships should exist between pairs of nodes in a graph. You can think of link
prediction as building a model to pinpoint missing relationships in your dataset or predict
relationships that are likely to form in the future. With graph data science, you can train
supervised ML models based on the relationships and node properties in your graph to predict the
existence, and the probability, of relationships.

Node embedding transforms the topology and features of your graph into a low-dimensional vector
representation of each node. These vectors, also called embeddings, can be used for exploratory
data analysis, similarity measurements, and ML. Node embeddings can aggregate information about a
node’s position in the graph, its local neighbors, its centrality and influence, and in some
cases, other numeric node properties.

Node classification models predict the class of nodes in your graph. A class can be a binary
indicator, like whether a user account is engaged in fraud, or a multivalued indicator, such as
which market segment a customer belongs to. You can train node classification models to predict
which class new and existing nodes belong to with a broad range of input features, using the
network structure of your graph and properties from your source data.
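
As an illustration of the link prediction idea, the sketch below scores every unconnected pair of nodes by the number of neighbors they share. This common-neighbors count is a simple heuristic, swapped in here for brevity; it is the kind of feature a supervised link prediction model might be trained on, not a trained model itself. The graph and function name are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected graph: node -> set of neighbors.
graph = {
    "Ann": {"Bob", "Cara"},
    "Bob": {"Ann", "Cara", "Dev"},
    "Cara": {"Ann", "Bob", "Dev"},
    "Dev": {"Bob", "Cara", "Eli"},
    "Eli": {"Dev"},
}


def common_neighbor_scores(graph):
    """Score each pair of nodes that is not yet connected by its shared-neighbor count."""
    scores = {}
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # already connected, nothing to predict
        scores[(a, b)] = len(graph[a] & graph[b])
    return scores


scores = common_neighbor_scores(graph)
# Highest score first; ties broken alphabetically so the order is deterministic.
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked[0])  # (('Ann', 'Dev'), 2)
```

Ann and Dev share two neighbors, so under this heuristic they are the most likely pair to be missing a relationship or to form one next.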

Graph Data Science

Unsupervised Graph Algorithms
• Clustering (Community Detection): Which parts of my graph are connected to each other?
• Association (Similarity, Pathfinding): Which nodes are most similar?
• Dimension Reduction, or generalization (Centrality, Embeddings): How important is each node?

Supervised Machine Learning
• Link Prediction: Where will connections form next?
• Node Classification: What’s the label for this node?
• Property Regression: What’s the missing property value?


3. Why do graphs matter?


Across an organization, every department can benefit from graph data science to answer questions
like who or what is important, what should I do next, and what’s unusual about this?

Marketing
• Customer 360
• Influencer Strategy
• Campaign Optimization
• Product Recommendations

Ops
• Product Development
• Pipeline Acceleration
• Supply Chain Optimization
• Infrastructure Planning

Finance
• Fraud Detection
• Pricing Analysis
• Budgeting
• Forecasting

IT
• Network Monitoring
• Cybersecurity
• DevOps

HR
• Training
• Upskilling & Retention
• Promotions

While use cases for graph data science span industries and lines of business, from life sciences
to manufacturing, there are four use cases rapidly becoming the most popular among data
scientists.

Anomaly and fraud detection
Anomaly detection across corporate networks can help to identify cybersecurity attacks and
prevent data loss. The same strategy used to identify threat actors in a cybersecurity context
can be used to detect fraud in banking, insurance, and government programs by analyzing the
relationships and behaviors in your graph.

Those using graph data science to curb fraud have seen detection improvements of over 300%,
saving millions every year.

Customer 360
Across the globe, businesses try to better understand their customers and improve customer
lifetime value (LTV). With graph data science, customer knowledge can become more accurate and
complete through entity resolution. This process looks at all the database entries and identifies
duplicates. Creating a complete, master database entry for each customer instead of having
multiple, incomplete entries improves LTV and deepens customer knowledge, allowing for optimized
marketing programs and offers.

Those using graph data science for customer 360 can increase customer knowledge by 30%.

Recommendation engines
Recommendation engines became well known through Netflix and online shopping experiences.
However, recommendation engines have uses across the business. From product development to human
resources for retaining employees through upskilling training, recommendation engines power some
of the most important parts of a business.

Supply chain management
Improving a supply chain leads to savings, not just in dollars, but also in carbon emissions.
Every optimized route, perfect timing, and perfect delivery means happier customers and less
waste across time, infrastructure, and emissions. Graph data science helps optimize supply chain
routes by finding the best path, balancing cost and efficiency with customer satisfaction and
sustainability.

An ROI of tens of millions for one organization using graph data science for route optimization
is nothing compared to the 60,000 tons of carbon emissions eliminated by using those optimized
routes.
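
The entity resolution step described under Customer 360 can be sketched as a small graph problem: treat each database entry as a node, link entries that share an identifier, and merge each connected group into one customer. The records, field names, and `resolve_entities` function below are hypothetical, and exact matching on email or phone is an assumed rule; real entity resolution usually needs fuzzier matching.

```python
# Hypothetical customer records; the field names are assumptions for this example.
records = [
    {"id": 1, "email": "del@example.com", "phone": "555-0100"},
    {"id": 2, "email": "del@example.com", "phone": None},
    {"id": 3, "email": None, "phone": "555-0100"},
    {"id": 4, "email": "ash@example.com", "phone": "555-0199"},
]


def resolve_entities(records, keys=("email", "phone")):
    """Group record ids that are linked, directly or transitively, by a shared value."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent pointers to the group's representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (field, value) -> id of the first record carrying that value
    for r in records:
        for key in keys:
            value = r[key]
            if value is None:
                continue
            if (key, value) in first_seen:
                union(first_seen[(key, value)], r["id"])
            else:
                first_seen[(key, value)] = r["id"]

    groups = {}
    for r in records:
        groups.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(groups.values())


print(resolve_entities(records))  # [[1, 2, 3], [4]]
```

Records 2 and 3 share nothing with each other directly, but both connect to record 1, so all three resolve to one customer. That transitive step is what the graph view adds over pairwise comparison.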


4. What are the big questions graph data science helps answer?
Graph data science helps you answer key questions to make critical business decisions. If you
hear questions like these, you may benefit from using graph data science to answer them:

What’s unusual?
To understand anomalies and hidden patterns in your graph, consider using community detection.
Communities are clusters within your graph, and community detection algorithms can be used to
discover and identify these clusters. Detecting communities helps you uncover unusual patterns,
predict similar behavior, find duplicate entities, or simply prepare data for other analyses.

What should I recommend?
To build a recommendation engine, consider using similarity. Similarity identifies repeating
patterns in your graph. Similarity algorithms employ set and distance comparisons to score how
alike individual nodes are based on their neighbors or properties. This approach is used in
applications such as personalized recommendations and developing categorical hierarchies.
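
One common set comparison for scoring similarity is the Jaccard index: the size of the overlap between two nodes' neighbor sets divided by the size of their union. The sketch below uses it for a toy recommendation. The purchase data and the `recommend` helper are hypothetical, and picking only the single most similar user is a simplification.

```python
# Hypothetical purchase graph: customer -> set of products they are connected to.
purchases = {
    "Ann": {"tent", "stove", "boots"},
    "Bob": {"tent", "stove", "lantern"},
    "Cara": {"novel", "lamp"},
}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|, or 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, purchases):
    """Suggest what the most similar other user has that `user` does not."""
    others = [u for u in purchases if u != user]
    most_similar = max(others, key=lambda u: jaccard(purchases[user], purchases[u]))
    return sorted(purchases[most_similar] - purchases[user])


print(jaccard(purchases["Ann"], purchases["Bob"]))  # 0.5
print(recommend("Ann", purchases))                  # ['lantern']
```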

What content is the most important? Who is the most influential? Where is our biggest risk?
To understand what’s important, use centrality algorithms. Centrality metrics like PageRank help
you identify what’s important. Centrality algorithms reveal which nodes are important based on
graph topology. They identify influential nodes based on their position in the larger network,
including their connections. These algorithms are used to infer group dynamics such as
credibility, cascading vulnerability, and bridges between groups.

What is the optimal route?
To understand route optimization, use pathfinding. Pathfinding algorithms find the best routes
across your connected data. Pathfinding algorithms are foundational to graph analytics and find
the most efficient or shortest paths to traverse between nodes. They can be used to understand
complex dependencies and evaluate routes for uses such as physical logistics and least-cost call
or IP routing.
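
The centrality and pathfinding ideas above can be sketched in a few lines of plain Python. The first function is a basic power-iteration PageRank and the second is Dijkstra's algorithm for least-cost routes; both are textbook versions chosen for illustration, and the graphs, node names, costs, and parameter defaults (such as a damping factor of 0.85) are assumptions. Dijkstra's algorithm requires non-negative costs.

```python
import heapq


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. `links` maps every node to the nodes it points to."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A node with no outgoing links spreads its rank over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


def cheapest_route(routes, source, target):
    """Dijkstra's algorithm. `routes` maps node -> {neighbor: non-negative cost}."""
    best = {source: 0}
    previous = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while node in previous:  # walk back to the source
                node = previous[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper way here was already found
        for nxt, step in routes.get(node, {}).items():
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return float("inf"), []


# Hypothetical link graph: three pages point to C, and C points back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
print(max(rank, key=rank.get))  # C

# Hypothetical delivery network with per-leg costs.
routes = {
    "Depot": {"Hub": 4, "Port": 1},
    "Port": {"Hub": 2, "Store": 7},
    "Hub": {"Store": 1},
    "Store": {},
}
print(cheapest_route(routes, "Depot", "Store"))  # (4, ['Depot', 'Port', 'Hub', 'Store'])
```

Note that the cheapest route takes three legs rather than the two-leg route through the hub alone, which is the kind of result that is easy to miss without a pathfinding algorithm.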


5. Who uses graph data science tools?


Anyone who works with data can take advantage of graph data science to find answers to
difficult questions. Here are some of the roles currently using graph data science.

Data scientists are typically the primary users of graph data science
tools because they are practitioners of data science with deep
knowledge of algorithms and models.

Machine learning engineers work to scale, improve, embed,
integrate, or operate machine learning models that are developed
by data scientists.

Data engineers identify trends in datasets, create data pipelines,
and leverage graph algorithms to transform and enrich graph data
to make it more useful to the organization and to other data experts.

Citizen data scientists apply graph data science techniques
without the need for deep analytics and ML expertise,
using a low-code or no-code experience.

Business data analysts visualize, build upon, and report on the
analysis done by a data scientist for business users.

Application developers and software architects who are learning
about graph databases use graph queries, typically in the
Cypher query language. They learn graph algorithms as they seek
to analyze patterns across their graph and use those techniques
in their applications. (Graph data science is often the
secret sauce in differentiating applications.)


Stay Ahead of the Curve


Commercial applications of graph data science are new, and data experts are still coming up to
speed on how to best use it in their organizations. While considering use cases, data experts
and data scientists should remember that graph data science helps answer big questions like
what’s important, what’s unusual, and what’s next. Using this framing, it is easier to identify
opportunities to use graph data science to improve models and make predictions.

Organizations of all sizes, all industries, and within all departments are using graph data science
to make recommendations, identify anomalies and find fraudsters, improve customer knowledge,
and optimize supply chains.

So what will you do with graph data science?


Ready to try it out? Activate a free graph data science sandbox from Neo4j with prepopulated
data for common scenarios or read about popular use cases.

• Build Recommendation Engines (ebook)
• Detect Fraud and Anomalies (ebook)
• Improve Customer Knowledge (ebook)
• Optimize Supply Chains (ebook)

Neo4j is the world’s leading graph data platform. We help organizations – including Comcast, ICIJ, NASA, UBS, and
Volvo Cars – capture the rich context of the real world that exists in their data to solve challenges of any size
and scale. Our customers transform their industries by curbing financial fraud and cybercrime, optimizing global
networks, accelerating breakthrough research, and providing better recommendations. Neo4j delivers real-time
transaction processing, advanced AI/ML, intuitive data visualization, and more. Find us at neo4j.com and follow us
at @Neo4j.

Questions about Neo4j? Contact us around the globe:
[email protected]
neo4j.com/contact-us

© 2022 Neo4j, Inc.
