

Published in Towards Data Science

Tomaz Bratanic

Oct 7, 2021 · 12 min read


A Deep Dive into Neo4j Link Prediction Pipeline and FastRP Embedding Algorithm

Learn how to train and optimize Link Prediction models in the Neo4j Graph Data Science library to get the best results

In my previous blog post, I introduced the newly available Link Prediction pipeline in the Neo4j Graph Data Science library. Since then, I have taken more time to dig deeper and learn the inner workings of the pipeline, and I've picked up a couple of things along the way that I want to share with you. At first, I intended to show how the Link Prediction pipeline combines node properties to generate input features for the Link Prediction model. However, while developing the content, I noticed a couple of insights about using the FastRP embedding algorithm. Therefore, by the end of this blog post, you will hopefully have learned more about the FastRP embedding model and how you can combine multiple node features as input to the Link Prediction model.

The code used in this post is available on GitHub.

Graph import

I had to find a small network so that I could easily visualize the results as we go along. I decided to use the character interaction network from the first season of the Game of Thrones TV show, made available by Andrew Beveridge.

Graph model. Image by the author.


The graph model consists of characters and their interactions. We will treat the interaction relationship as undirected: if character A interacts with character B, this directly implies that character B also interacted with character A. We also know how many times two characters interacted, and we store that information as a relationship property.

If you want to follow along with the examples in this post, I recommend using a Blank project in Neo4j Sandbox. It is a free cloud instance of the Neo4j database that comes pre-installed with both the APOC and Graph Data Science plugins.
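
The Python snippets later in the post use a run_query helper to execute Cypher and return the results as a DataFrame. The helper itself is presumably defined in the accompanying notebook; a minimal sketch of how it might look, assuming the official neo4j Python driver and pandas, with placeholder connection details that you would replace with your Sandbox credentials:

import pandas as pd
from neo4j import GraphDatabase

# Placeholder connection details - replace with your Neo4j Sandbox URI and credentials
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def run_query(query, params={}):
    # Execute a Cypher statement and return the results as a pandas DataFrame
    with driver.session() as session:
        result = session.run(query, params)
        return pd.DataFrame([r.values() for r in result], columns=result.keys())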

The dataset is available on GitHub, so we can easily import it into Neo4j with the following Cypher query:

LOAD CSV WITH HEADERS FROM
"https://raw.githubusercontent.com/mathbeveridge/gameofthrones/master/data/got-s1-edges.csv"
AS row
MERGE (s:Character{name:row.Source})
MERGE (t:Character{name:row.Target})
MERGE (s)-[i:INTERACTS]-(t)
SET i.weight = toInteger(row.Weight)
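
To sanity-check the import, you can count the imported characters and interactions with the run_query helper sketched above; the graph should contain 126 character nodes.

# Quick sanity check of the imported graph
print(run_query("""
MATCH (c:Character) RETURN count(c) AS characters
"""))

print(run_query("""
MATCH (:Character)-[i:INTERACTS]->(:Character) RETURN count(i) AS interactions
"""))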

Link prediction pipeline


Under the hood, the link prediction model in Neo4j uses a logistic regression classifier. We are dealing with a binary classification problem, where we want to predict whether a link exists between a pair of nodes or not. At a high level, the link prediction pipeline consists of the following steps:

Image by the author.

In this post, we will focus on the first two steps, so let’s take a closer look at what is happening there.

Feature engineering for link prediction pipeline. Image by the author.

As a first step, you have to define node features. For example, you could use custom node properties such as age or gender. You can also use graph algorithms such as PageRank or Betweenness centrality to produce initial node features. In this blog post, we will start by using FastRP node embeddings to define the initial node features. The nice thing about the FastRP embedding algorithm is that it captures the network information and preserves similarity in the embedding space for nodes that are close in the graph. At the moment, you can't use pairwise information such as the number of common neighbors or the length of the shortest path between a pair of nodes as input features.

In the second step, the link feature combiner creates a single feature from a pair of node properties. Currently, there
are three techniques that you can use to combine a pair of node properties into a single link feature vector:

Cosine distance

L2 or Euclidean distance

Hadamard product

In the above example, I have used the Hadamard product to combine a pair of node properties into a single link
feature vector. All the available link feature combiner techniques are order-invariant as the Link Prediction pipeline
supports predicting only undirected relationships at the moment. You can use multiple link feature combiners in a
single pipeline to define several feature vectors, which are then concatenated as an input to the Link Prediction
model. I’ll walk you through an example later in the post.

Once the node features and link feature combiner are defined, you can train the model to predict new connections.

Node Feature engineering


You can preprocess node features before defining the Link Prediction pipeline. You can also include them directly in
the pipeline definition if you only use graph algorithms such as node embeddings or centrality measures as node
features. In the first example, we will use the FastRP embeddings as our Link Prediction model node features.
Therefore, we could potentially include them in the pipeline definition. However, we will first do a short analysis of
the node embedding results, so we need to store the node embeddings to the graph before diving into the pipeline
definition. We start by projecting an undirected named graph. Take a look at the documentation for more
information about the inner workings of the Graph Data Science library.

CALL gds.graph.create('gots1', 'Character',
  {INTERACTS:{orientation:'UNDIRECTED', properties:'weight'}})

We will use Louvain, a community detection algorithm, to help us better understand the results of the FastRP
embedding algorithm. You can use the following Cypher query to store the community structure information back to
the database.

CALL gds.louvain.write('gots1',
  {writeProperty:'louvain', relationshipWeightProperty:'weight'})

Throughout this blog post, I will be using Neo4j Bloom to visualize the results of algorithms and link predictions.
Take a look at this guide if you want to learn how to visualize networks with Bloom.

Community structure of the Network. Nodes are colored based on the community they belong to. Image by the author.

Now we can go ahead and execute the FastRP embedding algorithm. The algorithm will produce an embedding, i.e. a fixed-size vector, for every node in the graph. My friend CJ Sullivan wrote an excellent article explaining the inner workings of the FastRP algorithm.

CALL gds.fastRP.write('gots1',
  {writeProperty:'embedding', embeddingDimension:56, relationshipWeightProperty:'weight'})
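
As a quick check that the embeddings were written (again using the run_query helper from earlier), we can confirm that each stored vector has the configured length of 56:

# Verify the stored embedding dimension for a few characters
print(run_query("""
MATCH (c:Character)
RETURN c.name AS name, size(c.embedding) AS dimension
LIMIT 5
"""))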

First, we will evaluate the FastRP embeddings with a t-SNE scatter plot visualization. The stored node embeddings are vectors of length 56, as defined by the embeddingDimension parameter. The t-SNE algorithm is a dimensionality reduction algorithm, which we can use to reduce the embedding dimension to two. Having vectors of length two allows us to visualize them with a scatter plot. The Python code I used for dimensionality reduction and scatter plot visualization is:

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE


def tsne(embeddings, hue=None):
    # Reduce the 56-dimensional FastRP embeddings to two dimensions
    tsne = TSNE(n_components=2, n_iter=300)
    tsne_results = tsne.fit_transform(embeddings['embedding'].to_list())

    embeddings['tsne_x'] = [x[0] for x in list(tsne_results)]
    embeddings['tsne_y'] = [x[1] for x in list(tsne_results)]

    # Scatter plot colored by the supplied hue column (here, the Louvain community)
    plt.figure(figsize=(16, 10))
    sns.scatterplot(
        x="tsne_x", y="tsne_y",
        hue=hue,
        palette="deep",
        data=embeddings,
        legend="full",
        alpha=0.9
    )
    plt.show()


# Fetch the embeddings and Louvain communities from Neo4j
tsne_input = run_query("""
MATCH (c:Character)
RETURN c.name as character, c.embedding as embedding, c.louvain as hue
""")

tsne(tsne_input, 'hue')

This code produces the following visualization:



t-SNE scatter plot visualization of FastRP embeddings. Nodes are colored based on the community they belong to using the Louvain algorithm. Image by the author.

FastRP embeddings and the Louvain algorithm were executed independently, and yet we can observe that FastRP embeddings cluster nodes from the same community close together in the embedding space. This is no surprise, as FastRP is a community-based node embedding algorithm, meaning that nodes close in the graph will also be close in the embedding space. Next, we will evaluate the cosine similarity between nodes in the graph.

MATCH (c:Character)
WITH {item:id(c), weights: c.embedding} AS userData
WITH collect(userData) AS data
CALL gds.alpha.similarity.cosine.stats({
  data: data,
  topK: 1000,
  similarityCutoff: 0.1
})
YIELD nodes, similarityPairs, min, max, mean, p25, p50, p75, p90, p95, p99
RETURN nodes, similarityPairs, min, max, mean, p25, p50, p75, p90, p95, p99

Results


nodes  similarityPairs  min       max       mean      p25       p50       p75       p90       p95       p99
126    13546            0.100346  1.000007  0.537947  0.281726  0.496877  0.816959  0.964294  0.984401  0.99615

The average cosine similarity coefficient between all nodes in the graph is around 0.5, and around 25% of the node pairs have a cosine similarity greater than 0.81. Nodes are so similar in the embedding space because we have a tiny graph of only 126 nodes. Next, we will evaluate the cosine and Euclidean distance between pairs of nodes connected by a relationship.

df = run_query("""
MATCH (c1:Character)-[:INTERACTS]->(c2:Character)
RETURN gds.alpha.similarity.euclideanDistance(c1.embedding, c2.embedding) AS distance, 'euclidean' AS metric
UNION
MATCH (c1:Character)-[:INTERACTS]->(c2:Character)
RETURN gds.alpha.similarity.cosine(c1.embedding, c2.embedding) AS distance, 'cosine' AS metric
""")

# Plot the distributions of both metrics side by side
sns.displot(data=df, x='distance', col='metric')

Results

Most node pairs that are connected by a relationship have a high cosine similarity. Again, this is expected, as FastRP is designed to translate the network topology into the embedding space. Therefore, we expect neighboring nodes in the graph to be very similar in the embedding space. We will examine the node pairs connected in the network with a cosine similarity of less than 0.5, as this is a bit more unexpected. First, we have to tag them with Cypher:

MATCH p=(c1:Character)-[i:INTERACTS]->(c2:Character)
WHERE gds.alpha.similarity.cosine(c1.embedding, c2.embedding) < 0.5
SET i.show = True

And now, we can go ahead and visualize them with Neo4j Bloom.

Relationships between pairs of nodes that have a cosine similarity of less than 0.5. Image by the author.

It seems that pairs of connected nodes with a lower cosine similarity mainly occur where there are connections between various clusters or communities in the network. If you remember the t-SNE visualization, the nodes in the same community are nicely grouped in the embedding space. However, we have a couple of relationships between nodes from different communities. When we have connections between nodes from various communities, their similarity decreases. It also seems that these nodes have a higher degree, meaning they have many links within their community and a couple of links to other communities. Therefore, they are more similar to neighbors within their own community and less similar to neighbors from other clusters.
Link Feature Combiner
We’ve got the embeddings ready, and we know that pairs of connected nodes are highly likely to have a high cosine
similarity in the embedding space. Now, we will evaluate how different link feature combiners affect the output of
the link prediction model.

Cosine combiner

Interestingly enough, the first combiner we will take a look at is the Cosine similarity combiner.

Cosine link feature combiner. Image by the author.

The Link Feature combiner takes pairs of node features and combines them into a single link feature, which is then used as training data for the logistic regression model that will predict new links. We have already done the cosine similarity analysis, so we know that node pairs with a high cosine similarity are likely to be connected. Therefore, you might expect that newly predicted links will be between pairs of not-yet-connected nodes with a high cosine similarity, as this is precisely how our training data looks.

The Python script we will be using to produce link predictions is:

def generate_links(combiner, predictedRelType):
    # Delete all trained models & drop all named graphs
    run_query("""
    CALL gds.graph.list() YIELD graphName CALL gds.graph.drop(graphName) YIELD graphName as done
    RETURN distinct 'dropped named graphs' as result
    UNION
    CALL gds.beta.model.list() YIELD modelInfo CALL gds.beta.model.drop(modelInfo.modelName) YIELD modelInfo as done
    RETURN distinct 'dropped ML models' as result
    """)
    # Define a new LP pipeline
    run_query("""CALL gds.alpha.ml.pipeline.linkPrediction.create('lp-pipeline')""")
    # Define the link feature combiner on the FastRP embeddings
    run_query(f"""
    CALL gds.alpha.ml.pipeline.linkPrediction.addFeature('lp-pipeline', '{combiner}', {{
        nodeProperties: ['embedding']}}) YIELD featureSteps;
    """)
    # Define the train-test split
    run_query("""
    CALL gds.alpha.ml.pipeline.linkPrediction.configureSplit(
        'lp-pipeline', {testFraction: 0.3, trainFraction: 0.6, validationFolds: 7})
    YIELD splitConfig;
    """)
    # Configure the candidate logistic regression model parameters
    run_query("""
    CALL gds.alpha.ml.pipeline.linkPrediction.configureParams('lp-pipeline',
        [{penalty:0.001, tolerance: 0.01, maxEpochs: 500},
         {penalty:0.01, tolerance: 0.01, maxEpochs: 500}]) YIELD parameterSpace;""")
    # Construct the named graph used for training
    run_query("""
    CALL gds.graph.create('lp-graph',
        'Character', {INTERACTS:{orientation:'UNDIRECTED'}}, {nodeProperties:'embedding'});
    """)
    # Train the model
    run_query("""
    CALL gds.alpha.ml.pipeline.linkPrediction.train('lp-graph',
        {pipeline: 'lp-pipeline', modelName: 'lp-model', randomSeed: 42}) YIELD modelInfo
    RETURN modelInfo.bestParameters AS winningModel,
           modelInfo.metrics.AUCPR.outerTrain AS trainGraphScore,
           modelInfo.metrics.AUCPR.test AS testGraphScore
    """)
    # Predict new relationships and mutate them into the named graph
    run_query("""
    CALL gds.alpha.ml.pipeline.linkPrediction.predict.mutate('lp-graph',
        {modelName: 'lp-model', mutateRelationshipType: 'I', topN: 40, threshold: 0.45})
    YIELD relationshipsWritten;
    """)
    # Store the predicted relationships back to the database and return similarity stats
    predicted_links_stats = run_query(f"""
    CALL gds.graph.streamRelationshipProperty('lp-graph', 'probability', ['I'])
    YIELD sourceNodeId, targetNodeId, propertyValue as probability
    WHERE sourceNodeId < targetNodeId
    MATCH (s),(t)
    WHERE id(s)=sourceNodeId AND id(t)=targetNodeId
    MERGE (s)-[:{predictedRelType}]-(t)
    RETURN s.name as c1, t.name as c2,
           gds.alpha.similarity.euclideanDistance(s.embedding, t.embedding) AS euclidian_similarity,
           gds.alpha.similarity.cosine(s.embedding, t.embedding) AS cosine_similarity
    """)
    return predicted_links_stats


The script is a bit lengthy, as we need to define the entire Link Prediction pipeline for each link feature combiner option. Finally, the predicted links are stored to Neo4j so that we can visualize them with Bloom. In my previous blog post, I did a step-by-step explanation of the pipeline. As mentioned, I have prepared a Jupyter Notebook with all the code, so you don't have to copy it from the blog post directly.

Let’s examine the predicted links using the Cosine Link Feature combiner.
Top 20 predicted links using the Cosine Link Feature combiner. Image by the author.

The average Euclidean and cosine similarity of node pairs that have predicted links is:


euclidian_similarity  cosine_similarity
0.050814              0.999548

As we might expect, the cosine similarity between the nodes of predicted links is on average 0.999. Thus, the results agree with our training data. Furthermore, we can learn a bit about FastRP embeddings from the results. Peripheral nodes with a low degree are very similar in the embedding space, even if they are not directly connected. For example, the blue nodes in the bottom-left part of the visualizations are all very similar. All four nodes have only a single relationship, and they share their only neighbor.

L2 link feature combiner

Next, we will take a look at the L2 link feature combiner. The L2 link feature combiner calculates the Euclidean distance between two node features.

Top 20 predicted links using the L2 Link Feature combiner. Image by the author.

The results are almost identical to the Cosine Link Feature combiner. It seems that the FastRP embedding algorithm optimizes both cosine and Euclidean distance similarities between neighboring nodes in the network.

Hadamard link feature combiner

The Hadamard link feature combiner uses the Hadamard product to produce a link feature. The Hadamard product is
simply entrywise multiplication.

Hadamard link feature combiner. Image by the author.

Let’s examine the predicted links using the Hadamard link feature combiner.

Top 20 predicted links using the Hadamard Link Feature combiner. Image by the author.

The average Euclidean and cosine similarity of node pairs that have predicted links is:


euclidian_similarity  cosine_similarity
0.180325              0.995475

Some of the predicted links are similar to the Cosine and Euclidean link feature combiner results. For example, the predicted links in the red community are almost identical. On the other hand, the model predicted more links in the center of the blue cluster instead of on its periphery.
Using multiple Link Feature Combiners
In the previous examples, we have only used a single link feature combiner. In the case of the Cosine or L2 link feature combiner, we have effectively only used a single input feature for the logistic regression model. In practice, it makes sense to have multiple input features that best describe your domain. As a demonstration, we will add preferential attachment as the second link feature. Looking at the Neo4j documentation, preferential attachment is defined as the product of the node degrees of a pair of nodes. In practice, the preferential attachment model assumes that nodes with a higher degree are more likely to form new connections. Unfortunately, we can't automatically add the preferential attachment link feature just yet, but we can add it manually. To add the preferential attachment input feature, we will first calculate the node degree values for all nodes. You sometimes want to normalize the input features for logistic regression models, so I will show you how to scale features directly in the Link Prediction pipeline. Then we just need to add the Hadamard link feature combiner, which multiplies the input values, in this case the node degrees. So if I understand the math correctly, the resulting link feature should represent preferential attachment, as we effectively multiply the node degrees of pairs of nodes.
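
A tiny numeric sketch of that reasoning (toy degree values, not library code): applying an entrywise product to one-element degree vectors reproduces the preferential attachment score.

# Node degrees treated as one-element feature vectors
degree_a = [4.0]  # node a has degree 4
degree_b = [7.0]  # node b has degree 7

# Entrywise (Hadamard) product of the two degree vectors
hadamard_feature = [x * y for x, y in zip(degree_a, degree_b)]

print(hadamard_feature)  # [28.0] == deg(a) * deg(b), the preferential attachment score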

Using multiple link feature combiners in Link Prediction pipeline. Image by the author.

In the Link Prediction pipeline, you can have as many link feature combiners as you wish. The results of all link feature combiners are then concatenated into a single vector that is used as input to the Link Prediction logistic regression model. We can add the degree calculation and scaling directly into the pipeline, so we don't have to prepare the degree features beforehand.

# Get the degree value of each node
run_query("""
CALL gds.alpha.ml.pipeline.linkPrediction.addNodeProperty('lp-pipeline', 'degree', {
  mutateProperty: 'degree'
})""")

# Scale the degree using the MinMax scaler
run_query("""
CALL gds.alpha.ml.pipeline.linkPrediction.addNodeProperty('lp-pipeline', 'scaleProperties', {
  nodeProperties: ['degree'],
  mutateProperty: 'scaledDegree',
  scaler: 'MinMax'
})
""")

# Define the HADAMARD combiner for the scaled node degree
run_query("""
CALL gds.alpha.ml.pipeline.linkPrediction.addFeature('lp-pipeline', 'HADAMARD', {
  nodeProperties: ['scaledDegree']
}) YIELD featureSteps;
""")

These are the three steps we need to add to our pipeline. The first query mutates the degree value onto the named graph. The second step scales the degree using the MinMax scaler and mutates it as the scaledDegree property. Lastly, we use the Hadamard link feature combiner to combine the node degrees. In the end, we'll quickly evaluate how adding a secondary link feature affects the link prediction results.

We get the following results using the Cosine link feature combiner with FastRP embeddings and Hadamard
combiner with scaled node degrees.

Image by the author.

Adding the secondary link feature shifted the predicted links from the network periphery to the center. In some way, it makes sense, because preferential attachment favors links between nodes with a higher degree, and those nodes are usually located in the center of the network.

We get the following results using the Euclidean link feature combiner with FastRP embeddings and the Hadamard combiner with scaled node degrees.


The secondary link feature didn’t really affect the results. I’ve also tried not scaling the node degree values and then
the predicted links shift strongly towards the center.
Open in app Get started

Using Hadamard link feature combiner without scaling node degrees. Image by the author.

If we don’t scale the node degrees before using the Hadamard combiner, the preferential attachment feature becomes
dominant. Therefore, the predicted links are between nodes with a high degree instead of the low degree nodes on
the periphery.

Conclusion
I have really enjoyed writing this blog post and learned a lot about the FastRP embedding algorithm and the Link Prediction pipeline along the way. A quick summary would be:

FastRP is more likely to assign a high similarity between neighboring nodes with a low degree

On the other hand, the cosine similarity between connected nodes from different communities can be lower than 0.5

Using multiple link feature combiners can help you better describe your domain

Scaling node features influences the results of the Link Prediction logistic regression model

I encourage you to start a free Neo4j Sandbox project and start predicting new connections in your graphs. Let me
know how it goes!

As always, the code is available on GitHub.
