
Learning Graph DB in One Night - Neo4j - by Prashant Mudgal - Towards Data Science


Learning Graph DB in one night: Neo4j
I had one night to decide whether spending a large amount of time on
graph databases would be fruitful.

Prashant Mudgal

Published in Towards Data Science · 7 min read · Dec 23, 2020

I spent a large amount of time last year developing a recommendation
system for a Telecom client’s users. It turned out to be a massively difficult
problem to take on and finish in the short time stipulated. I was faced
with a similar-sized problem last week and had a short turnaround time for
devising an initial strategy. I was well aware of the landmines that
data-driven methods hide, so I wanted to test another approach.
Image by Author (one can create beautiful art with context-free grammar). I chose this photograph because it
gels well with the graphs, networks, and hallucinations of the night!

Last year, someone had mentioned the Neo4j DB for recommendation systems
but I didn’t pay much heed to it. I heard about Neo4j for the first time in 2016
when I downloaded it along with the Panama Papers data to expose the
shameless tax avoiders and owners of the overseas shelf companies from my
country. I ran a query here and there for an hour and a half while sitting at
Cafe Jax on 84th St. but eventually it met the fate of most hobby projects:
swept under the carpet into oblivion.

Fast forward to propitious 2020. I decided to delve a little into graph
databases, with the following factors to consider:

1. Run a truncated business problem rather than some tutorial.

2. Can I get meaningful information in a short amount of time?

3. How scalable is this entire thing on my moderately powerful machine?

4. How flexible is it in comparison to the pythonic approach (data manipulation,
feature generation, etc.)?

I started at around 7:30 pm, shortly after dinner and finished at around 5:20
am. I relied heavily on the docs and examples on the documentation
website.

1. Download and Install


https://neo4j.com/download/ is where one can find the installer; one has to
fill in a small form before it lets us proceed to the download (another data
collection ploy).

After installation, you will be welcomed with open arms into the Neo4j
community (at least that’s what the prompts on the screens say) and if you
don’t have time for faffing around, you will diligently close all such
paraphernalia and get to business straight away.

2. Set up a Graph DB
I am sure you can create a new Project; inside it, you will have to create a
database. Click on Add Database.

Image by Author

Image by Author
Give a username and password of your choice and click on Start. Once it is
running, click on Open.

3. Data prep and Data location


First of all, where to place your data? If you are using macOS then it goes in
/Users/<your user folder>/Library/Application Support/com.Neo4j.Relate/Data/dbmss/<folder related to the DB you created above>/import/

Place your .csv files in the import folder.

<folder related to the DB you created above> — If it is your first project then
you will have only one folder under /dbmss, so place your .csv in the /import
there nonchalantly and audaciously.

(Only for mac users: The above folder is much easier to find on Windows or
Linux as in macOS the /Users/<your user folder>/Library is hidden, so you
can type /Users/<your user folder>/Library in spotlight search and get to the
folders)

I scrubbed my data heavily and took only 1% of it for the experiment.

You can get all the .csv files from the GitHub here.

Service_Providers.csv contains Telecom service providers, each specialising in
one of the Telco products such as Fiber, DTH, 4G LTE etc.

Uses.csv maps the service providers in the above file to major Telecom
players (known as Local Partners) in different geographies.

Similar.csv has data on which major Telecom players are similar to each other.
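
Before loading anything, it can help to check that each file has the columns the import queries in step 5 read (row.Provider, row.Geography, and so on). Below is a minimal sketch using only the standard library. The helper name `check_csv` and the sample row are made up for illustration. The empty-cell check is there because MERGE cannot match on a null property and, as far as I know, LOAD CSV treats empty fields as null.

```python
import csv
import io

# Columns read by the LOAD CSV statements in step 5 of this article.
EXPECTED_COLUMNS = {
    "service_providers.csv": ["Provider", "Geography", "Services"],
    "uses.csv": ["Local_Partner", "Provider"],
    "similar.csv": ["Local_Partner", "User"],
}


def check_csv(name, handle):
    """Return a list of problems found in one CSV file (empty list means OK).

    `check_csv` is a hypothetical helper, not part of Neo4j.
    """
    problems = []
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS[name] if col not in header]
    if missing:
        problems.append(f"{name}: missing columns {missing}")
        return problems
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for col in EXPECTED_COLUMNS[name]:
            if not (row[col] or "").strip():
                problems.append(f"{name}: line {line_no} has an empty {col}")
    return problems


# Made-up sample content, only to show the call. A real file would be
# opened with open(path, newline="").
sample = io.StringIO("Local_Partner,Provider\nBoston Locals,\n")
print(check_csv("uses.csv", sample))
```

An empty list back from `check_csv` means the file at least has the shape the import expects; it says nothing about whether the values themselves are sensible.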

4. Formulate problem statement in terms of data above

With the help of Neo4j, Data sources described above, the tooth fairy, and
black magic, can we recommend service providers and products to the Major
Telecom players in this B2B setting?

5. Let’s play

In step 2, you had opened the Neo4j browser. It looks something like this.

We can type commands next to the neo4j$ prompt.

Just like there is SQL in the universe, Neo4j has its own language called
Cypher Query Language (CQL). I won’t call it much of a pain but I touched only a
small portion of it, so what do I know?

With the three CSV files in place, I ran the following to create the nodes and
relationships.
// Three separate statements: run them one at a time, or together if the
// Browser's multi-statement editor is enabled.

// Providers, their locations, and the services they offer
LOAD CSV WITH HEADERS FROM "file:///service_providers.csv" AS row
MERGE (pName:provider_name {name: row.Provider})
MERGE (pGeog:provider_loc {name: row.Geography})
MERGE (pServs:provider_serv {name: row.Services})
MERGE (pName)-[:Located_In]->(pGeog)
MERGE (pName)-[:provides]->(pServs);

// Which local partner uses which provider
LOAD CSV WITH HEADERS FROM "file:///uses.csv" AS row
MERGE (clientN:client_Name {name: row.Local_Partner})
MERGE (pName:provider_name {name: row.Provider})
MERGE (clientN)-[:Uses]->(pName);

// Which local partners are similar to each other
LOAD CSV WITH HEADERS FROM "file:///similar.csv" AS row
MERGE (clientN:client_Name {name: row.Local_Partner})
MERGE (userN:client_Name {name: row.User})
MERGE (clientN)-[:Is_Similar]->(userN);
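
MERGE matches an existing pattern or creates it if it is missing, which is why the same provider name appearing in two files ends up as one node and why re-running a load does not duplicate anything. The sketch below mimics that behaviour in plain Python for the uses.csv import. `TinyGraph` and the sample rows are invented for illustration and have nothing to do with how Neo4j stores data internally.

```python
class TinyGraph:
    """In-memory stand-in for the graph, only to illustrate MERGE semantics."""

    def __init__(self):
        self.nodes = set()          # (label, name)
        self.relationships = set()  # (start_node, rel_type, end_node)

    def merge_node(self, label, name):
        node = (label, name)
        self.nodes.add(node)        # adding an existing member is a no-op
        return node

    def merge_rel(self, start, rel_type, end):
        self.relationships.add((start, rel_type, end))


def load_uses(graph, rows):
    """Mirror of the uses.csv import: one client, one provider, one Uses link."""
    for row in rows:
        client = graph.merge_node("client_Name", row["Local_Partner"])
        provider = graph.merge_node("provider_name", row["Provider"])
        graph.merge_rel(client, "Uses", provider)


# Made-up rows for illustration.
rows = [
    {"Local_Partner": "Boston Locals", "Provider": "Provider A"},
    {"Local_Partner": "Boston Locals", "Provider": "Provider B"},
]
graph = TinyGraph()
load_uses(graph, rows)
load_uses(graph, rows)  # the second run changes nothing
print(len(graph.nodes), len(graph.relationships))  # prints: 3 2
```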

Image by Author
The sidebar of the database will have the information on all the nodes that
are created and all the relationships that exist between the nodes.

These nodes are queried upon and the relationships are used as filters in the
CQL.

#FunTimesBegin

Run the following command at the neo4j prompt

MATCH (n) RETURN n
Image by Author

This is neat!

The visual representation tells me who is connected to whom with what
underlying relationship. Such visuals can be great for storytelling and the
business audience.
I suspect that this graph will look really messy when the number of nodes is
high.

6. Recommendations
This graph contains all the info in the data and we would use CQL to unearth
those relationships. We can find similar entities, what they have in
common, what products they use, etc.

Let’s take the case of ‘Boston Locals’, which is one of the major Telecom
players (known as Local Partners).

#Other partners similar to ‘Boston Locals’

MATCH (boston:client_Name {name: "Boston Locals"})-[:Is_Similar]-(client_Name)
RETURN client_Name.name

Image by Author

Two other Major Players are similar to Boston Locals.

#Find products and local providers that are used by similar major players.
MATCH (boston:client_Name {name:"Boston Locals"}),
(boston)-[:Is_Similar]-(partner),
(provider:provider_name)-[:Located_In]->(provider_loc),
(provider)-[:provides]->(provider_serv),
(partner)-[:Uses]->(provider)
RETURN provider.name, provider_loc.name, collect(partner.name),
provider_serv.name, count(*) as count
ORDER BY count DESC

In the above query, the collect function creates a list of the partners behind
each recommendation.
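
In Cypher, the columns in RETURN that are not aggregated act as grouping keys, so collect and count(*) are computed once per distinct provider, location, and service combination. Here is a rough Python equivalent of that grouping. The function name `collect_by_key` and the rows are invented for illustration, and the location column is left out to keep it short.

```python
from collections import defaultdict


def collect_by_key(rows, key_columns, collect_column):
    """Group rows by key_columns, gathering collect_column into a list."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        groups[key].append(row[collect_column])
    result = [
        {**dict(zip(key_columns, key)), "collected": values, "count": len(values)}
        for key, values in groups.items()
    ]
    # Same idea as ORDER BY count DESC in the query.
    return sorted(result, key=lambda r: r["count"], reverse=True)


# Made-up rows standing in for the matched (provider, service, partner) paths.
matched = [
    {"provider": "Provider A", "service": "Fiber", "partner": "Partner X"},
    {"provider": "Provider A", "service": "Fiber", "partner": "Partner Y"},
    {"provider": "Provider B", "service": "DTH", "partner": "Partner X"},
]
for line in collect_by_key(matched, ["provider", "service"], "partner"):
    print(line)
```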

Image by Author

What the queries above do in Neo4j is known as Collaborative Filtering in
the Recommendation Systems space. One finds the similarity between
items, users, or user-item pairs and uses that similarity to recommend items,
products, or services.
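
To make the idea concrete without a database, here is a simplified sketch of the same neighbour-based logic in plain Python: find the partners similar to a target, then count how many of them use each provider. It is not the query above translated line by line (it ignores location and service), and `recommend` and all the names in the sample data are hypothetical.

```python
from collections import Counter


def recommend(target, similar_pairs, uses_pairs):
    """Rank providers by how many partners similar to `target` use them."""
    # The query matches Is_Similar without a direction, so a pair counts
    # whichever side the target is on.
    neighbours = set()
    for a, b in similar_pairs:
        if a == target:
            neighbours.add(b)
        elif b == target:
            neighbours.add(a)
    votes = Counter(
        provider for partner, provider in uses_pairs if partner in neighbours
    )
    return votes.most_common()


# Made-up data in the shape of similar.csv and uses.csv.
similar = [("Boston Locals", "Partner X"), ("Partner Y", "Boston Locals")]
uses = [
    ("Partner X", "Provider A"),
    ("Partner Y", "Provider A"),
    ("Partner Y", "Provider B"),
    ("Partner Z", "Provider C"),
]
print(recommend("Boston Locals", similar, uses))
```

Provider C never shows up because Partner Z is not similar to the target, which is the whole point of filtering through the neighbours first.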

This isn’t as sophisticated as embeddings, neural networks, or matrix
factorisation, but if the problem isn’t esoteric then why not go for a simpler
solution!

7. Pythonic ways
It turns out that neo4j can interact with python via a driver.

pip install neo4j


Once that’s done you can easily open a session on the running Neo4j DB from a
Python file (make sure that the DB is running, otherwise you will get
ServiceUnavailable errors).

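from neo4j import GraphDatabase

# Connection details for the local database started in step 2
uri = "neo4j://localhost:7687"
user = "neo4j"
password = "hello@123"

# The driver manages connections; a session is used to run queries
driver = GraphDatabase.driver(uri, auth=(user, password))
session = driver.session()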

Then you can define a function that uses the above session to run queries.
The file is available on my Github here.
One can look at the recommendations through a simple print statement.
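
For readers without the file at hand, here is a minimal sketch of what such a function could look like. It is not the author's code: `recommend_for` is a name I made up, and the query is the recommendation query from step 6 with the partner name turned into a `$name` parameter. `FakeSession` and `FakeRecord` are stand-ins so the sketch runs without a database; with the real driver you would pass the `session` created above instead.

```python
RECOMMEND_QUERY = """
MATCH (boston:client_Name {name: $name}),
      (boston)-[:Is_Similar]-(partner),
      (provider:provider_name)-[:Located_In]->(provider_loc),
      (provider)-[:provides]->(provider_serv),
      (partner)-[:Uses]->(provider)
RETURN provider.name AS provider, provider_loc.name AS location,
       collect(partner.name) AS partners,
       provider_serv.name AS service, count(*) AS count
ORDER BY count DESC
"""


def recommend_for(session, partner_name):
    """Run the recommendation query and return one dict per result row.

    `session` is anything with a neo4j-style run(query, **parameters) method.
    """
    result = session.run(RECOMMEND_QUERY, name=partner_name)
    return [record.data() for record in result]


class FakeRecord:
    """Stand-in for a driver record; only data() is imitated."""

    def __init__(self, values):
        self._values = values

    def data(self):
        return dict(self._values)


class FakeSession:
    """Stand-in for a driver session that replays canned rows."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = []

    def run(self, query, **parameters):
        self.calls.append((query, parameters))
        return [FakeRecord(row) for row in self.rows]


# Made-up row, only to show the shape of the output.
demo = FakeSession([{"provider": "Provider A", "location": "Boston",
                     "partners": ["Partner X"], "service": "Fiber", "count": 1}])
for row in recommend_for(demo, "Boston Locals"):
    print(row)
```

Passing the name as a parameter rather than pasting it into the query string avoids quoting problems.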

Image by Author
It’s morning already!
After an initial litmus test and a tiring night, I was pleasantly surprised with
the results and the capabilities of Neo4j.

For the questions that I intended to find answers to:

1. Can I get meaningful information in a short amount of time?

Definitely! The visual information is an advantage in understanding the deeper
relationships in the data. It also gives a vocabulary that makes the data easy
to explain and comprehend.

2. How scalable is this entire thing on my moderately powerful machine?

I ran it on my machine with 16 GB RAM, a 512 GB HD, and a 6-core i7. I tried
loading a file with 200K rows and 5 columns (all numeric data) and got a Java
heap space error; I decreased the file size but kept getting the error down to
70K rows. I can easily use a pandas DataFrame or turicreate’s SFrame without
batting an eyelid on those files on my machine. So, at the moment I am skeptical
of scalability.

3. How flexible is it in comparison to the pythonic approach (data
manipulation, feature generation, etc.)?

Here I used a classic use case which can be solved with basic manipulations, but
in an industrial setting with increasing complexity, mere similarity doesn’t
yield good results. One needs to concoct feature spaces such as embeddings, which
is possible in Neo4j, but I haven’t explored that. Neo4j Graph Data Science shows
promise.
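
On the heap space error from question 2: a common way to keep memory pressure down is to send rows in small batches so that each transaction stays small. I have not verified that this would have fixed the error I hit, so treat it as a sketch. The names `batches` and `load_in_batches` and the batch size are assumptions, while the query is the uses.csv import rewritten around UNWIND. Neo4j 4.x also offered a `USING PERIODIC COMMIT` prefix for LOAD CSV with the same goal.

```python
from itertools import islice

BATCH_QUERY = """
UNWIND $rows AS row
MERGE (clientN:client_Name {name: row.Local_Partner})
MERGE (pName:provider_name {name: row.Provider})
MERGE (clientN)-[:Uses]->(pName)
"""


def batches(rows, size):
    """Yield consecutive lists of at most `size` items from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_in_batches(session, rows, size=1000):
    """Send rows (dicts keyed by CSV header) batch by batch; return the row count.

    `session` is a neo4j-style session; each run() call here is its own
    auto-commit transaction.
    """
    sent = 0
    for batch in batches(rows, size):
        session.run(BATCH_QUERY, rows=batch).consume()
        sent += len(batch)
    return sent


# The chunking on its own, without a database:
print([len(b) for b in batches(range(5), 2)])  # prints: [2, 2, 1]
```

The rows could come from `csv.DictReader`, which already yields one dict per line keyed by the header.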
At this moment, I would like to include Neo4j in my Data Science life cycle
during the exploratory data analysis phase to form the hypotheses that I can
test using the usual pythonic ways.

With the help of CQL, I can find all the records that exhibit certain
characteristics and I can test the consistency of the results obtained from the
classical methods.

Epilogue: It was a productive night, time to sleep!

Photo by Jonathan Fink on Unsplash
