Learning Graph DB in One Night - Neo4j - by Prashant Mudgal - Towards Data Science
Cafe Jax on 84th St. but eventually, it met the fate of most hobby projects —
swept under
the carpet in an oblivious dimension.
I started at around 7:30 pm, shortly after dinner and finished at around 5:20
am. I relied heavily on the docs and examples on the documentation
website.
After installation, you will be welcomed with open arms in the Neo4j
community (at least that’s what the prompts on the screens say) and if you
don’t have time for faffing around, you will diligently close all such
paraphernalia and get to business straight away.
2. Set up a Graph DB
I am sure you can create a new Project, and in that you will have to create a
database. Click on Add Database.
Image by Author
Give a username and password of your choice and click on Start. Once it is
running, click on Open.
<folder related to the DB you created above> — If it is your first project then
you will have only one folder under /dbmss, so place your .csv in the /import
there nonchalantly and audaciously.
(Only for Mac users: the above folder is much easier to find on Windows or
Linux. On macOS, /Users/<your user folder>/Library is hidden, so you
can type /Users/<your user folder>/Library in Spotlight search to get to the
folders.)
You can get all the .csv files from the GitHub here.
Uses.csv maps the Service Provider in the above file to Major Telecom
players (known as Local Partners) in different Geographies.
Similar.csv has data on which major Telecom players are similar to each other.
With the help of Neo4j, Data sources described above, the tooth fairy, and
black magic, can we recommend service providers and products to the Major
Telecom players in this B2B setting?
5. Let’s play
In step 2, you had opened the Neo4j browser. It looks something like this.
Just as relational databases have SQL, Neo4j has its own query language,
Cypher, referred to here as CQL (Cypher Query Language). I won’t call it much
of a pain, but I touched only a small portion of it, so what do I know?
With the three CSV files in place, I ran the following to create the nodes and
relationships.
// Run each statement on its own, or keep the semicolons so they are separated.
// Providers, their geographies, and the services they provide
LOAD CSV WITH HEADERS FROM "file:///service_providers.csv" AS row
MERGE (pName:provider_name {name: row.Provider})
MERGE (pGeog:provider_loc {name: row.Geography})
MERGE (pServs:provider_serv {name: row.Services})
MERGE (pName)-[:Located_In]->(pGeog)
MERGE (pName)-[:provides]->(pServs);
// Which Local Partner uses which provider
LOAD CSV WITH HEADERS FROM "file:///uses.csv" AS row
MERGE (clientN:client_Name {name: row.Local_Partner})
MERGE (pName:provider_name {name: row.Provider})
MERGE (clientN)-[:Uses]->(pName);
// Which Local Partners are similar to each other
LOAD CSV WITH HEADERS FROM "file:///similar.csv" AS row
MERGE (clientN:client_Name {name: row.Local_Partner})
MERGE (userN:client_Name {name: row.User})
MERGE (clientN)-[:Is_Similar]->(userN);
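MERGE behaves like get-or-create: it matches an existing node or relationship with the given label and properties, and creates one only when nothing matches. That is why a provider that shows up on many CSV rows still ends up as a single node. A rough in-memory sketch of that behaviour in Python follows. The sample rows are made up for illustration (only the column names come from the LOAD CSV statements above), and this is not how Neo4j works internally.

```python
import csv
import io

# Hypothetical sample standing in for uses.csv; only the column names
# (Local_Partner, Provider) are taken from the LOAD CSV statements.
SAMPLE_USES_CSV = """Local_Partner,Provider
Partner A,Provider X
Partner B,Provider X
Partner A,Provider X
"""


def load_uses(text):
    """Mimic MERGE: add each node and relationship only if it is not there yet."""
    nodes = set()          # (label, name) pairs
    relationships = set()  # (start_name, type, end_name) triples
    for row in csv.DictReader(io.StringIO(text)):
        client = ("client_Name", row["Local_Partner"])
        provider = ("provider_name", row["Provider"])
        nodes.add(client)    # no-op when the node already exists
        nodes.add(provider)
        relationships.add((client[1], "Uses", provider[1]))
    return nodes, relationships


nodes, relationships = load_uses(SAMPLE_USES_CSV)
print(len(nodes), "nodes,", len(relationships), "relationships")
# prints: 3 nodes, 2 relationships
```

The repeated "Partner A, Provider X" row adds nothing the second time, which is the same reason re-running a MERGE-based load does not duplicate the graph.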
Image by Author
The sidebar of the database shows information on all the nodes that were
created and all the relationships that exist between the nodes.
These nodes are what you query, and the relationships act as filters in
CQL.
// #FunTimesBegin
MATCH (n) RETURN n
Image by Author
This is neat!
6. Recommendations
This graph contains all the information in the data, and we can use CQL to
unearth the relationships in it. We can find similar entities, what they have
in common, what products they use, etc.
Let’s take the case of ‘Boston Locals’, which is one of the Major Telecom
Players (known as Local Partners).
Image by Author
// Find products and local providers that are used by similar major players.
MATCH (boston:client_Name {name:"Boston Locals"}),
(boston)-[:Is_Similar]-(partner),
(provider:provider_name)-[:Located_In]->(provider_loc),
(provider)-[:provides]->(provider_serv),
(partner)-[:Uses]->(provider)
RETURN provider.name, provider_loc.name, collect(partner.name),
provider_serv.name, count(*) as count
ORDER BY count DESC
Image by Author
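To make the pattern in that query easier to follow, here is a small Python sketch of the same logic over plain lists. All of the data is invented for illustration (only "Boston Locals" appears in the article), and the function name is hypothetical. It finds the partners linked to the client by Is_Similar in either direction, follows their Uses links to providers, groups by provider, location and service, and sorts by the number of matches.

```python
from collections import defaultdict

# Toy edges, invented for illustration; only "Boston Locals" is from the article.
similar = [("Boston Locals", "Partner B"), ("Partner C", "Boston Locals")]
uses = [("Partner B", "Provider X"), ("Partner C", "Provider X"),
        ("Partner C", "Provider Y")]
located_in = [("Provider X", "Region 1"), ("Provider Y", "Region 2")]
provides = [("Provider X", "Service 1"), ("Provider Y", "Service 2")]


def recommend_in_memory(client_name):
    # Is_Similar is matched without a direction, so check both ends of each edge.
    partners = [b for a, b in similar if a == client_name]
    partners += [a for a, b in similar if b == client_name]

    # One entry per (provider, location, service), like the non-aggregated
    # RETURN columns; the list plays the role of collect(partner.name).
    grouped = defaultdict(list)
    for partner in partners:
        for user, provider in uses:
            if user != partner:
                continue
            for p_loc, loc in located_in:
                for p_serv, serv in provides:
                    if p_loc == provider and p_serv == provider:
                        grouped[(provider, loc, serv)].append(partner)

    rows = [(prov, loc, names, serv, len(names))
            for (prov, loc, serv), names in grouped.items()]
    # ORDER BY count DESC
    return sorted(rows, key=lambda row: row[4], reverse=True)


for row in recommend_in_memory("Boston Locals"):
    print(row)
```

With this toy data, Provider X comes first because two similar partners use it, while Provider Y is used by only one.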
7. Pythonic ways
It turns out that Neo4j can interact with Python via a driver.
Then you can define a function that uses the above session to run queries.
The file is available on my GitHub here.
One can look at the recommendations through a simple print statement.
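A minimal sketch of what such a function might look like, assuming the official neo4j Python driver (pip install neo4j). The function names, the result aliases and the connection details are placeholders chosen for this example, not the author's code. The query is the Boston Locals query from the previous section, with the name passed in as a parameter.

```python
RECOMMENDATION_QUERY = """
MATCH (client:client_Name {name: $name}),
      (client)-[:Is_Similar]-(partner),
      (provider:provider_name)-[:Located_In]->(loc),
      (provider)-[:provides]->(serv),
      (partner)-[:Uses]->(provider)
RETURN provider.name AS provider, loc.name AS location,
       collect(partner.name) AS partners, serv.name AS service,
       count(*) AS count
ORDER BY count DESC
"""

COLUMNS = ("provider", "location", "partners", "service", "count")


def recommend(session, partner_name):
    """Run the query on an open session and return a list of plain dicts."""
    result = session.run(RECOMMENDATION_QUERY, {"name": partner_name})
    return [{key: record[key] for key in COLUMNS} for record in result]


def print_recommendations(uri, user, password, partner_name):
    """Open a driver, run the query once, and print each recommendation."""
    # Imported here so the rest of the sketch loads without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            for row in recommend(session, partner_name):
                print(row)
    finally:
        driver.close()
```

It could be called as print_recommendations("bolt://localhost:7687", "<username>", "<password>", "Boston Locals"), where the URI assumes a local database listening on the default Bolt port and the credentials are the ones chosen when the database was created. Passing the name as a query parameter rather than formatting it into the string keeps the query reusable for any Local Partner.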
Image by Author
It’s morning already!
After an initial litmus test and a tiring night, I was pleasantly surprised with
the results and the capabilities of Neo4j.
Here I used a classic use case that can be solved with basic manipulations, but in
an industrial setting with increasing complexity, similarity alone doesn’t yield
good results. One needs to concoct feature spaces such as embeddings, which is
possible in Neo4j, but I haven’t explored that. Neo4j Graph Data Science shows
promise.
At this moment, I would like to include Neo4j in my Data Science life cycle
during the exploratory data analysis phase to form the hypotheses that I can
test using the usual pythonic ways.
With the help of CQL, I can find all the records that exhibit certain
characteristics and I can test the consistency of the results obtained from the
classical methods.