Knowledge Graph

This document provides a comprehensive guide on building and visualizing a knowledge graph using Neo4j and Python. It outlines the necessary steps, including setting up the environment, installing required libraries, connecting to the Neo4j database, and creating a knowledge graph from text. The guide is aimed at data scientists and developers, offering practical examples and code snippets for implementation.

Uploaded by

en21it301055
Steps to Build and Visualise a Knowledge Graph with Neo4j and Python
By Jayesh Gulani

Introduction
Transforming unstructured text into a structured knowledge graph turns free-form
prose into data that can be queried, traversed, and retrieved.
Knowledge graphs represent relationships and entities in a way that is both
human-readable and machine-interpretable, enabling a range of applications such as
semantic search, recommendation systems, and data-driven insights.

This blog outlines the step-by-step process to build and visualize a knowledge graph
using Neo4j and Python. With Neo4j as the graph database and Python libraries like
LlamaIndex for text processing, we will extract meaningful entities and relationships
from raw text and visualize them in Neo4j. Whether you are a data scientist, software
developer, or enthusiast, this guide will help you get started with knowledge graph
construction.

Below is the flow diagram that illustrates the entire process:

Prerequisite: Set up a .env File


Before starting, create a .env file in the same directory as your Python script. Add the
following details to configure your OpenAI and Neo4j credentials:

OPENAI_API_KEY=your_openai_api_key
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_neo4j_password
NEO4J_DATABASE=neo4j
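
A missing or misspelled entry in the .env file tends to surface later as a confusing authentication error. Below is a minimal sketch of an early check, assuming the variable names listed above. The helper name missing_env_vars is hypothetical and not part of any library.

```python
import os

# Variable names taken from the .env file above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "NEO4J_URI",
    "NEO4J_USER",
    "NEO4J_PASSWORD",
    "NEO4J_DATABASE",
]


def missing_env_vars(env, required=REQUIRED_VARS):
    """Return the required names that are absent or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars(os.environ)
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All settings found.")
```

Call it after load_dotenv() so that the values from the file are visible in os.environ.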

Step 1: Download and Install Neo4j

● Download Neo4j Desktop or Neo4j Server from the official Neo4j website and follow the installation instructions for your platform.


Steps to set up Neo4j and ensure proper connection:
1.​ Download and install Neo4j Desktop
2.​ Install APOC plugin: Go to the "Manage" screen, then the "Plugins" tab. Click
"Install" in the APOC box.
3. Configure Neo4j: Open the "neo4j.conf" file and uncomment or add these lines:

server.bolt.enabled=true
server.bolt.listen_address=:7687

Create a Database in Neo4j Desktop

1. Open Neo4j Desktop.
2. Click on the "New Project" button to create a new project, or select an existing project.
3.​ In the project view, click on "Add Database".
4.​ Choose "Local DBMS" and enter a name for the database.
5.​ Set a password if required, then click "Create" to initialise the database.

Step 2: Install the Required Libraries


●​ Install python-dotenv (for managing environment variables):

pip install python-dotenv


●​ Install llama-index (the LlamaIndex library):​

pip install llama-index

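● Install neo4j (Neo4j Python driver):

pip install neo4j

● Install llama-index-graph-stores-neo4j (the LlamaIndex integration that provides Neo4jGraphStore, which is imported in Step 3):

pip install llama-index-graph-stores-neo4j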
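
After installing, it can be worth confirming that the modules imported in the next step resolve in the active Python environment. The sketch below uses only the standard library. The helper name missing_modules is hypothetical, and the module list assumes the imports used in this guide.

```python
import importlib.util

# Import names used in this guide (these differ from the pip package names).
MODULES = [
    "dotenv",
    "llama_index.core",
    "llama_index.graph_stores.neo4j",
    "neo4j",
]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised when a parent package of a dotted name is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing modules:", missing_modules(MODULES) or "none")
```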

Step 3: Set up the environment and import necessary libraries


Import required modules, load environment variables, and configure settings for OpenAI and
Neo4j connections.

import os
from dotenv import load_dotenv
from llama_index.core import Document, Settings, StorageContext
from llama_index.core import KnowledgeGraphIndex
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.vector_stores.simple import SimpleVectorStore
from llama_index.graph_stores.neo4j import Neo4jGraphStore
from neo4j import GraphDatabase

load_dotenv()

openai_api_key = os.getenv("OPENAI_API_KEY")
neo4j_uri = os.getenv("NEO4J_URI")
neo4j_user = os.getenv("NEO4J_USER")
neo4j_password = os.getenv("NEO4J_PASSWORD")

Settings.llm = OpenAI(api_key=openai_api_key, temperature=0.1, model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(api_key=openai_api_key)

Step 4: Connect to Neo4j database

Establish a connection to the Neo4j database using the provided credentials.

def connect_to_neo4j():
    # Create a driver from the credentials loaded in Step 3.
    return GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password))


driver = connect_to_neo4j()
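
If the connection fails, a common cause is that the database is not running or the Bolt port is not open. The following is a minimal sketch of a reachability check using only the standard library. The function name bolt_port_open is hypothetical, and the check only confirms that something accepts TCP connections on the port, not that the credentials are valid.

```python
import socket
from urllib.parse import urlparse


def bolt_port_open(uri, timeout=2.0):
    """Return True if a TCP connection to the host and port in `uri` succeeds."""
    parsed = urlparse(uri)
    host = parsed.hostname or "localhost"
    port = parsed.port or 7687  # 7687 is the default Bolt port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Bolt port reachable:", bolt_port_open("bolt://localhost:7687"))
```

For a fuller check that also covers authentication, the Neo4j Python driver provides driver.verify_connectivity(), which raises an exception if the server cannot be reached.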

Step 5: Clear the existing database

Define a function to clear all nodes and relationships in the Neo4j database.

def clear_database():
    # Delete every node together with its relationships.
    with driver.session() as session:
        session.run("MATCH (n) DETACH DELETE n")
    print("Database cleared.")
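
A single MATCH (n) DETACH DELETE n runs as one transaction, which is fine for a small demo graph but can exhaust memory on a large one. Below is a hedged sketch of a batched variant. The function name clear_in_batches and the default batch size are assumptions, and session is expected to be a session object from the Neo4j Python driver.

```python
BATCH_DELETE = (
    "MATCH (n) WITH n LIMIT $limit "
    "DETACH DELETE n "
    "RETURN count(*) AS deleted"
)


def clear_in_batches(session, batch_size=10_000):
    """Delete nodes in batches until none remain; return the total deleted."""
    total = 0
    while True:
        # Each call removes at most `batch_size` nodes and their relationships.
        record = session.run(BATCH_DELETE, limit=batch_size).single()
        deleted = record["deleted"]
        total += deleted
        if deleted == 0:
            return total
```

Usage would look like: with driver.session() as session: clear_in_batches(session).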

Step 6: Create and store the knowledge graph

Define a function to create a knowledge graph from the input text using LlamaIndex and
store it in Neo4j.

def create_and_store(text):
    # Wrap the raw text in a LlamaIndex Document.
    documents = [Document(text=text)]

    # Graph store backed by Neo4j; the database name comes from the .env file.
    graph_store = Neo4jGraphStore(
        username=neo4j_user,
        password=neo4j_password,
        url=neo4j_uri,
        database=os.getenv("NEO4J_DATABASE", "neo4j"),
    )
    vector_store = SimpleVectorStore()
    storage_context = StorageContext.from_defaults(
        graph_store=graph_store, vector_store=vector_store
    )

    # Extract up to 10 triplets per chunk and write them to Neo4j.
    index = KnowledgeGraphIndex.from_documents(
        documents,
        storage_context=storage_context,
        max_triplets_per_chunk=10,
    )
    # Example path; adjust it for your system.
    index.storage_context.graph_store.persist(persist_path="D:/knowledge_graph")

    # Query the graph to confirm that it was built.
    query_engine = index.as_query_engine(
        include_text=False, response_mode="tree_summarize"
    )
    response = query_engine.query("Summarize the key points of the text")
    print("Query Response:", response)

    # Count the nodes now stored in the database.
    with driver.session() as session:
        result = session.run("MATCH (n) RETURN count(n) as node_count")
        node_count = result.single()["node_count"]
    print(f"Number of nodes in the database: {node_count}")
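
KnowledgeGraphIndex asks the LLM to extract (subject, relation, object) triplets from each chunk of text, and the graph store writes each triplet as two nodes joined by a relationship. The sketch below illustrates that mapping in plain Python. It is not the library's actual implementation; the Entity label, the id property, and the helper name triplet_to_cypher are assumptions chosen for illustration.

```python
import re


def triplet_to_cypher(subject, relation, obj):
    """Build a parameterised MERGE statement for one (subject, relation, object) triplet."""
    # Relationship types cannot be passed as Cypher parameters, so the
    # relation is normalised to an upper-case identifier and inlined.
    rel_type = re.sub(r"\W+", "_", relation.strip()).strip("_").upper()
    if not rel_type:
        raise ValueError("relation must contain at least one letter or digit")
    query = (
        "MERGE (s:Entity {id: $subject}) "
        "MERGE (o:Entity {id: $object}) "
        f"MERGE (s)-[:`{rel_type}`]->(o)"
    )
    return query, {"subject": subject, "object": obj}


if __name__ == "__main__":
    # Illustrative input only; the triplets an LLM actually extracts will vary.
    query, params = triplet_to_cypher("Marie Curie", "born in", "Warsaw")
    print(query)
    print(params)
```

Using MERGE rather than CREATE means that an entity mentioned in several triplets ends up as a single node.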

Step 7: Visualise the knowledge graph

Define a function to retrieve nodes and relationships from the Neo4j database for
visualisation.

def visualise():
    with driver.session() as session:
        # Fetch every node.
        node_result = session.run("MATCH (n) RETURN n")
        nodes = [record["n"] for record in node_result]

        # Fetch every relationship.
        rel_result = session.run("MATCH ()-[r]->() RETURN r")
        relationships = [record["r"] for record in rel_result]
    return nodes, relationships
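
The function above only retrieves the data; the drawing itself happens in the Neo4j Browser (Step 11). As an optional alternative, the sketch below renders (source, relation, target) tuples as Graphviz DOT text that can be saved to a file and drawn with any DOT viewer. The helper name triplets_to_dot is hypothetical. The tuples could come from a query such as MATCH (n)-[r]->(m) RETURN n.id, type(r), m.id, where the id property is an assumption about how the graph store names entities.

```python
def triplets_to_dot(triplets, graph_name="knowledge_graph"):
    """Render (source, relation, target) tuples as Graphviz DOT text."""

    def quote(value):
        # Escape backslashes and double quotes so the label stays valid DOT.
        return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [f"digraph {graph_name} {{"]
    for source, relation, target in triplets:
        lines.append(
            f"  {quote(source)} -> {quote(target)} [label={quote(relation)}];"
        )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative input only.
    print(triplets_to_dot([("Marie Curie", "DISCOVERED", "Radium")]))
```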

Step 8: Create the knowledge graph from text


Define a function that combines the previous steps to create a knowledge graph from the
input text.

def create_knowledge_graph(text):
    clear_database()
    create_and_store(text)
    nodes, relationships = visualise()
    print(f"Created {len(nodes)} nodes and {len(relationships)} relationships")

Step 9: Execute the knowledge graph creation

Provide an example text and run the knowledge graph creation process.

text = """
Marie Curie was a pioneering physicist and chemist who conducted
groundbreaking research on radioactivity. Born in Warsaw in 1867, she
moved to Paris to further her studies. Curie discovered two new
elements, polonium and radium, and developed techniques for isolating
radioactive isotopes. In 1903, she became the first woman to win a
Nobel Prize and remains the only person to win Nobel Prizes in two
scientific fields. Her work laid the foundation for many modern
applications in nuclear physics and cancer treatment. Despite facing
gender discrimination in the scientific community, Curie's dedication
to science never wavered. Her legacy continues to inspire generations
of scientists, particularly women in STEM fields.
"""

print("Creating knowledge graph:")
create_knowledge_graph(text)
driver.close()

Step 10: Run the Python File in the Terminal

1. Open your terminal or command prompt.
2. Navigate to the directory containing your Python file.
3. Run the file using the following command:

python your_script_name.py

Step 11: View the Knowledge Graph in the Neo4j Browser

1. Open the Neo4j Browser.
2. Run the following query to view the nodes and relationships in the graph:

MATCH (n)-[r]->(m) RETURN n, r, m

Explore the graph visualisation to see the relationships between entities.


Examples:

Text:

Albert Einstein was a theoretical physicist who developed the theory of relativity.

Isaac Newton is famous for his laws of motion and gravity.

Einstein and Newton revolutionised the field of physics.

Text: The Industrial Revolution, which began in the late 18th century, marked a major
turning point in history. This period saw a shift from manual labour and animal-based
production to machine-based manufacturing. It started in Great Britain and quickly
spread to other parts of Europe and North America. The revolution brought about
significant technological, socioeconomic, and cultural changes.

Key innovations during this time included the steam engine, developed by James
Watt, which became a primary source of power for the new factories. The textile
industry saw remarkable advancements with inventions like the spinning jenny by
James Hargreaves and the power loom by Edmund Cartwright. These innovations
dramatically increased production capacity and efficiency.

The transportation sector also underwent massive changes. The development of
steam-powered boats by Robert Fulton and the creation of the locomotive by George
Stephenson revolutionised how goods and people were moved. This led to the rapid
expansion of railways, connecting cities and facilitating trade.

The Industrial Revolution had profound social impacts. It led to urbanisation as
people moved from rural areas to cities in search of factory jobs. This shift created
new social classes, including a growing middle class and a large working class.
Working conditions in early factories were often harsh, leading to the rise of labour
movements and eventual reforms.

The period also saw significant scientific advancements. The work of scientists like
Michael Faraday in electromagnetics laid the groundwork for future technological
developments. Charles Darwin's theory of evolution by natural selection, published in
'On the Origin of Species,' revolutionised the field of biology and our understanding of
life on Earth.

While the Industrial Revolution brought about unprecedented economic growth and
technological progress, it also had negative consequences. Environmental pollution
increased dramatically, and social inequalities became more pronounced. These
issues continue to shape discussions about sustainable development and social
justice in the modern world.

The legacy of the Industrial Revolution continues to influence our lives today, from
the way we work and travel to how we understand the relationship between
technology, society, and the environment.
