Performance Comparison of Graph Database and Relational Database

Technical Report · May 2023

DOI: 10.13140/RG.2.2.27380.32641

All content following this page was uploaded by Cajetan Rodrigues on 13 May 2023.



Mit Jain, Ashish Khanchandani, Cajetan Rodrigues
Computer Science Department, San Jose State University, San Jose, USA

Abstract—We present a comprehensive comparison between a graph database, Neo4j, and a relational database, MySQL, focusing on their performance on different types of queries. Graph databases use graph structures (nodes, edges, and properties) to represent data, while relational databases employ tables and relationships between them. This study evaluates the performance of Neo4j and MySQL in terms of query execution time by examining representative queries from four categories: selection/search, recursion, aggregation, and pattern matching. Real-world data from CareerVillage was used for the experiment. The results show that Neo4j outperforms MySQL on all twelve benchmark queries, particularly on pattern matching and recursive queries. However, MySQL has advantages in terms of data consistency and transactional support.

Keywords—Databases, Neo4j, NoSQL, Graph Databases, Relational Databases

I. INTRODUCTION

Graph databases have revolutionized the way data is stored and processed. They represent data as nodes and edges in a graph. This lets us uncover insights that would otherwise be difficult to detect, or that would require complex and expensive join operations in traditional relational databases. Graph databases allow us to navigate efficiently through vast and intricate networks of data. This makes them invaluable tools for applications ranging from e-commerce to scientific research. This research is motivated by comparing MySQL with graph databases and suggesting which database suits which scenarios.

Graph databases are particularly useful for applications that deal with complex and interconnected data, such as social networks, recommendation engines, and fraud detection systems. They provide a more natural and intuitive way to represent data than relational databases, especially when dealing with unstructured or semi-structured data. Graph databases can also handle large amounts of data and scale horizontally, making them suitable for applications with a high volume of data.

One of the main reasons for the popularity of graph databases is their ability to perform complex queries quickly and efficiently. Neo4j, for instance, provides Cypher, a declarative pattern-matching query language that allows users to search for patterns and relationships within the data. This makes it easy to perform tasks such as pathfinding, recommendation generation, and fraud detection.

Relational databases are among the most widely used types of databases. They are popular for their ability to store and manage large amounts of data in an organized and efficient manner. They represent data in tabular form: each table consists of rows and columns, where each row represents a record and each column represents a specific attribute of that record. Relational databases are based on the principles of relational algebra and are designed to enforce data integrity and consistency.

One of the main reasons for the popularity of relational databases is their ability to handle complex data relationships. By organizing data into tables and establishing relationships between them, relational databases make it easier to perform complex queries and analysis. They also provide a standardized language, SQL, for querying and manipulating data.

We aim to determine the difference between a traditional RDBMS and a graph-based NoSQL database. We carry out a comprehensive comparison by loading one dataset into both systems and querying the same data in both schemas across different query categories. To facilitate a
comparison between a graph-based NoSQL database and a traditional relational database management system (RDBMS), we employ Neo4j as the representative of the graph-based NoSQL category and MySQL as the representative of the traditional RDBMS category. We compare the performance of operations such as search, pattern matching, recursion, and aggregation. Neo4j is regarded as one of the leading graph database systems in the industry. It is well known for its execution speed and for the benefits of modeling data as a graph of nodes and edges. MySQL is a very popular and widely used RDBMS. The purpose is to show which system performs better and how significant the difference is when one database is chosen over the other.

This paper is structured as follows:
• Section II surveys previous studies relevant to the performance comparison of MySQL and Neo4j.
• Section III describes the dataset used to assess the performance of the two systems, graph databases (Neo4j) and relational databases (MySQL).
• Section IV details the Neo4j test environment.
• Section V describes the MySQL test environment.
• Section VI presents the implementation and comparison of the SQL and Cypher queries.
• Section VII outlines the performance evaluation strategies used for the comparative analysis.
• Section VIII presents the performance results.
• Finally, Section IX concludes the paper and discusses the findings.

II. RELATED WORK

Various studies have compared the performance of MySQL and Neo4j for different types of queries and datasets. Some studies have found that Neo4j performs better than MySQL in terms of query speed, while others have found that MySQL is faster and more memory efficient. The types of queries tested include selection, aggregation, recursion, and pattern matching. The studies also explore the use of graph databases in various domains, such as social network analysis, web-based applications, IoT data management, and Customer Relationship Management (CRM) systems. Overall, the studies suggest that graph databases outperform conventional databases for certain types of queries and datasets.

In [1] and [2], the authors compare a graph-based and a relational database and highlight the advantages and disadvantages of both. In [2], the authors also examine two points from a data provenance perspective:
• graphs are well suited to tracking relationships and origins;
• relational databases excel at managing structured data and enforcing integrity constraints.

[3] offers a broad and detailed view of various graph database models, such as property graphs, RDF graphs, and hypergraphs. It also discusses the strengths and weaknesses of different approaches for managing graph data.

[4] concludes that while relational databases excel at managing structured data and enforcing data integrity constraints, graph databases are more effective at handling unstructured and semi-structured data with complex relationships.

In [5], the authors evaluate the performance and scalability of both database models using various metrics, including response time, throughput, and CPU usage. They found that the non-relational model performed better in terms of response time and scalability. The relational model performed better in terms of data consistency and availability.

For further analysis of graph databases and how to query them, [6] offers a comprehensive overview of graph query languages. The authors describe the features of modern graph query languages, such as Cypher, Gremlin, and SPARQL. They also provide examples of how to use these languages for different types of queries. This gives readers a solid foundation for understanding how to query and manipulate graph data in different contexts.

[7]-[10] take a deeper look at the performance of graph databases on different datasets, focusing on aggregation and recursive queries.

III. DATASET

A. Collection of Dataset

The CareerVillage dataset is a valuable resource for researchers interested in studying career guidance and counseling. In this paper, we use the dataset to compare the performance of MySQL and Neo4j, two popular database management systems. Specifically, we analyze how these systems perform when querying and processing the dataset's information. This includes questions asked by students, answers provided by professionals, and demographic data of both students and professionals. Our evaluation focuses on four query groups: selection, recursion, aggregation, and pattern matching. These groups represent common types of queries used to analyze large, complex datasets. By evaluating the performance of MySQL and Neo4j on these query groups, we hope to gain insights into the strengths and
limitations of each database management system. Ultimately, our research aims to help researchers and practitioners select the most appropriate database management system for analyzing similar datasets.

B. Dataset representation

The dataset is a collection of csv files provided by careervillage.org. careervillage.org is a community website where students post questions and professionals offer advice in the form of answers. Collectively, the csv files form a set of tables. These tables contain a subset of the data stored in CareerVillage's actual database. The dataset consists of 15 files with a total size of 436.59 MB.

Understanding what each file represents is crucial to making sense of the data.
• answers.csv: the answers posted by registered professionals in response to students' questions. Only professionals can post answers.
• comments.csv: comments made on answers or questions. Anyone can post comments.
• emails.csv: the marketing/subscription emails sent. The 'frequency_level' of an email is a label that implicitly indicates how often such emails are sent.
• group_memberships.csv: user group memberships. Any user may join any group.
• groups.csv: information about each group. The group names have been omitted for privacy reasons.
• matches.csv: links the questions included in emails to the email's ID.
• professionals.csv: information about the site's volunteers, who are referred to as professionals.
• questions.csv: the questions posted by students.
• school_memberships.csv: user memberships in schools, structured like group_memberships.csv. Only students can belong to school groups.
• students.csv: information about the site's students, who are the reason CareerVillage.org exists.
• tag_questions.csv: hashtag-to-question pairings.
• tag_users.csv: which hashtags each user follows.
• tags.csv: the name of each tag.
• question_scores.csv and answer_scores.csv: the number of "hearts" received by each question and answer, respectively.

IV. NEO4J TEST ENVIRONMENT

Fig. 1. UML Sequence diagram for Neo4j implementation

A. Loading dataset onto Neo4j

Before we begin loading the dataset, here are some prerequisites:
• Installed instance of Neo4j Desktop
• Installation of Python (Python version 3.9.13 was used in our implementation)
• Python library: neo4j

We analyzed the dataset and found that some files contain entity information, while others record relationships between entities. We created nodes for each entity. We also created nodes to represent the relationship records between pairs of entities.

In the early stages of loading the dataset into Neo4j, we manually wrote queries to load each node and to create the relationships between nodes. Since we were testing as a team, we soon realized that we needed a more dynamic approach. We therefore automated the process so that a single script run loads the dataset and creates the nodes and relationships.

We created two Python scripts. The first reads the information from the csv files and creates nodes in the database. The second creates relationships between the nodes. We automated the execution of both scripts with a bash script that creates the nodes first. The following is a step-by-step execution of the script.
1. Begin by invoking the bash script initialize_neo4j.sh.
2. In the bash shell, execute the script called execute.sh.
3. The execute.sh script executes the first Python script, load_nodes.py, which reads the data from the CSV files and creates the corresponding nodes in the Neo4j database.
4. Once load_nodes.py completes its execution, the setup.sh script executes the second Python script, create_relationships.py.
5. The create_relationships.py script reads the data from the CSV files and creates the relationships between the nodes created in the previous step.
6. End the script.

Thus, the bash script executes both Python scripts sequentially. The first loads the nodes into the Neo4j database, and the second creates relationships between them. This approach let us automate the entire process, from loading data into Neo4j to creating the relationships, with a single command.

B. Visualising the Neo4j Schema

After loading the dataset into the Neo4j database, we wrote data profiling queries in Cypher to visualize the created nodes and relationships.

Cypher is the query language of Neo4j, a popular graph database management system. It is a declarative, pattern-matching language specifically designed for querying and manipulating graph data. With Cypher, users can express complex queries in a concise and readable syntax that is easy to understand and maintain.

Cypher provides expressive syntax for filtering, aggregating, and transforming data stored in a graph. It also supports advanced features such as pattern matching, traversals, path finding, and spatial operations. Cypher queries are built from ASCII-art-like patterns such as (p:Professionals)-[:GOT_EMAIL]->(e:Emails), which makes them easy to read and understand.

Overall, Cypher is a powerful and flexible language for querying and manipulating graph data in Neo4j quickly and easily. It is a key component of the Neo4j ecosystem. Developers, data analysts, and data scientists widely use it to build graph-based applications and solve complex data problems.

The following are some of the data profiling queries we used to visualize the created nodes and relationships.

1. Count all nodes.

The following query counts all the nodes loaded into the Neo4j DBMS.

Fig. 2. Query to count all nodes

2. Count all relationships.

The following query counts all the relationships created between pairs of nodes.

Fig. 3. Query to count all relationships

3. List count of each node.

The following query counts the number of nodes for each entity (label).

Fig. 4. Query to list count of each entity

TABLE I
DISPLAYING THE COUNT OF EACH ENTITY

Node                   Count
Matches                4316275
Emails                 1850101
Tag Users              136663
Tag Questions          76553
Answers                51123
Students               30971
Professionals          28152
Questions              23931
Tags                   16269
Comments               14966
School Memberships     5638
Group Memberships      1038
Groups                 49
4. Visualize all nodes and relationships.

The following is the representation of the nodes and the relationships between them.

Fig. 5. Query to visualise the schema

Fig. 6. Neo4j database schema visualization

V. MYSQL TEST ENVIRONMENT

Fig. 7. UML Sequence for SQL implementation

A. Loading dataset in MySQL

Before we begin loading the dataset, here are some prerequisites:
• Installed instance of MySQL server (MySQL Server Community Edition 8.0.32 was used in our implementation)
• Installation of Python (Python version 3.9.13 was used in our implementation)
• Python library: mysql-connector-python

We load the dataset into tables in two phases.
• Phase 1: we create the database and the necessary tables. In doing so, we define the schemas and implement all the required constraints, indexes, and relationships. This phase is implemented by 'createDB.py' and 'createTables.py'.
• Phase 2: we read data from the csv files, construct SQL queries, and execute them. This phase is implemented by 'loadData.py'.

All these Python scripts are stored in the 'load-data-into-sql' subdirectory. To avoid running them manually, the same directory provides a shell script, 'initialize_sql.sh', which automates the complete data loading process by executing both phases.

B. Visualizing the MySQL schema

After the dataset has been loaded, the database contains the following tables:

TABLE II
DISPLAYING THE NUMBER OF TUPLES IN EACH TABLE

Table                  No. of tuples
answer_scores          51107
answers                51123
comments               14966
emails                 1850101
group_memberships      1038
groups_                49
professionals          28152
question_scores        23928
questions              23931
school_memberships     5638
students               30971
tag_questions          76553
tag_users              136663
tags                   16269

Essentially, each csv file in the dataset is loaded into a separate table. We visualize our database with an Entity-Relationship (ER) diagram that captures the relationships between the entities. Fig. 8 below shows the ER diagram.
Fig. 8. Entity-Relationship diagram
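The two-phase MySQL loading described in Section V.A works in two steps: create the schema first, then read each csv file and execute INSERT statements. It can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' createTables.py or loadData.py:
• it uses Python's built-in sqlite3 module as a stand-in for MySQL, so it runs without a server;
• the simplified two-column tags table and the helpers create_schema and load_csv are hypothetical.

With mysql-connector-python, the same pattern would use %s placeholders instead of ?.

```python
import csv
import io
import sqlite3


# Phase 1: create a hypothetical, simplified tags table (not the paper's full schema).
def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tags (
            tags_tag_id   INTEGER PRIMARY KEY,
            tags_tag_name TEXT NOT NULL
        )
        """
    )


# Phase 2 (hypothetical helper): read a csv source and insert all rows with
# one parameterized INSERT statement, letting the driver escape the values.
def load_csv(conn: sqlite3.Connection, table: str, csv_source) -> int:
    reader = csv.DictReader(csv_source)
    columns = list(reader.fieldnames or [])
    placeholders = ", ".join("?" for _ in columns)
    # Table and column names come from our own schema and csv header, not user input.
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    rows = [tuple(row[c] for c in columns) for row in reader]
    with conn:  # commits on success, rolls back on error
        conn.executemany(sql, rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    sample = io.StringIO("tags_tag_id,tags_tag_name\n1,college\n2,engineering\n")
    print(load_csv(conn, "tags", sample))  # 2
```

Keeping schema creation separate from data loading mirrors the paper's split between the two phases. Constraints and indexes then exist before any rows arrive, and a failed load can be retried without recreating the tables.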

VI. IMPLEMENTATION (SQL & NEO4J)

For the sake of simplicity, we visualize the results for one query in each category. For MySQL, we show the output of the EXPLAIN ANALYZE command; for the Neo4j counterpart, we show the graph visualization.

Fig. 9. Flowchart showing overall implementation
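The MySQL side of each comparison relies on EXPLAIN ANALYZE, so it may help to show how a reported time can be read programmatically. The sketch below is a hedged illustration, not part of the authors' tooling. It parses the tree-format text that MySQL 8.0 prints for EXPLAIN ANALYZE, where each plan node carries an "(actual time=first..last rows=N loops=M)" annotation in milliseconds. It returns the last-row time of the top (first) plan node as an estimate of the query's execution time. The function root_actual_time_ms and the sample plan text are hypothetical.

```python
import re

# Matches the "actual time=<first row>..<all rows>" annotation (milliseconds).
_ACTUAL_TIME = re.compile(r"actual time=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?)")


def root_actual_time_ms(plan_text: str) -> float:
    """Hypothetical helper: time for the top plan node to return all rows, in ms."""
    for line in plan_text.splitlines():
        if line.lstrip().startswith("->"):
            # The first "->" line is the root of the plan tree.
            match = _ACTUAL_TIME.search(line)
            if match is None:
                raise ValueError("top plan node has no 'actual time' annotation")
            return float(match.group(2))
    raise ValueError("no plan node found in EXPLAIN ANALYZE output")


# Hypothetical sample in the tree format (not measured output from this study).
SAMPLE_PLAN = """\
-> Nested loop inner join  (cost=2.50 rows=5) (actual time=0.061..0.187 rows=5 loops=1)
    -> Table scan on p  (cost=0.75 rows=5) (actual time=0.030..0.041 rows=5 loops=1)
"""

if __name__ == "__main__":
    print(root_actual_time_ms(SAMPLE_PLAN))  # 0.187
```

Only the root node is used, because the times of nested nodes are already included in it. For nodes with loops greater than 1, MySQL reports per-loop averages, so summing child times would be misleading.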

1. Selection

Fig. 10. Query to find professionals in a specific tag

Fig. 11. EXPLAIN ANALYSE on SQL Query

Q2: Looking for students in a specific group and interested in a specific tag?

SQL
SELECT * FROM students s
JOIN group_memberships gm ON students_id = gm.group_memberships_user_id
JOIN groups_ g ON g.groups_id = gm.group_memberships_group_id
JOIN tag_users tu ON tu.tag_users_user_id = s.students_id
JOIN tags t ON t.tags_tag_id = tu.tag_users_tag_id
WHERE t.tags_tag_name = 'college'
  AND g.groups_group_type = 'youth program';

Cypher
MATCH (t:Tags)<-[:HAS_TAG]-(s:Students)-[:MEMBER_IN]->(b)
WHERE t.tags_tag_name = 'college'
  AND b.groups_group_type = 'youth program'
RETURN s, t, b

Q3: Looking for all emails received by a particular professional?

SQL
SELECT * FROM professionals p
JOIN emails e ON p.professionals_id = e.emails_recipient_id
WHERE p.professionals_id = '0079e89bf1544926b98310e81315b9f1';

Cypher
MATCH (p:Professionals {professionals_id: '0079e89bf1544926b98310e81315b9f1'})
      -[:GOT_EMAIL]->(e:Emails)
RETURN e

2. Recursion

Q4: Looking for the questions with answers recursively many times?

SQL
WITH RECURSIVE answer_replies AS (
  SELECT answers_id, answers_author_id, answers_question_id,
         answers_date_added, answers_body
  FROM answers
  WHERE answers_question_id IS NOT NULL
  UNION ALL
  SELECT a.answers_id, a.answers_author_id, a.answers_question_id,
         a.answers_date_added, a.answers_body
  FROM answers a
  INNER JOIN answer_replies ar ON ar.answers_id = a.answers_question_id
)
SELECT * FROM answer_replies ar
LEFT JOIN questions q ON ar.answers_question_id = q.questions_id;

Cypher
MATCH (q:Questions)<-[:IS_REPLY_TO*1..]-(a:Answers)
RETURN q, a

Fig. 12. Query to find questions with answers recursively many times
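Q4, like Q5 and Q6 that follow, uses a recursive common table expression (CTE). An anchor member selects the starting rows, and a recursive member repeatedly joins the table onto the rows found so far until no new rows appear. The sketch below is a hedged, self-contained illustration of that technique, not the CareerVillage schema:
• it uses Python's sqlite3 module, which also supports WITH RECURSIVE;
• it runs on a hypothetical three-row replies table.

It also carries a level counter like Q5 and Q6. With the anchor at level 1, a condition c.level < N in the recursive member yields at most N levels, whereas c.level <= N yields up to N + 1.

```python
import sqlite3

# Hypothetical replies table standing in for the answers table:
# answer 2 replies to answer 1, and answer 3 replies to answer 2.
SETUP = """
CREATE TABLE replies (id INTEGER PRIMARY KEY, parent_id INTEGER);
INSERT INTO replies VALUES (1, NULL), (2, 1), (3, 2);
"""

# Anchor member: rows without a parent (level 1).
# Recursive member: rows whose parent is already in the result, one level
# deeper, stopping once the level counter reaches the requested maximum.
QUERY = """
WITH RECURSIVE chain(id, level) AS (
    SELECT id, 1 FROM replies WHERE parent_id IS NULL
    UNION ALL
    SELECT r.id, c.level + 1
    FROM replies r JOIN chain c ON r.parent_id = c.id
    WHERE c.level < ?
)
SELECT id, level FROM chain ORDER BY level
"""


def reply_chain(max_levels: int) -> list[tuple[int, int]]:
    """Return (id, level) pairs for at most max_levels levels of replies."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY, (max_levels,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(reply_chain(2))  # [(1, 1), (2, 2)]
    print(reply_chain(3))  # [(1, 1), (2, 2), (3, 3)]
```

In Cypher, the same depth bound is expressed directly in the variable-length pattern, for example [:IS_REPLY_TO*1..2].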
Fig. 13. EXPLAIN ANALYSE SQL Command

Q5: Looking for questions with answers recursively twice?

SQL
WITH RECURSIVE answer_replies AS (
  SELECT 1 AS level, answers_id, answers_author_id, answers_question_id,
         answers_date_added, answers_body
  FROM answers
  WHERE answers_question_id IS NOT NULL
  UNION ALL
  SELECT level + 1, a.answers_id, a.answers_author_id, a.answers_question_id,
         a.answers_date_added, a.answers_body
  FROM answers a
  INNER JOIN answer_replies ar ON ar.answers_id = a.answers_question_id
  WHERE level <= 2
)
SELECT * FROM answer_replies ar
LEFT JOIN questions q ON ar.answers_question_id = q.questions_id;

Cypher
MATCH (q:Questions)<-[:IS_REPLY_TO*1..2]-(a:Answers)
RETURN q, a

Q6: Looking for questions with answers recursively 3 times?

SQL
WITH RECURSIVE answer_replies AS (
  SELECT 1 AS level, answers_id, answers_author_id, answers_question_id,
         answers_date_added, answers_body
  FROM answers
  WHERE answers_question_id IS NOT NULL
  UNION ALL
  SELECT level + 1, a.answers_id, a.answers_author_id, a.answers_question_id,
         a.answers_date_added, a.answers_body
  FROM answers a
  INNER JOIN answer_replies ar ON ar.answers_id = a.answers_question_id
  WHERE level <= 3
)
SELECT * FROM answer_replies ar
LEFT JOIN questions q ON ar.answers_question_id = q.questions_id;

Cypher
MATCH (q:Questions)<-[:IS_REPLY_TO*1..3]-(a:Answers)
RETURN q, a

3. Aggregation

Q7: Count the number of professionals who answered the questions.

SQL
SELECT count(professionals_id) FROM professionals p
JOIN answers a ON p.professionals_id = a.answers_author_id;

Cypher
MATCH (p:Professionals)-[]->(a:Answers)
RETURN count(p)

Fig. 14. Cypher query to count the number of professionals who answered the question

Fig. 15. EXPLAIN ANALYSE SQL Command

Q8: Count the number of professionals following a specific tag.

SQL
SELECT count(*) FROM (
  SELECT DISTINCT p.* FROM professionals p
  JOIN tag_users tu ON p.professionals_id = tu.tag_users_user_id
  JOIN tags t ON tu.tag_users_tag_id = t.tags_tag_id
  WHERE t.tags_tag_name = 'college'
) AS temp;
Cypher
MATCH (p:Professionals)-[:HAS_TAG]->(t:Tags)
WHERE t.tags_tag_name = 'college'
RETURN count(p)

4. Pattern Match

Q10: Looking for answered questions and their tags?

SQL
SELECT q.questions_id, t.tags_tag_id, a.answers_id FROM tags t
JOIN tag_questions tq ON t.tags_tag_id = tq.tag_questions_tag_id
JOIN questions q ON tq.tag_questions_question_id = q.questions_id
JOIN answers a ON a.answers_question_id = questions_id;

Cypher
MATCH (a:Answers)-[]->(q:Questions)-[]->(t:Tags)
RETURN a, q, t

Q11: Looking for students and professionals in the same group?

SQL
SELECT g.groups_id, professionals_id, students_id FROM groups_ g
JOIN group_memberships gm ON g.groups_id = gm.group_memberships_group_id
JOIN (SELECT group_memberships_group_id AS group_id, professionals_id
      FROM professionals p
      JOIN group_memberships gm1
        ON gm1.group_memberships_user_id = p.professionals_id) pg
  ON pg.group_id = gm.group_memberships_group_id
JOIN (SELECT group_memberships_group_id AS group_id, students_id
      FROM students s
      JOIN group_memberships gm2
        ON s.students_id = gm2.group_memberships_user_id) sg
  ON sg.group_id = gm.group_memberships_group_id;

Cypher
MATCH (p:Professionals)-[]->(g:Groups)<-[]-(s:Students)
RETURN p, g, s
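Q11 asks for every (professional, group, student) combination in which both users belong to the same group. In Cypher, this is the pattern (p)-[]->(g)<-[]-(s); in SQL, it becomes a join of the membership table with itself on the group id. The following Python sketch shows the same idea on in-memory data by grouping memberships per group and pairing the two roles. The membership tuples and the function same_group_triples are hypothetical, and the sketch says nothing about how either database executes the query internally.

```python
from collections import defaultdict
from itertools import product


def same_group_triples(memberships, professionals, students):
    """Hypothetical helper: (professional_id, group_id, student_id) triples.

    memberships: iterable of (user_id, group_id) pairs, mirroring the shape of
    group_memberships.csv; professionals and students: sets of user ids.
    """
    # group_id -> ([professional ids], [student ids])
    members_by_group = defaultdict(lambda: ([], []))
    for user_id, group_id in memberships:
        pros, studs = members_by_group[group_id]
        if user_id in professionals:
            pros.append(user_id)
        elif user_id in students:
            studs.append(user_id)

    triples = []
    for group_id, (pros, studs) in members_by_group.items():
        # Every professional/student pair inside one group is one match of
        # the pattern (p)-[]->(g)<-[]-(s).
        for p, s in product(pros, studs):
            triples.append((p, group_id, s))
    return triples


if __name__ == "__main__":
    memberships = [("p1", "g1"), ("s1", "g1"), ("s2", "g1"), ("p2", "g2")]
    print(same_group_triples(memberships, {"p1", "p2"}, {"s1", "s2"}))
    # [('p1', 'g1', 's1'), ('p1', 'g1', 's2')]
```

Group g2 contributes nothing because it has a professional but no student, just as the Cypher pattern produces no match for it.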
Q12: Looking for students and experts (professionals) who follow the same tag?

SQL
SELECT pt.tags_id, st.students_id, pt.professionals_id FROM tags t
JOIN tag_users tu ON t.tags_tag_id = tu.tag_users_tag_id
JOIN (SELECT u.tag_users_tag_id AS tags_id, professionals_id
      FROM professionals p
      JOIN tag_users u ON p.professionals_id = u.tag_users_user_id) pt
  ON pt.tags_id = t.tags_tag_id
JOIN (SELECT u.tag_users_tag_id AS tags_id, students_id
      FROM students s
      JOIN tag_users u ON s.students_id = u.tag_users_user_id) st
  ON st.tags_id = t.tags_tag_id
LIMIT 100000;

Cypher
MATCH (p:Professionals)-[]->(t:Tags)<-[]-(s:Students)
RETURN p, t, s LIMIT 100000

VII. PERFORMANCE EVALUATIONS

A. Neo4j Performance Strategy

Now that our data is modelled and set up, we use the Neo4j Browser client in the Neo4j Desktop app to run the Cypher queries discussed in the previous section. Neo4j provides useful functionality out of the box through the EXPLAIN and PROFILE keywords.

The 'explain' command shows the execution plan of a Cypher query without running it. It shows how the query will be executed:
• which indexes will be used,
• which operations will be performed,
• the estimated number of rows that will be processed.

The command used is:

Fig. 18. Cypher query using Explain command in Neo4j.

Fig. 19. Cypher query using Explain command in Neo4j.

The 'profile' command provides more detailed information than the explain command. In addition to the execution plan, it reports statistics on how the query was actually executed:
• the number of database hits,
• the number of rows processed at each stage,
• the total processing time.

The command used is:

Fig. 20. Cypher query using PROFILE command in Neo4j.

The profile command is more useful than the explain command when optimizing queries, because it provides more detailed information about the query's performance. By examining the statistics provided by PROFILE, developers can identify performance bottlenecks and optimize their queries. Figures 21 and 22 show the more detailed execution plan. The bottom of Fig. 22 also shows the time the query took to complete. We follow the same procedure for all 12 queries. We run each query 3 times and use the average runtime for comparison against the relational (MySQL) execution times.
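The measurement procedure described above can be expressed as a small helper: three runs per query, with the mean taken as the query's runtime, followed by the comparison against MySQL. This is a minimal sketch under stated assumptions, not the authors' harness. run_query stands for a hypothetical zero-argument callable that executes one query. Wall-clock time from time.perf_counter is used here, whereas the paper reads times from PROFILE and EXPLAIN ANALYZE output.

```python
import time
from statistics import mean
from typing import Callable


def average_runtime_ms(run_query: Callable[[], object], repeats: int = 3) -> float:
    """Run a (hypothetical) query callable `repeats` times; return the mean in ms."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def speedup(relational_ms: float, graph_ms: float) -> float:
    """How many times faster the graph database was (MySQL time / Neo4j time)."""
    return relational_ms / graph_ms


if __name__ == "__main__":
    print(average_runtime_ms(lambda: sum(range(10_000))) >= 0.0)  # True
    # Q4 averages from Table III: 757 ms (MySQL) versus 2 ms (Neo4j).
    print(speedup(757, 2))  # 378.5
```

Averaging several runs smooths out the caching and background-process effects the paper mentions. Reporting the ratio, rather than the raw difference, keeps fast and slow queries comparable.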
Fig. 21. Visualising the PROFILE command

Fig. 22. Profiling command shown on Query

B. MySQL Performance Strategy

Once the data loading process is complete, we use the 'mysql' command-line client to query our database. This lets us verify that the data is loaded before proceeding with the performance evaluation. To analyze the execution of a query and find its exact execution time, we use the EXPLAIN ANALYZE command.

This command is essential because it gives a complete breakdown of how the query was executed:
• the join strategies used;
• intermediate and final result sizes;
• the estimated cost;
• actual execution times at every step, and so on.

It also shows exactly how much time was spent on each part of the query. This helps identify bottlenecks and optimize performance where needed. Additionally, it excludes the time required to render query results on the client side. This matters because we want an execution time that purely reflects the capability of the MySQL engine. For demonstration purposes, consider the same query used in the previous subsection, query Q4. Fig. 23 shows the output of executing the EXPLAIN ANALYZE command on Q4.

Fig. 23. EXPLAIN ANALYZE command shown in Query Q4

Fig. 24 below shows a zoomed-in version of Fig. 23, revealing the granular query execution details captured by the EXPLAIN ANALYZE command.

Fig. 24. Zoomed-in version of Fig. 23

We run the EXPLAIN ANALYZE command on each of the 12 queries three times to find the average execution time of each query. These averages are used for the performance comparison with Neo4j in the next section.

VIII. PERFORMANCE RESULTS

Our evaluation mainly considers the following query groups:
• Selection/Search
• Recursive/Related
• Aggregation
• Pattern Matching

For the scope of this comparison, we focus on one of the most practical parameters for judging query performance: execution time. We compare the execution times across the four categories, with at least 3 queries per category.

The hardware configuration was an Apple MacBook Pro with an M1 Pro Apple Silicon chip and 16 GB RAM, running the latest version of macOS Ventura. We used three machines of this configuration, each running a local instance of both databases (Neo4j Desktop and MySQL). This let us record and average the results and ensure there was no
skew in the results due to external factors related to our local systems.

We then recorded the time each database took to complete each query and present the results in the table below:

TABLE III
PERFORMANCE COMPARISON BETWEEN NEO4J & MYSQL (AVERAGE EXECUTION TIME)

Category            Query    Neo4j    MySQL
Selection           Q1       2 ms     31 ms
                    Q2       8 ms     323 ms
                    Q3       32 ms    438 ms
Recursive           Q4       2 ms     757 ms
                    Q5       2 ms     290 ms
                    Q6       3 ms     305 ms
Aggregation         Q7       43 ms    146 ms
                    Q8       18 ms    40 ms
                    Q9       62 ms    290 ms
Pattern Matching    Q10      5 ms     360 ms
                    Q11      10 ms    455 ms
                    Q12      1 ms     68 ms

The table above shows average values. The execution time of the same query varies from run to run due to factors such as background processes on the system, caching, and other operating system activity. We therefore executed each query 3 times on each of the three systems and averaged all the recordings. This gives a fair relative estimate of the performance difference between the two databases.

IX. CONCLUSION

In this study, we have provided an overview of queries in four groups: selection/search, recursion, aggregation, and pattern matching. We have also compared the data query performance of Neo4j (representing graph databases) and MySQL (representing relational databases). Our findings show that, on the CareerVillage dataset, the graph database outperforms the relational database on all twelve queries. The speedup reaches roughly 378 times (Q4 in Table III: 2 ms versus 757 ms).

As future work, we plan to extend our tests to datasets from other sectors, such as banking, the stock market, and ERP. We also plan to evaluate other aspects of system performance, such as memory usage, power consumption, and implementation complexity.

Looking further ahead, it would be valuable to include additional metrics in this comparison, such as:
• the memory trade-off of storing duplicated data in a NoSQL system;
• the scalability and cost consequences of that trade-off, relative to the performance gained from the graph structure.

Similarly, further work could analyze the effect of use cases, data size, and other parameters.

REFERENCES

[1] T.-T.-T. Do, T.-B. Mai-Hoang, V.-Q. Nguyen, and Q.-T. Huynh, "Query-based performance comparison of graph database and relational database," in Proceedings of the 11th International Symposium on Information and Communication Technology, 2022, pp. 375-381.
[2] C. Vicknair, M. Macias, Z. Zhao, X. Nan, Y. Chen, and D. Wilkins, "A comparison of a graph database and a relational database: a data provenance perspective," in Proceedings of the 48th Annual Southeast Regional Conference (ACM SE '10), 2010.
[3] R. Angles and C. Gutierrez, "Survey of graph database models," ACM Comput. Surv., vol. 40, no. 1, Feb. 2008.
[4] S. Batra and C. Tyagi, "Comparative analysis of relational and graph databases," International Journal of Soft Computing and Engineering (IJSCE), 2012.
[5] C. Gyorödi, R. Gyorödi, and R. Sotoc, "A comparative study of relational and non-relational database models in a Web-based application," International Journal of Advanced Computer Science and Applications, vol. 6, no. 11, 2015.
[6] R. Angles, M. Arenas, P. Barceló, A. Hogan, J. Reutter, and D. Vrgoč, "Foundations of Modern Query Languages for Graph Databases," ACM Comput. Surv., vol. 50, no. 5, Art. no. 68, Sep. 2017, doi: 10.1145/3104031.
[7] P. Kotiranta, M. Junkkari, and J. Nummenmaa, "Performance of Graph and Relational Databases in Complex Queries," Applied Sciences, vol. 12, no. 13, Art. no. 6490, Jul. 2022, doi: 10.3390/app12136490.
[8] W. Khan, W. Shahzad, et al., "Predictive Performance Comparison Analysis of Relational & NoSQL Graph Databases," International Journal of Advanced Computer Science and Applications, vol. 8, no. 5, pp. 73-79, 2017, doi: 10.14569/IJACSA.2017.080510.
[9] L. Jachiet, P. Genevès, N. Gesbert, and N. Layaïda, [10] J. Hölsch, T. Schmidt, and M. Grossniklaus,
"On the Optimization of Recursive Relational "On the performance of analytical and pattern
Queries: Application to Graph Queries," in matching graph queries in neo4j and a relational
Proceedings of the 2020 ACM SIGMOD database," in EDBT/ICDT 2017 Joint Conference:
International Conference on Management of Data, 6th International Workshop on Querying Graph
2020, pp. 681-697, doi: 10.1145/3318464.3380594. Structured Data (GraphQ), 2017, pp. 15-22, doi:
10.1145/3035918.3035930.
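The averaging procedure and the speedup comparison described in Section VIII reduce to simple arithmetic, sketched below in Python. This is an illustrative sketch, not the authors' benchmarking code: `time_query` is a hypothetical helper that assumes each Neo4j or MySQL query is wrapped in a zero-argument callable (for example, a driver call), and `RESULTS` copies only the Q8 to Q12 rows of the results table.

```python
import time
from statistics import mean

# Averaged execution times (ms) for the Q8-Q12 rows of the results table.
# Layout: (category, query, neo4j_ms, mysql_ms).
RESULTS = [
    ("Aggregation", "Q8", 18, 40),
    ("Aggregation", "Q9", 62, 290),
    ("Aggregation", "Q10", 5, 360),
    ("Pattern Matching", "Q11", 10, 455),
    ("Pattern Matching", "Q12", 1, 68),
]


def time_query(run_query, repeats=3):
    """Hypothetical helper: call `run_query` `repeats` times in a row and
    return each run's wall-clock time in milliseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def average_ms(recordings):
    """Mean of all recordings, e.g. 3 runs on each of 3 systems = 9 values."""
    if not recordings:
        raise ValueError("need at least one recording")
    return mean(recordings)


def speedup(neo4j_ms, mysql_ms):
    """How many times faster Neo4j was than MySQL for the same query."""
    return mysql_ms / neo4j_ms


if __name__ == "__main__":
    # A stand-in workload replaces the real database call here.
    sample = time_query(lambda: sum(range(10_000)))
    print(f"sample average: {average_ms(sample):.3f} ms")
    for category, query, neo4j_ms, mysql_ms in RESULTS:
        print(f"{query:<4} {category:<17} {speedup(neo4j_ms, mysql_ms):6.1f}x")
```

For these rows, dividing the MySQL time by the Neo4j time gives, for example, 360 / 5 = 72 for Q10 and 455 / 10 = 45.5 for Q11. Averaging one flat list of all nine recordings (three runs on each of three systems) weights every run equally, which corresponds to taking "an average across all the recordings".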
