Comparison of Data Migration Techniques from SQL Database to NoSQL Database

Hira Lal Bhandari* and Roshan Chitrakar

Abstract

With the rapid and multi-dimensional growth of data, Relational Database Management Systems (RDBMS) with Structured Query Language (SQL) support are facing difficulties in managing huge data due to the lack of a dynamic data model, performance and scalability issues, etc. NoSQL databases address these issues by providing the features that SQL databases lack, so many organizations are migrating from SQL to NoSQL. An RDBMS deals with structured data, whereas a NoSQL database handles structured, unstructured and semi-structured data. As the continuous development of applications takes place, a huge volume of collected data has already been taken through architectural migration from SQL databases to NoSQL databases. Since NoSQL is an emerging and evolving technology in the field of database management, and because of the increased maturity of NoSQL database technology, many applications have already switched to NoSQL in order to extract information from big data. This study discusses, analyzes and compares 7 (seven) different techniques of data migration from SQL database to NoSQL database. The migration is performed using the appropriate tools / frameworks available for each technique, and the results are evaluated, analyzed and validated using a system tool called SysGauge. The parameters used for the analysis and the comparison are Speed, Execution Time, Maximum CPU Usage and Maximum Memory Usage. At the end of the entire work, the most efficient techniques have been recommended.

Keywords

Data Migration; MySQL; RDBMS; Unstructured Data; SysGauge

*Corresponding author: Hira Lal Bhandari, Faculty of Science, Health and Technology, Nepal Open University, Nepal. E-mail: [email protected]

Received: November 01, 2020  Accepted: December 14, 2020  Published: December 21, 2020

Introduction

In 1970, Edgar Frank Codd introduced an architectural framework for the relational database approach in his paper "A relational model of data for large shared data banks" [1]. Some time later, Codd introduced the Structured English Query Language, later renamed Structured Query Language (SQL), to provide a way to access data in a relational database [2]. Since then, the relational model has been the dominant form in the database market. The most popular database management systems are Oracle, Microsoft SQL Server and MySQL [2]. All three of these DBMSs are based on the relational database model and use SQL as the query language. When NoSQL was introduced by Carlo Strozzi in 1998 as a file-based database, it was used to represent a relational database without using Structured Query Language. However, it did not gain much attention at that time. Later, the need to manage rapidly growing data led companies to develop their own solutions, which resulted in the emergence of generic NoSQL database systems; there are now more than 150 NoSQL products. These products come with issues such as suitability to some areas of application, security and reliability [3].

NoSQL databases have been emerging over the last few years due to their less constrained structure, scalable schema design and faster access in comparison to relational databases. The key attribute that makes them different from relational databases is that they do not use the table as the storage structure of the data. In addition, their schema is very efficient in handling unstructured data. NoSQL databases also use many modeling techniques such as key-value stores, document data models and graph databases [1].

This research study aims to present a comparative study of data migration techniques from SQL database to NoSQL database. The study analyses 7 (seven) recent approaches [4] which have been proposed for data migration from SQL database to NoSQL database.

Statement of the Problem

There is nothing wrong in using a traditional RDBMS for database management. However, with the huge influx of data from social sites and other digital media, it simply is not enough for applications dealing with huge databases. Also, NoSQL databases need only cheap hardware. Hence, some relational databases need to be converted to NoSQL databases, which makes it possible to overcome the drawbacks found in relational databases. Some drawbacks of relational database management systems are:

1. They do not encompass a wide range of data models in data management.
2. They are not easily scalable because of their constrained structure.
3. They are not efficient and flexible for unstructured and semi-structured databases.
4. They cannot handle data during hardware failure.

Due to the massive use of mobile computing, cloud computing, the Internet of Things and many other digital technologies, a large volume of streaming data is available nowadays. Such huge amounts of data pose a great deal of challenges to the traditional relational database paradigm. Those challenges are related to performance, scalability and distribution. To overcome such challenges, enterprises have begun to move towards implementing a new database paradigm known as NoSQL [5].

On the other hand, NoSQL databases offer several different models for accessing and managing data, each suited to specific use cases. This is also a significant reason to migrate data from SQL database to NoSQL database. The several models are summarized in Table 1.
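To make the contrast between these models concrete, the sketch below shows how one and the same user record could be represented under the storage models just mentioned. It is purely illustrative: the field names are assumptions borrowed from the sample table described later in the Data Description section, not data from the study.

```python
# Illustrative only: the same "user" record expressed under the data models
# summarized in Table 1. Field names are assumptions, not the study's data.

# Relational (SQL) view: one flat row with a fixed schema.
sql_row = (101, "hira", "bhandari", "M", "secret", "active")

# Key-value store: an opaque value addressed by a key.
key_value = {"user:101": '{"user_name": "hira", "status": "active"}'}

# Document store (e.g. MongoDB): a self-describing, possibly nested document.
document = {
    "_id": 101,
    "user_name": "hira",
    "last_name": "bhandari",
    "profile": {"gender": "M", "status": "active"},  # nesting needs no schema change
}

# Graph model: nodes and an edge, useful when relationships dominate.
graph = {
    "nodes": [{"id": 101, "label": "User"}, {"id": 202, "label": "Group"}],
    "edges": [{"from": 101, "to": 202, "type": "MEMBER_OF"}],
}
```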
NoSQL DBMSs are distributed, non-relational databases. They
are designed for large-scale data storage and for massively parallel data processing across a large number of commodity servers. They use non-SQL languages and mechanisms to interact with data. The use of NoSQL database systems has increased in major Internet companies such as Google, Amazon and Facebook, which faced challenges in dealing with huge quantities of data that conventional RDBMS solutions could not cope with. These systems can support multiple activities, including exploratory and predictive analytics, ETL-style data transformation and non-mission-critical OLTP. They are designed to scale up to thousands or millions of users doing updates as well as reads, in contrast to traditional DBMSs and data warehouses [6].

The focus of the study is to carry out a comparative study of seven different techniques for migrating data from a relational database to a NoSQL database. Migration of data from a relational database to a NoSQL database refers to the transformation of data from a structured and normalized database to a flexible, scalable and less constrained NoSQL database. The main objective of this research is to find out the most efficient data migration technique among seven major migration techniques from SQL database to NoSQL database.

Scope and Limitations of the Research Study

The scope and limitations of this research cover the following. The study analyses different techniques for migrating data from SQL database to NoSQL database in order to identify the most efficient migration technique, so that one can efficiently adopt this emerging technology in the database world. Therefore, the study does not include a technical discussion of the risks identified, or of implementation guidelines. The demand for NoSQL databases is increasing because of their diversified characteristics that offer rapid and smooth scalability, great availability, distributed architecture, significant performance and rapid development agility. NoSQL provides a wide range of data models to choose from and is easily scalable without requiring database administrators. Some SQL-to-NoSQL data migration providers, like Riak and Cassandra, are programmed to handle hardware failures and are faster, more efficient and flexible. The technology has evolved at a very high pace.

However, some data migration techniques and NoSQL databases are still immature, and they do not have a standard query language. Some NoSQL databases are not ACID compliant. Lack of standards and data loss are the major problems while migrating data from SQL database to NoSQL database.

Review of Related Works

This research study provides a comparative study of different data migration approaches from SQL databases to NoSQL databases. It focuses on the study of major migration techniques and suggests the most efficient approach for data migration. The migration process is performed with the help of the available tools / frameworks.

SQL databases and other traditional databases strictly follow a structured way of organizing the data generated from various applications, but NoSQL databases provide flexibility and scalability in organizing the data, which makes the data easy to access. The data generated from social networking sites and real-time applications needs a flexible and scalable system, which increases the requirement for NoSQL. Hence, a multidimensional model has been proposed for data migration. The biggest challenge is the migration of existing data residing in a data warehouse to a NoSQL database while maintaining the characteristics of the data. The growing use of web applications has raised the demand for NoSQL because traditional databases are unable to handle the rapidly growing data [4].

The concept of NoSQL was first used in 1998 by Carlo Strozzi to represent an open source database that does not use an SQL interface. Strozzi likes to refer to NoSQL as "noseequel", since there is a difference between this technology and the relational model. A white paper published by Oracle mentions techniques and utilities for migrating non-Oracle databases to Oracle databases [7]. Abdelsalam Maatuk [8] describes an investigation into approaches and techniques used for database conversion. The origin of NoSQL is also attributed to the invention of Google's BigTable model. This database system, BigTable, is used for storage of projects developed by Google, for example Google Earth. BigTable is a compressed, high-performance database which was initially released in 2005 and is built on the Google File System. It was developed using the C and C++ languages. It provides consistency, fault tolerance and persistence. It is designed to scale across thousands of machines, and it is easy to add more machines to it [9]. Later, Amazon developed the fully managed NoSQL database service DynamoDB, which provides a fast, highly reliable and cost-effective NoSQL database service designed for Internet-scale applications [9]. These projects were a step towards the evolution of NoSQL.

However, the term re-emerged only in 2009, at a meeting in San Francisco organized by Johan Oskarsson. The name for the meeting, NoSQL meetup, was given by Eric Evans, and from there on NoSQL became a buzzword [8]. Many early papers have discussed the relationship between relational and NoSQL databases, giving a brief introduction to NoSQL databases, their types and characteristics. They also discussed structured and non-structured databases and explained how the use of a NoSQL database like Cassandra improved the performance of a system; in addition, it can scale the network without changing any hardware or buying a bigger server, thereby improving network scalability with low-cost commodity hardware [10].

Sunita Ghotiya [4] gave a literature review of some of the recent approaches proposed by various researchers to migrate data from relational to NoSQL databases. Arati Koli and Swati Shinde [11] presented a comparison among five different techniques for migrating from SQL database to NoSQL database with the help of different research paper reviews. Shabana Ramzan, Imran Sarwar Bajwa and Rafaqut Kazmi [12] presented the comparison of transformations in tabulated format with different parameters such as source database, target database, schema conversion, data conversion, conversion time, data set, techniques and reference papers, which clearly shows the research gap that currently no approach or tool supports automated transformation of MySQL to Oracle NoSQL for both data and schema.
Methodology

This research study evaluates major migration approaches which have been proposed in previous research papers. The evaluation is done through a comparative study of the migration approaches' efficiency, measured with different parameters: Speed, Execution Time, Maximum CPU Usage, and Maximum Memory Usage. Migration of data from SQL database to NoSQL database for the different migration approaches is done using the available frameworks / tools.

In Figure 1 we have presented the workflow that has been followed during the entire process of data transformation. This helps to systematically run and verify each job, which was essential in concluding the study among the major migrating approaches performed. This way we can trace the most efficient migration approach for transforming data from a traditional normalized database to a NoSQL database.

Figure 1 shows how data is migrated from the source data store to the destination data store, i.e. from SQL database to NoSQL databases. In the diagram, each migration approach is implemented with the help of its respective technology, i.e. tools / framework. Data store 1 signifies the SQL database, i.e. MySQL, and data store 2 implies MongoDB and HBase. Until the migration process completes, the SysGauge tool is run to check whether other processes are running or not. If there are processes running, they are shut down, and only then is the migration technology run for the respective migration approach using its tools / framework.
Data Description

The sample database is migrated from an SQL database to a NoSQL database. The database used in the migration process is a structured database. The data set contained in the database table consists of 1000 records. The database table schema is presented below, which clarifies the structure of the data. Table 2 includes six different columns and seven different rows. The first column consists of fields such as user id, user name, last name, Gender, password and Status. They have int and varchar data types: int is basically the numeric data type and varchar is the character data type.
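The exact DDL of the source table is not given in the paper; the following is a minimal sketch, assuming a MySQL table named `users` with the six fields and int/varchar types listed above, together with the generation of the 1,000 sample records used in the migration runs. Table, column and connection names are illustrative assumptions.

```python
# Sketch of the assumed source table and sample data (assumptions, not the
# study's actual schema or records).
import mysql.connector  # pip install mysql-connector-python

DDL = """
CREATE TABLE IF NOT EXISTS users (
    user_id   INT PRIMARY KEY,
    user_name VARCHAR(50),
    last_name VARCHAR(50),
    gender    VARCHAR(10),
    password  VARCHAR(64),
    status    VARCHAR(20)
)
"""

conn = mysql.connector.connect(host="localhost", user="root",
                               password="***", database="sourcedb")
cur = conn.cursor()
cur.execute(DDL)

# 1000 records, matching the data-set size used in the study.
rows = [(i, f"user{i}", f"last{i}", "M" if i % 2 else "F", f"pw{i}", "active")
        for i in range(1, 1001)]
cur.executemany(
    "INSERT INTO users (user_id, user_name, last_name, gender, password, status) "
    "VALUES (%s, %s, %s, %s, %s, %s)", rows)
conn.commit()
cur.close()
conn.close()
```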
Environment and Comparison Characteristics

Implementation Details: This section includes the details of the implementation of the study, in which an experiment to execute the data migration between the data stores was set up. A Microsoft Windows machine with the configuration listed in Table 2.1 is used to run all types of data migration approaches using the respective tools.

Only the migrating tools and the concerned databases were allowed to run, whereas all others were shut down to make sure that no other variable had an impact on the result. After the completion of each job, the tools and databases were restarted. The SysGauge tool was used to analyse the processes running on the machine with respect to CPU and memory utilization. The process specific to each technology was studied using SysGauge, and the quantitative characteristics maximum CPU, memory and time were documented as Maximum CPU Load, Maximum Memory Usage and CPU Time respectively. Figure 2 shows an instance of the SysGauge tool in which these characteristics are highlighted.

Characteristics of Comparison: In this section, a set of well-defined characteristics is discussed which can be considered for the comparative study. Previous studies state that NoSQL databases are often evaluated on the basis of scalability, performance and consistency. In addition, system- or platform-dependent characteristics could be complexity, cost, time, loss of information and fault tolerance, while algorithm-dependent characteristics could be real-time processing, data size support, etc. To meet the scope of this research, quantitative characteristics are considered, hence actual values are used.
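The study records Maximum CPU Usage, Maximum Memory Usage and Execution Time with the SysGauge GUI tool. As a rough illustration of what such a measurement harness does, the sketch below samples system-wide CPU and memory with the `psutil` package while a hypothetical migration job runs; it is a stand-in for, not a reproduction of, the SysGauge setup.

```python
# Illustrative stand-in for the SysGauge measurements: sample CPU and memory
# while a migration command runs, and record its execution time.
import subprocess
import time
import psutil  # pip install psutil

def run_and_measure(cmd):
    start = time.time()
    proc = subprocess.Popen(cmd)
    max_cpu = max_mem = 0.0
    while proc.poll() is None:                          # poll until the job finishes
        max_cpu = max(max_cpu, psutil.cpu_percent(interval=0.5))
        max_mem = max(max_mem, psutil.virtual_memory().percent)
    return {"execution_time_s": round(time.time() - start, 2),
            "max_cpu_percent": max_cpu,
            "max_memory_percent": max_mem}

# Hypothetical migration job; any of the tools discussed below could be
# launched here instead.
print(run_and_measure(["python", "migrate_mysql_to_mongo.py"]))
```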
1. Creating the MongoDB database. The user must specify the MySQL database that will be represented in MongoDB. The database is created with the following MongoDB command: use DATABASE_NAME.

2. Creating tables in the new MongoDB database. The algorithm verifies, for each table, in what relationships it is involved: whether it has foreign keys and/or is referred to by other tables.

3. If the table is not referred to by other tables, it will be represented by a new MongoDB collection.

4. If the table has no foreign keys but is referred to by another table, it will be represented by a new MongoDB collection.

5. If the table has one foreign key and is referred to by another table, it will be represented by a new MongoDB collection. In our framework, for this type of table we use the linking method, using the same concept as a foreign key.

6. If the table has one foreign key but is not referred to by another table, the proposed algorithm uses the one-way embedding model. So, the table is embedded in the collection that represents the table from part 1 of the relationship.

7. If the table has two foreign keys and is not referred to by another table, it will be represented using the two-way embedding model, described in section 2.4.

8. If the table has 3 or more foreign keys, so that it is the result of an N:M ternary or quaternary relationship, the algorithm uses the linking model, with foreign keys that refer to all the tables initially implied in that relationship and already represented as MongoDB collections. This solution works whether or not the table is referred to by other tables (a sketch of these rules is given below).
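Read as a decision procedure, the eight rules above hinge on two properties of each table: how many foreign keys it has and whether other tables refer to it. The sketch below encodes one consistent reading of those rules; the function and its inputs are illustrative, and in practice both values would be derived from MySQL's information_schema.

```python
# One possible reading of the table-to-collection mapping rules above.
def mongodb_mapping(fk_count: int, is_referred: bool) -> str:
    if fk_count == 0:
        # Rules 3-4: no foreign keys -> its own collection, referred or not.
        return "new collection"
    if fk_count == 1:
        # Rule 5: one FK and referred -> own collection, linked by reference.
        # Rule 6: one FK, not referred -> one-way embedding into the parent.
        return "new collection (linking)" if is_referred else "one-way embedding"
    if fk_count == 2 and not is_referred:
        # Rule 7: join table of an N:M relationship -> two-way embedding.
        return "two-way embedding"
    # Rule 8: three or more FKs (ternary/quaternary N:M) -> linking model;
    # referred two-FK tables are treated the same way here (an assumption).
    return "linking"

print(mongodb_mapping(fk_count=2, is_referred=False))  # -> two-way embedding
```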
Extract-Transform-Load approach: The term ETL came into existence from data warehousing and is an acronym for Extract-Transform-Load. ETL describes a process by which data are loaded from a source system into a data warehouse [19, 20]. Nowadays, ETL often adds a cleaning phase as a separate step; the sequence is then Extract-Clean-Transform-Load.

Extract: The Extract step consists of extracting the data from the source system and making it accessible for further processing. The main aim of the extract step is to fetch all the necessary data from the source system with as minimal an amount of resources as possible.

Transform: The transform step applies a set of rules to transform the data from the source to the target. This includes converting any measured data to the same dimension using the same units so that they can later be joined. The transformation step also requires joining data from several sources, generating aggregates, generating surrogate keys, sorting, deriving new calculated values, and applying advanced validation rules.

Load: During the load step, it is necessary to ensure that the load is performed correctly and with as few resources as possible. The target of the Load process is often a database. In order to make the load process efficient, it is helpful to disable any constraints and indexes before the load and enable them back only after the load completes. The referential integrity needs to be maintained by the ETL tool to ensure consistency. The steps are as follows (a minimal sketch is given after the list):

1. Lock the target database in the source system.
2. Lock the target database in the destination system.
3. Extract information from the target database in the source system.
4. Transform the information to the destination database.
5. Release the locks on the source and destination systems.
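Here is a minimal end-to-end sketch of the Extract-Transform-Load steps listed above, moving the assumed `users` table from MySQL into a MongoDB collection. Connection settings, names and the simplified locking are illustrative assumptions; building the index only after the bulk insert follows the advice given for the Load step.

```python
# Minimal ETL sketch (assumed names and connections, simplified locking).
import mysql.connector
from pymongo import MongoClient

src = mysql.connector.connect(host="localhost", user="root",
                              password="***", database="sourcedb")
cur = src.cursor(dictionary=True)
cur.execute("LOCK TABLES users READ")            # step 1: lock the source table

# Extract: fetch all rows from the source system.
cur.execute("SELECT * FROM users")
rows = cur.fetchall()

# Transform: rename the key field and drop fields not needed in the target.
docs = [{"_id": r["user_id"],
         "user_name": r["user_name"],
         "last_name": r["last_name"],
         "gender": r["gender"],
         "status": r["status"]} for r in rows]

# Load: bulk insert into MongoDB; create the index only after the load so
# the load itself stays cheap (MongoDB has no table-lock step here).
target = MongoClient("mongodb://localhost:27017")["targetdb"]["users"]
target.insert_many(docs)
target.create_index("user_name")

cur.execute("UNLOCK TABLES")                     # step 5: release the lock
cur.close()
src.close()
```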
Discussion

In this section we discuss the results of the experiment and also report the challenges that we faced during the entire phase.

Comparing Quantitative Characteristics of Migration Approaches: This determinative evaluation was used to check whether the study was going in the right direction. The data migration methodologies which were implemented in this research study are compared with one another and evaluated in the matrix as described. Since every aspect cannot be predicted at the start of the study, and due to unexpected changes that happened at different phases, a revision of the methodologies was necessary at every stage.

Migrating Results

The implementation details described earlier formed the environmental setup. The values of maximum CPU load, CPU time and maximum memory usage are retrieved using the SysGauge tool, while execution time and speed are documented from the respective technology used in the migration process, and the results are compiled as shown in Table 3. There were 3 target data stores used in the research study: MongoDB, CMS Database and Hadoop Database. The tools and frameworks involved in the transformation were MysqlToMongo, phpMyAdmin, mysql2, NoSQLBooster for MongoDB, Sqoop and Studio 3T.

Transformation results vary from one migration technique to another and were evaluated according to the values obtained from the execution of the respective methodologies. That execution was performed with the help of the tools or frameworks belonging to the different migration approaches. The evaluated results of the different migration approaches are discussed below.

Mid-model Approach using Data and Query Features (MongoDB using MysqlToMongo Framework): The MysqlToMongo tool is used to migrate data from MySQL to MongoDB. It uses data and query features. It transforms structured data at 2833.3 KB per second from MySQL to MongoDB. A data set of size 85 KB containing 1000 rows is transformed in 0.03 sec. During data transformation from MySQL to MongoDB using the MysqlToMongo tool, the Maximum CPU Usage is 23 percent and the maximum memory consumption is 9.1 percent, and the size of the data after transformation and conversion of the SQL database is 4 KB.
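The reported speed is simply the reported data-set size divided by the reported execution time, as the following check shows (values taken from the Mid-model result above).

```python
# 85 KB migrated in 0.03 s gives the reported 2833.3 KB per second.
data_size_kb = 85
execution_time_s = 0.03
speed_kb_per_s = data_size_kb / execution_time_s
print(round(speed_kb_per_s, 1))   # 2833.3
```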
The Speed, Execution Time, Maximum CPU Usage percentage and Maximum Memory Usage percentage for each migration approach have also been plotted to convey the efficiency of each migrating technique.

Summarization of the Results: Although a final result for migrating speed amongst the major migration techniques has been drawn, there were other results which further verify the efficiency of the migration techniques and which helped validate our results in measuring the efficiency of the transformation techniques. To depict a clear picture of the migrating techniques' efficiency, the results for each parameter are presented below.

In Figure 7, the horizontal axis shows the techniques used in migration and the vertical axis represents the amount of data (in bytes) migrated per second during the migration process from SQL database to NoSQL database. From Figure 7, the NoSQLayer Approach migrates the largest data size, i.e. 8,500 kilobytes per second, from SQL database to NoSQL database. The Mid-model Approach, Extract-Transform-Load Approach and Data Adapter Approach are the next best from the data migration speed point of view; the migrating speeds of these approaches are 2833.3 KB, 1214.29 KB and 850 KB per second respectively. Thus, we can conclude that NoSQLayer is the most efficient migrating technique from the migrating speed point of view.

In Figure 8, the horizontal axis shows the techniques used in migration and the vertical axis represents the total execution time consumed during the completion of the data migration process from SQL database to NoSQL database. From Figure 8, the NoSQLayer Approach has taken 0.01 sec. to migrate 1000 records from SQL database to NoSQL database. The Mid-model Approach, Extract-Transform-Load Approach and Data Adapter Approach are the other techniques which consume less time in data migration; their execution times for completing the data migration are 0.03 sec., 0.07 sec. and 0.1 sec. respectively. Thus, we can conclude that NoSQLayer is the most efficient migrating technique from the execution time point of view.

Figure 9: Maximum CPU Usage Percentage.

In Figure 9, the horizontal axis shows the techniques used in migration and the vertical axis represents the Maximum CPU Usage percentage consumed during the completion of the data migration process from SQL database to NoSQL database. The Maximum CPU Usage of the Data Adapter Approach is 14 percent, which is comparatively the least among the seven migration techniques. The NoSQLayer Approach and the Mid-model Approach, with 21 percent and 23 percent CPU usage respectively, are the two other techniques with lower CPU usage. Thus, we can conclude that the Data Adapter Approach is the most efficient from the CPU usage point of view, i.e. it uses only 14 percent of the CPU load during the complete migration of 1000 records from SQL database to NoSQL database.

Figure 10: Maximum Memory Usage Percentage.

In Figure 10, the horizontal axis shows the techniques used in migration and the vertical axis represents the Maximum Memory Usage percentage consumed during the completion of the data migration process from SQL database to NoSQL database. The Maximum Memory Usage of the Data Adapter Approach is 5.4 percent, which is comparatively the least among the seven migration techniques. The NoSQLayer Approach and the Mid-model Approach, with 7.1 percent and 9.1 percent memory usage respectively, are the two other techniques with lower memory usage. Thus, we can conclude that the Data Adapter Approach is the most efficient from the memory usage point of view, i.e. it uses only 5.4 percent of the memory load during the complete migration of 1000 records from SQL database to NoSQL database.

The experiments, results, analysis and comparisons show that the HBase Database Technique, Content Management System Approach, Automatic Mapping Framework and ETL Approach reached higher maximum CPU and memory loads than the other techniques during the migration process. It is also seen that, from the viewpoint of speed of data migration and execution time, the NoSQLayer Approach is the most efficient, and, from the CPU usage and memory usage points of view, the Data Adapter is the most efficient technique.
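The per-metric winners stated above can be recovered directly from the figures quoted in the text. The snippet below collects those reported values (only the techniques for which numbers are quoted in this section) and picks the best technique for each parameter.

```python
# Values as reported in the text above; missing entries were not quoted.
reported = {
    "NoSQLayer":    {"speed_kb_s": 8500,    "time_s": 0.01, "cpu_pct": 21, "mem_pct": 7.1},
    "Mid-model":    {"speed_kb_s": 2833.3,  "time_s": 0.03, "cpu_pct": 23, "mem_pct": 9.1},
    "ETL":          {"speed_kb_s": 1214.29, "time_s": 0.07},
    "Data Adapter": {"speed_kb_s": 850,     "time_s": 0.1,  "cpu_pct": 14, "mem_pct": 5.4},
}

best_speed = max(reported, key=lambda t: reported[t]["speed_kb_s"])
best_time  = min(reported, key=lambda t: reported[t]["time_s"])
best_cpu   = min((t for t in reported if "cpu_pct" in reported[t]),
                 key=lambda t: reported[t]["cpu_pct"])
best_mem   = min((t for t in reported if "mem_pct" in reported[t]),
                 key=lambda t: reported[t]["mem_pct"])
print(best_speed, best_time, best_cpu, best_mem)
# NoSQLayer NoSQLayer Data Adapter Data Adapter
```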
Conclusion

The main objective of this study is to compare various approaches of data migration from SQL to NoSQL by using well-defined characteristics and datasets. In order to address the growing demands of modern applications to manage huge / big data in an efficient manner, there emerges a need for schema-less NoSQL databases capable of managing large amounts of data in terms of storage, access and efficiency. The main focus of this research is to carry out a comparative study and analysis of the most common migrating approaches using the most appropriate tools (other than commercially available ones) that prefer basic and practical conversion from structured data to unstructured data. In this work, 7 (seven) migration procedures have been performed one-by-one and separately by using freely available resources (data and tools), and then the performance of each procedure has been evaluated on the basis of performance parameters. Further, all the challenges faced during the course of this work have been documented for future reference. The main contribution of this work is that it will serve as a guideline for organizations looking to migrate data from a structured to a semi-structured or unstructured repository in the most efficient way.

References

1. Mohamed H, Omar B, Abdesadik B (2015) "Data Migration Methodology from Relational to NoSQL Databases." Inter J Comp App, 9: 2511–2515.
2. Pretorius D (2013) "NoSQL database considerations and implications for businesses." Inter J Comp App.
3. Mughees M (2013) "Data migration from standard SQL to NoSQL."
4. Ghotiya S, Mandal J, Kandasamy S (2017) "Migration from relational to NoSQL database." IOP Conf Ser Mater Sci Eng, 263: 1-4.
5. Yassine F, Awad M (2018) "Migrating from SQL to NOSQL Database: Practices and Analysis." Proc 13th Int Conf Innov Inf Technol, 58-62.
6. Moniruzzaman B, Akhter H (2013) "NoSQL Database: New Era of Databases for Big Data."
7. Potey M, Digrase M, Deshmukh G, Nerkar M (2015) "Database Migration from Structured Database to non-Structured Database." Int J Comput Appl.
8. Abramova V, Bernardino J, Furtado P (2014) "Experimental evaluation of NoSQL databases." Int J Database Manag Syst, 6: 1-16.
9. Ameya N, Anil P, Dikshay P (2013) "Types of NoSQL databases and its comparison with relational databases." Int J Appl Inf Syst, 5: 16-19.
10. Mohamed A, Altrafi G, Ismail O (2014) "Relational vs. NoSQL databases: A survey." Int J Comput Inf Technol, 2279–2764.
11. Koli A, Shinde S (2017) "Approaches used in efficient migration from Relational Database to NoSQL Database." Proc Second Int Conf Res Intell Comput Eng, 10: 223–227.
12. Ramzan S, Bajwa S, Kazmi R (2018) "An intelligent approach for handling complexity by migrating from conventional databases to big data." Symmetry (Basel), 10: 1-12.
13. Chakrabarti A, Jayapal M (2017) "Data transformation methodologies between heterogeneous data stores: A comparative study." Proc 6th Int Conf Data Sci Technol Appl, 241–248.
14. Kuderu N, Kumari V (2016) "Relational Database to NoSQL Conversion by Schema Migration and Mapping." Int J Comput Eng Res Trends, 3: 506.
15. Khourdifi Y, Bahaj M, Elalami A (2018) "A new approach for migration of a relational database into column-oriented NoSQL database on Hadoop." J Theor Appl Inf Technol, 96: 6607.
16. Tiyyagura N, Rallabandi M, Nalluri R (2016) "Data Migration from RDBMS to Hadoop." 184.
17. Seshagiri V, Vadaga M, Shah J, Karunakaran P (2016) "Data Migration Technology from SQL to Column Oriented Databases (HBase)." 5: 1-11.
18. Liao T (2016) "Data adapter for querying and transformation between SQL and NoSQL database." Futur Gener Comput Syst, 65: 111–121.
19. Lalitha R (2016) "Classical Data Migration Technique in Multi-Database Systems (SQL and NoSQL)." Int J Comput Sci Inf Technol, 7: 2472–2475.
20. Yangui R, Nabli A, Gargouri F (2017) "ETL based framework for NoSQL warehousing." Lect Notes Bus Inf Process, 299: 40-53.