NoSQL Products: IT Giants Perspectives
Shagufta Praveen*
Computer Science Department, Glocal University, Saharanpur, Uttar Pradesh, India
Abstract
1. INTRODUCTION
NoSQL is an emerging topic in the field of databases. It does not follow the rules of
conventional relational databases; instead, it has its own rules for storing data in a
way that addresses the scalability issue. It handles and analyzes large amounts of data
stored in distributed form across multiple servers. It is a non-relational, schemaless
database model based on the BASE properties [2] and the CAP theorem [3]. According to an
IDC survey in 2012 [4], the volume of data had reached the zettabyte range by 2010, up
from 130 exabytes. This suggests that data multiplies about tenfold every five years.
This rise in data is driven by Web-based
2126 Shagufta Praveen and Dr. Umesh Chandra
applications (Facebook, Twitter, LinkedIn, etc.), which have become an essential part of
our day-to-day lives. Social sites and projects (Google Maps, Indian Railways, etc.)
deal with massive data and know that their content will grow over time, so to handle
the problems of scalability and storage they see NoSQL as the only practical option.
As a result, many leading organizations have adopted various NoSQL products.
Bigtable is a NoSQL product used by Google, one of the world's leading organizations.
More than sixty Google products use Bigtable today, including Google Analytics, Orkut,
Search, Google Earth, Gmail, Google Book Search, Google Code and YouTube.
Row keys in Bigtable are kept in alphabetical order. Each row key, which in this
example is a reversed URL, is associated with a number of column families. Every column
family has a number of columns, and every cell can hold several versions of its content
under different timestamps. In Fig. 1, “Column family: Contents” contains the content of
the web pages identified by the row key (the reversed URL), and “Column family: Anchor”
contains the text of the anchors (links) that refer to the web page. This huge
collection of web page data is called the Webtable.
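The data model described above can be pictured as a sorted map from (row key, column, timestamp) to a value. The following is a minimal in-memory sketch of that idea, not Google's API: the class `WebTableSketch`, its methods, the helper `reverse_url` and the sample host names are all hypothetical names chosen for illustration.

```python
from bisect import bisect_left, insort


def reverse_url(host):
    """Reverse the host components, e.g. 'www.cnn.com' -> 'com.cnn.www'."""
    return ".".join(reversed(host.split(".")))


class WebTableSketch:
    """Toy model of a Bigtable-style table (illustrative only)."""

    def __init__(self):
        self.row_keys = []  # row keys kept in sorted (alphabetical) order
        self.rows = {}      # row key -> {column -> {timestamp -> value}}

    def put(self, row, column, timestamp, value):
        """Store one timestamped version of a cell."""
        if row not in self.rows:
            insort(self.row_keys, row)  # keep the key list sorted
            self.rows[row] = {}
        self.rows[row].setdefault(column, {})[timestamp] = value

    def get_latest(self, row, column):
        """Return the version of the cell with the highest timestamp."""
        versions = self.rows[row][column]
        return versions[max(versions)]

    def scan_prefix(self, prefix):
        """Return all row keys starting with `prefix`, in sorted order."""
        start = bisect_left(self.row_keys, prefix)
        matches = []
        for key in self.row_keys[start:]:
            if not key.startswith(prefix):
                break
            matches.append(key)
        return matches


table = WebTableSketch()
table.put(reverse_url("www.cnn.com"), "contents:", 1, "<html>v1</html>")
table.put(reverse_url("www.cnn.com"), "contents:", 2, "<html>v2</html>")
table.put(reverse_url("www.cnn.com"), "anchor:cnnsi.com", 1, "CNN")
table.put(reverse_url("money.cnn.com"), "contents:", 1, "<html>money</html>")
print(table.get_latest("com.cnn.www", "contents:"))
print(table.scan_prefix("com.cnn."))
```

Because the URL is reversed, pages from the same domain end up next to each other in the sorted key order, which is why a prefix scan over `com.cnn.` touches only neighbouring rows.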
GFS (Google File System) is Google's storage layer. Bigtable stores all of its log and
data files in GFS.
Dynamo is a NoSQL product that belongs to the key-value family. Amazon, one of the
best-known e-commerce organizations, uses Dynamo as a data store. Dynamo is preferred
for its high availability and scalability. Amazon cannot compromise on availability,
because customers visiting Amazon’s website can add items and view products only when
the products and their information are available to them. To achieve this, Dynamo
provides a primary-key-only interface. Data is replicated to maintain backups, but
keeping all the replicas consistent is another major issue. Update conflicts generally
arise during write and read operations, when a customer makes an update and the change
has not yet reached all replicas. This may result in inconsistency and a bad customer
experience. Dynamo is designed for applications in which updates are not rejected even
during a network partition or server failure. The core distributed-systems techniques
used in Dynamo are partitioning, replication, versioning, membership, failure handling
and scaling [6].
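The versioning technique listed above is what lets a store accept every update and detect conflicts afterwards. The Dynamo paper [6] does this with vector clocks; the following is a minimal sketch of that idea under simplifying assumptions. The function names and the server names `server_x`, `server_y` and `server_z` are hypothetical.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if version `a` has seen every update recorded in `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two versions as equal, ordered, or in conflict."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a supersedes b"
    if b_ge_a:
        return "b supersedes a"
    return "conflict"  # concurrent updates; the application must reconcile


def merge(a, b):
    """Clock of a reconciled version: element-wise maximum of both clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


v1 = increment({}, "server_x")   # first write, handled by server_x
v2 = increment(v1, "server_y")   # update of v1 handled by server_y
v3 = increment(v1, "server_z")   # concurrent update of v1 on server_z
print(compare(v2, v1))
print(compare(v2, v3))
print(merge(v2, v3))
```

Here `v2` and `v3` both descend from `v1` but not from each other, which is exactly the situation where an update made during a partition has not reached the other replica.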
NoSQL Products: IT Giants Perspectives 2129
Features | Response
Availability | High
Reliability | Yes
Consistency | Sacrifices consistency in some situations
Replication/partitioning | Using consistent hashing [14]
Consistency among replicas | Achieved by quorum technique
Built on | Java, Node.js, C# .NET, Perl, PHP, Python, Ruby, Haskell
Network failure | Read and write operations remain possible due to different conflict mechanisms
Flexible | No fixed schema
Cost-effective | $1 for storing 1 GB per month
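The table above names consistent hashing [14] as the partitioning technique. The following is a minimal sketch of a hash ring, assuming one position per node (no virtual nodes) and MD5 as the hash function for illustration. The class `HashRing`, its methods and the node names are hypothetical.

```python
import hashlib
from bisect import bisect_right, insort


def ring_position(key):
    """Map a string to an integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with one position per node."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        pos = ring_position(node)
        self._owner[pos] = node
        insort(self._positions, pos)

    def remove_node(self, node):
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._owner[pos]

    def node_for(self, key):
        """Walk clockwise from the key's position to the next node."""
        idx = bisect_right(self._positions, ring_position(key))
        idx %= len(self._positions)  # wrap around the end of the ring
        return self._owner[self._positions[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = ["cart:%d" % i for i in range(100)]
before = {k: ring.node_for(k) for k in keys}
ring.remove_node("node-b")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys changed owner")
```

When a node leaves, only the keys it owned move to its clockwise neighbour; keys owned by the other nodes stay where they were, which is the property that makes this scheme attractive for scaling.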
Features | Response
Reliability | Yes
Scalability | Yes
Partition | Consistent hashing [14]
Schema | Schema-free
Implementation language | Java
Transactions | Does not support transactions
Store | Structured/unstructured/semi-structured
Developed by | Facebook, for Inbox Search
Content | Open source
Features | Response
Consistency | Tunable consistency (follows strict quorum and eventual consistency) [15]
Replication, partitioning | Automatically replicated and partitioned
No failures | Node independence
In-memory caching | Yes
Read performance | Good
Scalability | High
Concurrency | High
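The tunable consistency entry in the table above can be illustrated with the usual quorum condition: with N replicas, a read quorum R and a write quorum W are guaranteed to overlap in at least one replica when R + W > N. The following is a minimal simulation under stated assumptions, not any product's API. Writes go to the first W replicas and reads consult the last R, so that overlap occurs only when the condition holds. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int = 0
    value: str = ""


def is_strict_quorum(n, r, w):
    """Read and write sets must share at least one replica."""
    return r + w > n


def write(replicas, w, value):
    """Apply the write to the first `w` replicas; the rest stay stale."""
    # Simplification: the new version number is derived from all replicas.
    new_version = max(rep.version for rep in replicas) + 1
    for rep in replicas[:w]:
        rep.version, rep.value = new_version, value


def read(replicas, r):
    """Consult the last `r` replicas and return the newest value seen."""
    newest = max(replicas[-r:], key=lambda rep: rep.version)
    return newest.value


replicas = [Replica() for _ in range(3)]   # N = 3
write(replicas, 2, "cart-v1")              # W = 2
print(is_strict_quorum(3, 2, 2), repr(read(replicas, 2)))  # R = 2
print(is_strict_quorum(3, 1, 2), repr(read(replicas, 1)))  # R = 1
```

Lowering R or W trades consistency for latency and availability: with R = 1 the read may land on a replica the write has not reached yet and return a stale value until replication catches up.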
Features | Response
Schema | Schema-less
Scalability | Yes, horizontally scalable
Transaction | No transactions
Data | Semi-structured, structured
2.7. Neo4j: Neo4j is an open-source NoSQL graph database that is well known in the
area of networks and web applications. High scalability, availability and reliability
are the reasons for using Neo4j. Support for delete and update operations is required
in order to claim that Neo4j is a realistic and strong candidate for replacing a
relational database [12]. Cisco uses the commercial edition of Neo4j, and it is notable
that, out of 2000 companies, more than 20 use Neo4j as their database [13]. The “4j” in
Neo4j refers to Java, which is said to make it more robust and secure.
Table 7. Neo4j Features
Features | Response
Data model | Flexible/node-edges
Availability | High
Data | Connected/semi-structured
Language | Cypher query language [16]
Content | Open source
Database type | Transactional database
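The node-edge data model listed above can be sketched as a small property graph: nodes carry properties, and typed relationships connect them. The following is a plain in-memory illustration, not Neo4j's API or Cypher. The class `PropertyGraph`, the relationship types and the sample data are hypothetical.

```python
from collections import deque


class PropertyGraph:
    """Toy property graph: nodes with properties, typed directed edges."""

    def __init__(self):
        self.nodes = {}  # node id -> dict of properties
        self.edges = {}  # node id -> list of (relationship type, target id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, rel_type, target):
        self.edges[source].append((rel_type, target))

    def reachable(self, start, rel_type):
        """Breadth-first traversal that follows only `rel_type` edges."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            for rtype, target in self.edges[current]:
                if rtype == rel_type and target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


g = PropertyGraph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
g.add_node("carol", kind="person")
g.add_node("acme", kind="company")
g.add_edge("alice", "KNOWS", "bob")
g.add_edge("bob", "KNOWS", "carol")
g.add_edge("alice", "WORKS_AT", "acme")
print(g.reachable("alice", "KNOWS"))
```

Connected data of this kind is queried by following relationships from node to node rather than by joining tables, which is the main contrast with the relational model.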
ty ontal
Scalable)
Language Used | Top of C++ libraries (open source code) | Java | Java | Cypher Query Language | Java | Java, Node.js, C# .NET, Perl, PHP
Consistency | Strong consistency | Consistent hashing | Tunable consistency | Strongly consistent | Immediate consistency | Sacrifices consistency in some situations
Data | Semi-structured/structured/unstructured | Structured/unstructured/semi-structured | Semi-structured, structured | Connected/semi-structured | Semi-structured, structured | Structured/unstructured/semi-structured
Replication | Yes (3 copies are created) | Consistent hashing | Automatically replicated and partitioned | Master-Slave replication, full graph replication | Automatically replicated | Using consistent hashing
Transact Atomic Row update No Transaction No Transaction No Provide TRansaction
ion Transaction al Database Transactio
n
CONCLUSION
A study of the use of NoSQL products by the IT giants, namely Google, Amazon, LinkedIn,
Facebook and Cisco, shows that they mainly use six products. Drawing on the data
models, working and internal mechanisms of these products, this paper examines the
possible reasons for their use. Finally, it considers the prospects of the six NoSQL
products and suggests which of them is the most efficient for various purposes.
REFERENCES
[1] Lemieux, F., Current and Emerging Trends in Cyber Operations: Policy,
Strategy and Practice, edited
[2] Praveen, S., and Chandra, U., 2017, A comparative study on NoSQL,
NewSQL and Polyglot persistence,
[3] Gilbert, S., and Lynch, N., “Brewer’s conjecture and the feasibility of
consistent, available, partition-tolerant web services”, ACM SIGACT
News 33(2), pp. 51-59, March 2002
[4] Gantz, J., and Reinsel, D., The Digital Universe in 2020: Big Data, Bigger
Digital Shadows, and Biggest Growth in the Far East [R]. IDC iView,
Sponsored by EMC, December 2012
[5] Bigtable: A Distributed Storage System for Structured Data, Google, Inc.,
to appear in OSDI 2006
[6] DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A.,
Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W., Dynamo:
Amazon’s Highly Available Key-value Store, Amazon.com, SOSP’07,