

International Journal of Computational Intelligence Research

ISSN 0973-1873 Volume 13, Number 8 (2017), pp. 2125-2133


© Research India Publications
http://www.ripublication.com

NoSQL Products: IT Giants Perspectives

Shagufta Praveen*
Computer Science Department, Glocal University, Saharanpur, Uttar Pradesh, India

Dr. Umesh Chandra
Computer Science Department, Glocal University, Saharanpur, Uttar Pradesh, India

Abstract

The immense growth of unstructured data has prompted many organizations to
rethink the future of databases. NoSQL has proved to be a game changer in the
IT world: companies such as Yahoo, Amazon, Google, LinkedIn, and Cisco have
opted for it. This paper highlights the reasons behind each company's selection
of a NoSQL product and concludes that organizations have chosen different NoSQL
products because of their diverse objectives. This analysis can help bring out
the most efficient NoSQL product among the selected ones.

Keywords: NoSQL Products, IT Giants, Different Perspectives

1. INTRODUCTION

NoSQL is an emerging topic in the field of databases. It does not follow the rules
of conventional relational databases; instead, it has its own rules for storing data
in a way that addresses the scalability problem. It analyzes and manages large
amounts of data stored in distributed form across multiple servers. A NoSQL database
is non-relational and schemaless, and is based on the BASE properties [2] and the
CAP theorem [3]. According to an IDC survey from 2012 [4], the volume of data grew
from 130 exabytes in 2005 to the zettabyte range in 2010, which suggests that data
multiplies roughly tenfold every five years. This growth is driven by web-based
applications (Facebook, Twitter, LinkedIn, etc.) that have become an essential part
of our day-to-day lives. Social sites and projects (Google Maps, Indian Railways,
etc.) deal with massive data and know that their content will keep growing, so they
turn to NoSQL to handle the problems of scalability and storage. As a result, many
leading organizations have opted for various NoSQL products.

2. ORGANIZATION AND NOSQL PRODUCTS

Table 1. NoSQL Products and Family

S.No.  NoSQL Product  Family
1.     BigTable       Column-oriented database
2.     Cassandra      Column-oriented database
3.     HBase          Column-oriented database
4.     Dynamo         Key-value store
5.     Voldemort      Key-value store
6.     Neo4j          Graph data store
7.     MongoDB        Document-oriented database

[Figure: IT giants mapped to the NoSQL products they use, out of more than 150
NoSQL products: (1) Google: BigTable; (2) Facebook: Cassandra; (3) Facebook
Messenger: HBase; (4) Amazon: Dynamo; (5) LinkedIn: Voldemort; (6) Cisco: Neo4j.]

Fig. 1. Organization and NoSQL Products



2.1. BigTable: Used by Google

BigTable is a NoSQL product used by Google, one of the world's leading
organizations. More than sixty Google products use BigTable today, including
Google Analytics, Orkut, Search, Google Earth, Gmail, Google Book Search, Google
Code, and YouTube.

BigTable is a distributed, multidimensional database indexed by row key, column key,
and timestamp (a 64-bit integer). Of the three, the row key acts as the primary key.
Column keys are grouped into sets called column families, and each column key is
written as family:qualifier, where the qualifier names a column within its family.

[Figure: a row key (reversed URL) with two column families, "Contents" and "Anchor";
each cell holds timestamped versions {t1, t2, t3}.]

Fig. 2. Row key and Columns

Row keys in BigTable are kept in lexicographic order. Each row key is a reversed URL
and is associated with a number of column families. Every column family has a number
of columns, and every cell can hold different content under different timestamps.
In Fig. 2, “Column family: Contents” holds the content of the web page identified by
the (reversed) row-key URL, and “Column family: Anchor” holds the text of the anchors
(links) that refer to that page. This large table of web pages is called the
Webtable.
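
The data model described above can be sketched as a small in-memory map. This is
only an illustrative toy, not Google's implementation: the names TinyTable and
row_key_for, and the sample URLs and values, are assumptions introduced here.

```python
from collections import defaultdict


def row_key_for(host: str) -> str:
    """Reverse the host name so pages of one domain sort next to each other."""
    return ".".join(reversed(host.split(".")))


class TinyTable:
    """Toy (row key, column key, timestamp) -> value map; hypothetical names."""

    def __init__(self):
        # row key -> column key ("family:qualifier") -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        """Store one timestamped version of a cell."""
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version, or the newest one at or before `timestamp`."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self):
        """Return the row keys in lexicographic order."""
        return sorted(self._rows)


table = TinyTable()
table.put(row_key_for("www.cnn.com"), "contents:", 1, "<html>v1</html>")
table.put(row_key_for("www.cnn.com"), "contents:", 2, "<html>v2</html>")
table.put(row_key_for("www.cnn.com"), "anchor:cnnsi.com", 1, "CNN")
table.put(row_key_for("maps.google.com"), "contents:", 1, "<html>maps</html>")
print(table.scan())
print(table.get("com.cnn.www", "contents:"))
```

Reversing the host name keeps pages from the same domain in adjacent rows, which is
why the row key in Fig. 2 is a reversed URL.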

BigTable supports only single-row transactions. It does not provide transactions
across multiple row keys, so consistency across rows cannot be guaranteed. Client
code, for example in C++, applies row mutations. The BigTable API provides functions
for creating, deleting, and manipulating tables, clusters, and column families. It
also supports MapReduce for processing large amounts of data in parallel across the
nodes of a cluster. BigTable relies on the cluster management system for job
scheduling and resource management. A single node in a cluster can process about
10,000 queries per second.

BigTable uses GFS (the Google File System) as its storage layer: all of its log and
data files are stored in GFS.

Table 2. BigTable Features

Features                  Response
Persistence               Yes
Replication               Yes (3 copies are created)
Availability              Yes
Transaction               Atomic row update
Implementation            On top of C++ libraries (open source code)
Reliability               Yes
GFS/BigTable              GFS -> file system, BigTable -> database
Structured/unstructured/  GFS stores unstructured data; BigTable stores
semi-structured           structured and semi-structured data
MapReduce                 Yes
Compactions               Yes (shrink memory use of the tablet server, reduce
                          the amount of data to be read)
Compression               Encode (100-200 MB/s), decode (400-1000 MB/s)
Cache                     Scan Cache, Block Cache [5]
Garbage collection        Yes (only the latest timestamped versions are kept)
Schema                    Column-family schema

2.2. Dynamo: Used by Amazon

Dynamo is a NoSQL product that belongs to the key-value family. Amazon, one of the
best-known e-commerce organizations, uses Dynamo as a data store. Dynamo is preferred
for its high availability and scalability. Amazon cannot compromise on availability,
because customers on its website can add items and view products only when the
products and their information are available to them. To achieve this, Dynamo
provides a primary-key-only interface. Data is replicated to maintain backups, but
keeping all the replicas consistent is a major challenge. An update conflict
typically arises during read and write operations when a customer makes an update
and the change has not yet reached all replicas. This can result in inconsistency
and a bad customer experience. Dynamo targets applications in which updates are not
rejected even during a network partition or server failure. The core
distributed-systems techniques used in Dynamo are partitioning, replication,
versioning, membership, failure handling, and scaling [6].
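
Partitioning and replication in a Dynamo-style store are commonly built on consistent
hashing [14]. The following is a minimal sketch of that idea under simplifying
assumptions: the class name HashRing, the node names, and the key are hypothetical,
and virtual nodes are left out for brevity.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map a string to a point on the ring (MD5 here; any stable hash would do)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring without virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Each node owns one point on the ring; the list is sorted by position.
        self._ring = sorted((_position(node), node) for node in nodes)

    def preference_list(self, key):
        """Return the first `replicas` nodes found clockwise from the key."""
        if not self._ring:
            return []
        start = bisect.bisect_right(self._ring, (_position(key), ""))
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=3)
print(ring.preference_list("cart:42"))
```

The useful property is that adding or removing one node only moves the keys adjacent
to it on the ring, so the store can scale without reshuffling all data.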

Table 3. Dynamo Features

Features                    Response
Availability                High
Reliability                 Yes
Consistency                 Sacrifices consistency in some situations
Replication/partitioning    Using consistent hashing [14]
Consistency among replicas  Achieved by quorum technique
Built on                    Java, Node.js, C# .NET, Perl, PHP, Python, Ruby, Haskell
Network failure             Read and write operations remain possible thanks to
                            its conflict-handling mechanisms
Flexible                    No fixed schema
Cost-effective              $1 for storing 1 GB per month
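
The quorum technique listed in Table 3 can be illustrated with a toy replicated
register. With N replicas, a read quorum R and a write quorum W, every read overlaps
the latest successful write whenever R + W > N. This is a sketch under stated
assumptions, not Dynamo's code: the names are hypothetical, and a single global
version counter stands in for Dynamo's versioning.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when any R replicas must share a member with any W replicas."""
    return r + w > n


class QuorumRegister:
    """Toy single-value store with N replicas, read quorum R and write quorum W."""

    def __init__(self, n: int, r: int, w: int):
        self.n, self.r, self.w = n, r, w
        self._version = 0                  # simplification: one global counter
        self._replicas = [(0, None)] * n   # (version, value) held by each replica

    def write(self, value, reachable):
        """Store `value` on the reachable replicas; fail if fewer than W respond."""
        if len(reachable) < self.w:
            return False
        self._version += 1
        for index in reachable:
            self._replicas[index] = (self._version, value)
        return True

    def read(self, reachable):
        """Ask the reachable replicas and return the value with the highest version."""
        if len(reachable) < self.r:
            raise RuntimeError("read quorum not reached")
        newest = max((self._replicas[i] for i in reachable), key=lambda pair: pair[0])
        return newest[1]


store = QuorumRegister(n=3, r=2, w=2)
store.write("cart=[book]", reachable=[0, 1])
print(store.read(reachable=[1, 2]))
```

Lowering R and W below the overlap condition trades consistency for availability,
which matches the "sacrifices consistency in some situations" entry in the table.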

2.3. Cassandra: Used by Facebook

Cassandra is a NoSQL product from the column-oriented family. It is a distributed
storage system that manages large amounts of data spread across many servers. It runs
on inexpensive hardware and handles high write throughput without sacrificing read
efficiency. It is a highly scalable and reliable data store designed for Facebook's
Inbox Search problem, and Inbox Search is backed by Cassandra. Inbox Search had
around 100 million users in June 2008 and, by the time of [7], more than 250 million.
The Cassandra API has three simple methods: insert(), get(), and delete(). The search
box in the messages tab searches by message content or message sender. Cassandra
also uses a cache to make searches fast: when a user searches from the search bar,
the query runs against the cluster's buffer cache, where the results are likely
already in memory. This makes the search faster.
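
The three API methods can be illustrated with a small in-memory stand-in. This sketch
is not Cassandra's client API: the class ToyColumnStore is hypothetical, the argument
shapes (table, key, column name) follow the description in [7], and the Inbox Search
layout shown (user as row key, search term as column, message IDs as value) is a
simplifying assumption.

```python
class ToyColumnStore:
    """In-memory stand-in for the insert/get/delete API (illustrative only)."""

    def __init__(self):
        # table name -> row key -> column name -> value
        self._tables = {}

    def insert(self, table, key, row_mutation):
        """Apply a dict of column updates to one row."""
        row = self._tables.setdefault(table, {}).setdefault(key, {})
        row.update(row_mutation)

    def get(self, table, key, column_name):
        """Return the value of one column, or None if it is absent."""
        return self._tables.get(table, {}).get(key, {}).get(column_name)

    def delete(self, table, key, column_name):
        """Remove one column from a row; missing columns are ignored."""
        self._tables.get(table, {}).get(key, {}).pop(column_name, None)


inbox = ToyColumnStore()
# Hypothetical layout: the user is the row key, each search term is a column,
# and the value lists the IDs of messages that contain the term.
inbox.insert("inbox_search", "user:1001", {"holiday": ["msg-7", "msg-19"]})
print(inbox.get("inbox_search", "user:1001", "holiday"))
inbox.delete("inbox_search", "user:1001", "holiday")
```

Because every operation addresses a single row key, requests can be routed to the
server that owns that key without coordinating with other rows.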

Table 4. Cassandra Features

Features                 Response
Reliability              Yes
Scalability              Yes
Partition                Consistent hashing [14]
Schema                   Schema-free
Implementation language  Java
Transactions             Not supported
Store                    Structured/unstructured/semi-structured
Developed by             Facebook, for Inbox Search
Content                  Open source

2.4. Voldemort: Used by LinkedIn

Voldemort is a big, distributed, fault-tolerant hash table. It provides horizontal
scalability and high availability, and can manage storage across multiple data
centers. Key-value data access is used to achieve good performance and high
availability. Voldemort uses in-memory caching, replicates data automatically over
multiple servers, and handles server failures [8]. It is inspired by Amazon's Dynamo
and has been extended to support bulk loading of terabytes of read-only data [9].

Table 5. Voldemort Features

Features                  Response
Consistency               Tunable consistency (supports strict quorum and
                          eventual consistency) [15]
Replication/partitioning  Automatically replicated and partitioned
Failure handling          Nodes are independent of one another
In-memory caching         Yes
Read performance          Good
Scalability               High
Concurrency               High

2.5. HBase: Used by Facebook Messenger

Facebook introduced a new social inbox that combines e-mail, SMS, IM, text messages,
and on-site Facebook messages. It is able to store 135 billion messages a month [11].
At the time, Facebook's existing messaging infrastructure handled 350 million users
sending over 15 billion person-to-person messages per month [10]. Messages involve
two kinds of data: temporal, volatile data and an ever-growing set of data that is
rarely accessed. To meet these requirements, Facebook's engineers considered
Cassandra, but they found its eventual consistency model a difficult pattern to
reconcile with their new messaging infrastructure [10]. Facebook therefore decided to
work with HBase, which offers a scalable and simpler consistency model and runs on
top of HDFS (the Hadoop Distributed File System). It supports a very high rate of
row-level updates over large amounts of data.

Table 6. HBase Features

Features         Response
Schema           Schemaless
Scalability      Yes, horizontally scalable
Transaction      No transaction
Data             Semi-structured, structured
Failure support  Automatic failure support
Reliability      Yes
Language used    Java
Consistency      Immediate consistency

2.6. Neo4j: Used by Cisco

Neo4j is an open source NoSQL graph database that is well known in the areas of
networks and web applications. High scalability, high availability, and better
reliability are the reasons for using Neo4j. Support for delete and update operations
is required before Neo4j can be claimed to be a realistic and strong candidate for
replacing relational databases [12]. Cisco uses the commercial edition of Neo4j, and,
notably, more than 20 companies out of 2000 use Neo4j as their database [13]. The
“4j” in Neo4j refers to Java, which helps make it robust and secure.

Table 7. Neo4j Features

Features       Response
Data model     Flexible/node-edges
Availability   High
Data           Connected/semi-structured
Language       Cypher Query Language [16]
Content        Open source
Database type  Transactional database
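
The node-edge data model in Table 7 can be illustrated without a database. The sketch
below is plain Python rather than Cypher or the Neo4j API; the class PropertyGraph,
the relationship type PART_OF, and the sample records are all hypothetical.

```python
from collections import deque


class PropertyGraph:
    """Toy property graph: nodes carry properties, edges carry a type."""

    def __init__(self):
        self.nodes = {}   # node id -> dict of properties
        self.edges = {}   # node id -> list of (relationship type, target id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, rel_type, target):
        self.edges.setdefault(source, []).append((rel_type, target))

    def reachable(self, start, rel_type):
        """Breadth-first traversal that follows only edges of one type."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            for kind, target in self.edges.get(current, []):
                if kind == rel_type and target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


graph = PropertyGraph()
graph.add_node("line-card", kind="component")
graph.add_node("router", kind="product")
graph.add_node("product-family", kind="catalog")
graph.add_edge("line-card", "PART_OF", "router")
graph.add_edge("router", "PART_OF", "product-family")
print(graph.reachable("line-card", "PART_OF"))
```

Following relationships directly from node to node is what makes connected data,
such as master data hierarchies, a natural fit for a graph store.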

Table 8. NoSQL Products and Summarized Features

NoSQL Products BigTable           Cassandra        Voldemort         Neo4j              HBase               Dynamo
Organization   Google             Facebook         LinkedIn          Cisco              Facebook Messenger  Amazon
Purpose        Google Earth,      Inbox Search     People You        Master Data        Messages + email    Reliable shopping
               YouTube, Gmail,                     May Know          Management         + SMS + chat        cart services
               Google Maps
Data model     Column-oriented    Column-oriented  Key-value         Graph-based        Column-oriented     Key-value
Schema         Column-family      Schemaless       Schemaless        Schemaless         Schemaless          Schemaless
               schema
Availability   High               High             High              High               High                High
Scalability    High               High             High              High               High (horizontally  High
                                                                                        scalable)
Language used  On top of C++      Java             Java              Cypher Query       Java                Java, Node.js,
               libraries (open                                       Language                               C# .NET, Perl, PHP
               source code)
Consistency    Strong             Consistent       Tunable           Strongly           Immediate           Sacrifices consistency
               consistency        hashing          consistency       consistent         consistency         in some situations
Data           Semi-structured/   Structured/      Semi-structured,  Connected/         Semi-structured,    Structured/
               structured/        unstructured/    structured        semi-structured    structured          unstructured/
               unstructured       semi-structured                                                           semi-structured
Replication    Yes (3 copies are  Consistent       Automatically     Master-slave       Automatically       Using consistent
               created)           hashing          replicated and    replication, full  replicated          hashing
                                                   partitioned       graph replication
Transaction    Atomic row update  No transaction   No transaction    Transactional      No transaction      No transaction
                                                                     database

CONCLUSION

A study of the use of NoSQL products by the IT giants Google, Amazon, LinkedIn,
Facebook, and Cisco shows that they mainly use six products. Drawing on the data
models, working, and internal mechanisms of these products, this paper sets out the
possible reasons for their use. Finally, studying the prospects of the six NoSQL
products can suggest which of them is the most efficient for a given purpose.

REFERENCES
[1] Lemieux, F., Current and Emerging Trends in Cyber Operations: Policy,
Strategy and Practice, edited.
[2] Praveen, S., and Chandra, U., 2017, A comparative study on NoSQL,
NewSQL and Polyglot persistence.
[3] Gilbert, S., and Lynch, N., "Brewer's conjecture and the feasibility of
consistent, available, partition-tolerant web services", ACM SIGACT
News 33, 2, pp. 51-59, June 2002.
[4] Gantz, J., and Reinsel, D., The Digital Universe in 2020: Big Data, Bigger
Digital Shadows, and Biggest Growth in the Far East [R]. IDC iView,
sponsored by EMC, December 2012.
[5] Bigtable: A Distributed Storage System for Structured Data, by Google, Inc.,
to appear in OSDI 2006.
[6] DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A.,
Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W., Dynamo:
Amazon's Highly Available Key-value Store, Amazon.com, SOSP'07,
October 14-17, 2007, Stevenson, Washington, USA. Copyright 2007 ACM
978-1-59593-591-5/07/0010.
[7] Lakshman, A., and Malik, P., Cassandra - A Decentralized Structured Storage
System.
[8] Chaganti, S. P., Voldemort NoSQL Database, Department of Computer
Science, University of Bridgeport, CT.
[9] Sumbaly, R., Kreps, J., Gao, L., Feinberg, A., Soman, C., and Shah, S., Serving
large-scale batch computed data with Project Voldemort, LinkedIn.
[10] Big%20Table/The%20Underlying%20Technology%20of%20Messages.html,
by Kannan, a software engineer at Facebook. Facebook © 2017.
[11] http://highscalability.com/blog/2010/11/16/facebooks-new-real-time-messaging-system-hbase-to-store-135.html,
Tuesday, November 16, 2010.
[12] Melchor, F., Lopez, S., Guillermo, E., and de la Cruz, S., 2015, Literature review
about Neo4j graph database as a feasible alternative for replacing RDBMS.
[13] BigTableNeoTechnologyexecsHow%20Neo4j%20beat%20Oracle%20Databa
se%20_%20Network%20World.html
[14] Karger, D., Lehman, E., Leighton, T., Levine, M., Lewin, D., and Panigrahy,
R., Consistent hashing and random trees: Distributed caching protocols for
relieving hot spots on the World Wide Web. In ACM Symposium on Theory of
Computing, pages 654-663, 1997.
[15] Available online: [http://www.project-voldemort.com/voldemort/]
[16] Available online: [https://en.wikipedia.org/wiki/Cypher_Query_Language]