BIG Data 2

Uploaded by navata

Unit 2

1) What is NoSQL?

Summary
• NoSQL is a non-relational DBMS that does not require a fixed schema, avoids joins, and is easy to scale.
• The concept of NoSQL databases became popular with Internet giants like Google, Facebook, and Amazon, which deal with huge volumes of data.
• In 1998, Carlo Strozzi used the term NoSQL for his lightweight, open-source relational database.
• NoSQL databases do not follow the relational model; they are either schema-free or have relaxed schemas.
• The four types of NoSQL database are (1) key-value pair based, (2) column-oriented, (3) graph-based, and (4) document-oriented.
• NoSQL can handle structured, semi-structured, and unstructured data with equal effect.
• The CAP theorem concerns three properties: Consistency, Availability, and Partition Tolerance.
• BASE stands for Basically Available, Soft state, Eventual consistency.
• Eventual consistency means keeping copies of data on multiple machines for high availability and scalability, accepting that the copies may briefly differ but become consistent over time.
• NoSQL databases offer limited query capabilities.

Because NoSQL databases are non-relational and were designed with web applications in mind, they scale out better than relational databases.

Brief History of NoSQL Databases

• 1998: Carlo Strozzi used the term NoSQL for his lightweight, open-source relational database

• 2000: Graph database Neo4j is launched

• 2004: Google BigTable is launched

• 2005: CouchDB is launched

• 2007: The research paper on Amazon Dynamo is released

• 2008: Facebook open-sources the Cassandra project

• 2009: The term NoSQL is reintroduced

Features of NoSQL

Non-relational

• NoSQL databases do not follow the relational model

• Do not provide tables with flat, fixed-column records

• Work with self-contained aggregates or BLOBs

• Do not require object-relational mapping or data normalization

• No complex features such as query languages, query planners, referential integrity, joins, or ACID

Schema-free

• NoSQL databases are either schema-free or have relaxed schemas

• Do not require the schema of the data to be defined in advance

• Offer heterogeneous data structures within the same domain

Simple API

• Offer easy-to-use interfaces for storing and querying data

• APIs allow low-level data manipulation and selection methods

• Mostly use text-based protocols, typically HTTP REST with JSON

• Mostly have no standards-based NoSQL query language

• Web-enabled databases running as internet-facing services

Distributed

• NoSQL databases can run in a distributed fashion on multiple nodes

• Offer auto-scaling and fail-over capabilities

• ACID guarantees are often sacrificed for scalability and throughput

• Mostly no synchronous replication between distributed nodes; instead, asynchronous multi-master replication, peer-to-peer replication, or HDFS replication is used

• Often provide only eventual consistency

• Shared-nothing architecture, which requires less coordination and allows higher distribution

2) Aggregate Data Models:

An aggregate is a collection of data that we interact with as a unit. These units of data, or aggregates, form the boundaries for ACID operations with the database. Key-value, document, and column-family databases can all be seen as forms of aggregate-oriented database.

Aggregates make it easier for the database to manage data storage over clusters, since each unit of data can reside on any machine and, when retrieved from the database, brings all its related data along with it. Aggregate-oriented databases work best when most data interaction is done with the same aggregate. For example, when there is a need to get an order and all its details, it is better to store the order as an aggregate object; however, using these aggregates to get item details across all the orders is not elegant.
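
To make the order example concrete, here is a minimal Python sketch. It assumes a plain dictionary stands in for an aggregate-oriented store, and the order IDs, fields, and function names are all hypothetical. Fetching one order is a single lookup, but an item-level question has to scan every aggregate:

```python
# Hypothetical order aggregates keyed by order ID: each value holds the order
# together with all of its line items, so one lookup returns the whole unit.
orders = {
    "order-1": {
        "customer": "alice",
        "date": "2024-01-05",
        "items": [
            {"sku": "pen", "qty": 2, "price": 1.50},
            {"sku": "book", "qty": 1, "price": 12.00},
        ],
    },
    "order-2": {
        "customer": "bob",
        "date": "2024-01-05",
        "items": [{"sku": "pen", "qty": 5, "price": 1.50}],
    },
}


def get_order(order_id):
    """Intra-aggregate access: a single key lookup returns the order and its details."""
    return orders[order_id]


def orders_containing(sku):
    """Cross-aggregate access: every aggregate must be scanned to answer an item-level question."""
    return [oid for oid, order in orders.items()
            if any(item["sku"] == sku for item in order["items"])]


print(len(get_order("order-1")["items"]))  # 2
print(orders_containing("pen"))            # ['order-1', 'order-2']
```
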

Aggregate-oriented databases make inter-aggregate relationships more difficult to handle than intra-aggregate relationships. Aggregate-ignorant databases are better when interactions use data organized in many different formations. Aggregate-oriented databases often compute materialized views to provide data organized differently from their primary aggregates. This is often done with map-reduce computations, such as a map-reduce job to get items sold per day.
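
Here is a small Python sketch of such a map-reduce computation. It assumes order aggregates that embed their line items, and the data and field names are hypothetical. A real database would run the map and reduce steps across the cluster rather than in one process:

```python
from collections import defaultdict
from functools import reduce

# Hypothetical order aggregates, each embedding its own line items.
orders = [
    {"date": "2024-01-05", "items": [{"sku": "pen", "qty": 2}, {"sku": "book", "qty": 1}]},
    {"date": "2024-01-05", "items": [{"sku": "pen", "qty": 5}]},
    {"date": "2024-01-06", "items": [{"sku": "book", "qty": 3}]},
]


def map_order(order):
    """Map step: emit one ((date, sku), qty) pair per line item in the aggregate."""
    return [((order["date"], item["sku"]), item["qty"]) for item in order["items"]]


def reduce_pairs(acc, pair):
    """Reduce step: sum the quantities that share the same (date, sku) key."""
    key, qty = pair
    acc[key] += qty
    return acc


pairs = [pair for order in orders for pair in map_order(order)]
items_sold_per_day = dict(reduce(reduce_pairs, pairs, defaultdict(int)))
print(items_sold_per_day)
# {('2024-01-05', 'pen'): 7, ('2024-01-05', 'book'): 1, ('2024-01-06', 'book'): 3}
```

The resulting dictionary plays the role of a materialized view: it organizes the data by day and item rather than by order.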

Distribution Models:

Aggregate-oriented databases make distribution of data easier, since the distribution mechanism only has to move the aggregate and need not worry about related data, as all the related data is contained in the aggregate. There are two styles of distributing data:

• Sharding: Sharding distributes different data across multiple servers, so each server acts as the single source for a subset of data.

• Replication: Replication copies data across multiple servers, so each bit of data can be found in multiple places. Replication comes in two forms:

  ◦ Master-slave replication makes one node the authoritative copy that handles writes, while slaves synchronize with the master and may handle reads.

  ◦ Peer-to-peer replication allows writes to any node; the nodes coordinate to synchronize their copies of the data.
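
The two replication styles can be contrasted with a minimal in-memory Python sketch. The classes are hypothetical, replication happens by direct copying rather than over a network, and the peer-to-peer version ignores write conflicts, which real systems must resolve:

```python
class Node:
    """A hypothetical storage node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class MasterSlaveCluster:
    """Master-slave sketch: only the master accepts writes; slaves copy the
    master's data when sync() runs and may serve reads."""

    def __init__(self, n_slaves):
        self.master = Node("master")
        self.slaves = [Node(f"slave-{i}") for i in range(n_slaves)]

    def write(self, key, value):
        self.master.data[key] = value  # every write goes to the authoritative copy

    def sync(self):
        for slave in self.slaves:
            slave.data = dict(self.master.data)  # slaves synchronize with the master

    def read(self, key, replica=0):
        return self.slaves[replica].data.get(key)  # reads are served by a slave


class PeerToPeerCluster:
    """Peer-to-peer sketch: any node accepts a write and copies it to its
    peers, so no single node has to handle every write."""

    def __init__(self, n_nodes):
        self.nodes = [Node(f"peer-{i}") for i in range(n_nodes)]

    def write(self, key, value, via=0):
        accepting = self.nodes[via]
        accepting.data[key] = value  # any node may accept the write
        for peer in self.nodes:
            if peer is not accepting:
                peer.data[key] = value  # coordinate: copy the write to the other peers

    def read(self, key, via=0):
        return self.nodes[via].data.get(key)


ms = MasterSlaveCluster(n_slaves=2)
ms.write("order-1", "paid")
print(ms.read("order-1"))             # None (the slave has not synchronized yet)
ms.sync()
print(ms.read("order-1", replica=1))  # paid

p2p = PeerToPeerCluster(n_nodes=3)
p2p.write("order-1", "paid", via=2)   # accepted by the third node
print(p2p.read("order-1", via=0))     # paid
```
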

Master-slave replication reduces the chance of update conflicts, whereas peer-to-peer replication avoids loading all writes onto a single server, which would create a single point of failure. A system may use either or both techniques; for example, the Riak database shards the data and also replicates it based on the replication factor.
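
Here is a rough Python sketch of combining sharding with replication, in the spirit of the Riak example. It assumes a fixed list of hypothetical node names, picks the owning node by hashing the key with MD5, and places the extra copies on the next nodes in the list. This is a simplification, not Riak's actual placement algorithm:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster


def shard_index(key, n_nodes):
    """Sharding: hash the key to pick the node that owns it."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_nodes


def replica_nodes(key, nodes=NODES, replication_factor=3):
    """Replication: keep the key on its owner plus the next
    replication_factor - 1 nodes, wrapping around the list."""
    start = shard_index(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


for key in ["order-1", "order-2", "customer-42"]:
    print(key, "->", replica_nodes(key))
```

Because the hash depends only on the key, every client computes the same placement without asking a central coordinator.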

3) Types of NoSQL Databases

NoSQL databases are mainly categorized into four types: key-value pair, column-oriented, graph-based, and document-oriented. Every category has its unique attributes and limitations. None of these types is best at solving every problem, so users should select a database based on their product's needs.

Types of NoSQL Databases:

• Key-value Pair Based

• Column-oriented

• Graph-based

• Document-oriented

Key-Value Pair Based

Data is stored in key/value pairs. This type of database is designed to handle large amounts of data and heavy load.

Key-value pair storage databases store data as a hash table where each key is unique, and the value can be a JSON document, a BLOB (Binary Large Object), a string, etc.

For example, a key-value pair may contain a key like "Website" associated with a value like "Guru99".

It is one of the most basic examples of a NoSQL database. This kind of NoSQL database is used for collections, dictionaries, associative arrays, etc. Key-value stores help developers store schema-less data and work well for data such as shopping cart contents.

Redis, Dynamo, and Riak are some examples of key-value store databases. Dynamo is the system described in Amazon's Dynamo paper, and Riak is based on that paper; Redis has a different design.
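
Here is a minimal Python sketch of the key-value idea. It assumes an in-memory dictionary serves as the hash table. The class, its method names, and keys such as "cart:alice" are hypothetical and do not reflect the API of Redis, Dynamo, or Riak. The store never inspects the value, so strings, JSON-like structures, and raw bytes are all stored the same way:

```python
class KeyValueStore:
    """Minimal in-memory key-value store: a hash table (Python dict) in which
    each key is unique and the value is opaque to the store."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table[key] = value  # an existing key is simply overwritten

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        self._table.pop(key, None)


store = KeyValueStore()
store.put("Website", "Guru99")                   # string value
store.put("cart:alice", {"pen": 2, "book": 1})   # JSON-like value (shopping cart)
store.put("avatar:alice", b"\x89PNG...")         # BLOB-like bytes value
print(store.get("Website"))                      # Guru99
print(store.get("cart:alice")["pen"])            # 2
```
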

Column-based

Column-oriented databases work on columns and are based on Google's BigTable paper. Every column is treated separately, and the values of a single column are stored contiguously.

They deliver high performance on aggregation queries such as SUM, COUNT, AVG, and MIN, as the data is readily available in a column.

Column-based NoSQL databases are widely used to manage data warehouses, business intelligence, CRM, and library card catalogs.

HBase, Cassandra, and Hypertable are examples of column-based databases.
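
The following Python sketch contrasts the two layouts on a few hypothetical sales rows. It only models the idea that a column's values sit together. It says nothing about how HBase or Cassandra actually lay out data on disk:

```python
# Hypothetical sales rows, as a row-oriented store would keep them.
rows = [
    {"id": 1, "region": "east", "amount": 120},
    {"id": 2, "region": "west", "amount": 80},
    {"id": 3, "region": "east", "amount": 200},
]

# Column-oriented layout: all values of each column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregations such as SUM, COUNT, and MIN read just the one column they need...
total = sum(columns["amount"])
count = len(columns["amount"])
smallest = min(columns["amount"])

# ...whereas the row layout visits every full row to pick out one field.
total_from_rows = sum(row["amount"] for row in rows)

print(columns["amount"])       # [120, 80, 200]
print(total, count, smallest)  # 400 3 80
```
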

Document-Oriented:

A document-oriented NoSQL database stores and retrieves data as key-value pairs, but the value part is stored as a document in JSON or XML format. The value is understood by the database and can be queried.

In this diagram, on the left you can see rows and columns, and on the right a document database with a structure similar to JSON. For a relational database, you have to know in advance what columns you have, and so on. For a document database, however, you store data like a JSON object and do not need to define its structure up front, which makes it flexible.

The document type is mostly used for CMS systems, blogging platforms, real-time analytics, and e-commerce applications. It should not be used for complex transactions that require multiple operations, or for queries against varying aggregate structures.

Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes are popular document-oriented DBMSs.
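
Here is a small Python sketch of a document collection. It assumes JSON strings parsed with the standard json module stand in for stored documents, and the keys, fields, and the find helper are hypothetical. Documents in the same collection carry different fields, yet the store can still query inside them:

```python
import json

# Hypothetical documents: each value is a JSON document the store can parse,
# and documents in the same collection need not share a schema.
raw_documents = {
    "post-1": '{"title": "Intro to NoSQL", "author": "ann", "tags": ["nosql", "db"]}',
    "post-2": '{"title": "CAP in practice", "author": "raj", "draft": true}',
    "post-3": '{"title": "Graph basics", "author": "ann", "tags": ["graph"]}',
}
collection = {key: json.loads(doc) for key, doc in raw_documents.items()}


def find(collection, **criteria):
    """Return the keys of documents whose fields equal all the given criteria.
    A document that lacks a field simply does not match."""
    return [key for key, doc in collection.items()
            if all(doc.get(field) == value for field, value in criteria.items())]


print(find(collection, author="ann"))  # ['post-1', 'post-3']
print(find(collection, draft=True))    # ['post-2']
```
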

Graph-Based

A graph database stores entities as well as the relationships among those entities. Each entity is stored as a node, and each relationship as an edge between nodes. Every node and edge has a unique identifier.

Compared to a relational database, where tables are loosely connected, a graph database is multi-relational in nature. Traversing relationships is fast because they are already captured in the database, so there is no need to compute them at query time.

Graph databases are mostly used for social networks, logistics, and spatial data.

Neo4j, Infinite Graph, OrientDB, and FlockDB are some popular graph-based databases.
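
Here is a minimal Python sketch of storing nodes and edges and then traversing them, assuming a small hypothetical social graph. Relationships are kept as adjacency lists, so a breadth-first search follows stored links instead of computing joins. This illustrates the idea only and is not how Neo4j or the other products are implemented:

```python
from collections import deque

# Hypothetical social graph: nodes are people, edges are FRIEND relationships,
# and every node and edge has its own identifier.
nodes = {"n1": "Ann", "n2": "Raj", "n3": "Li", "n4": "Sam"}
edges = {
    "e1": ("n1", "n2", "FRIEND"),
    "e2": ("n2", "n3", "FRIEND"),
    "e3": ("n3", "n4", "FRIEND"),
}

# Keep each node's relationships with the node (adjacency lists), so traversal
# follows stored links rather than computing them at query time.
adjacency = {node_id: [] for node_id in nodes}
for source, target, _label in edges.values():
    adjacency[source].append(target)
    adjacency[target].append(source)  # treat friendship as undirected


def within_hops(start, max_hops):
    """Breadth-first traversal: names reachable from start in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                reached.append(nodes[neighbour])
                queue.append((neighbour, depth + 1))
    return reached


print(within_hops("n1", 2))  # ['Raj', 'Li']
```
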

Query Mechanism tools for NoSQL

Sharding distributes data between nodes

• The goal is for users to get all, or most, of their data from one server
• Many NoSQL databases perform automatic sharding
• Sharding can improve both read and write performance
• Sharding allows horizontal scaling for both reads and writes
• However, sharding alone does not improve resilience
• Since sharding distributes data across many machines, there is a larger chance that some machine fails
• This is particularly true compared to a single, highly maintained machine
• Locate the Vancouver accounts on Vancouver servers
• Locate aggregates that are likely to be accessed together or in sequence in the same location

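The Vancouver example can be sketched in Python as location-based sharding. The city-to-server mapping, the server names, and the default shard below are hypothetical assumptions. Within a city, accounts are spread over that city's servers by account ID:

```python
# Hypothetical mapping from an account's home city to the servers in that city.
SHARDS_BY_CITY = {
    "Vancouver": ["van-db-1", "van-db-2"],
    "Toronto": ["tor-db-1"],
}
DEFAULT_SHARD = ["central-db-1"]


def shard_for(account):
    """Place an account aggregate on servers in its own city, so users mostly
    read and write from one nearby server; unknown cities use a default shard."""
    servers = SHARDS_BY_CITY.get(account["city"], DEFAULT_SHARD)
    # Spread accounts within a city over that city's servers by account ID.
    return servers[account["id"] % len(servers)]


accounts = [
    {"id": 7, "city": "Vancouver"},
    {"id": 8, "city": "Vancouver"},
    {"id": 9, "city": "Toronto"},
    {"id": 10, "city": "Paris"},
]
for account in accounts:
    print(account["id"], account["city"], "->", shard_for(account))
```
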
What is the CAP Theorem?

The CAP theorem is also called Brewer's theorem. It states that it is impossible for a distributed data store to simultaneously offer more than two of the following three guarantees:

1. Consistency

2. Availability

3. Partition Tolerance

Consistency:

The data should remain consistent after the execution of an operation. This means that once data is written, any future read request should return that data. For example, after the order status is updated, all clients should see the same data.

Availability:

The database should always be available and responsive. It should not have any downtime.

Partition Tolerance:

Partition tolerance means that the system should continue to function even if communication among the servers is unreliable. For example, the servers may be partitioned into multiple groups that cannot communicate with each other. Even if part of the database becomes unreachable, the other parts should continue to operate.

Eventual Consistency

Eventual consistency describes systems that keep copies of data on multiple machines to get high availability and scalability. Changes made to a data item on one machine therefore have to be propagated to the other replicas.

Data replication may not be instantaneous: some copies are updated immediately, while others are updated in due course of time. The copies may be mutually inconsistent for a while, but in due course of time they become consistent. Hence the name eventual consistency.
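
Here is a minimal Python simulation of eventual consistency. It assumes three in-memory replicas, a single logical clock as the version, and a "newest version wins" rule when updates arrive; all names are hypothetical. A write is applied to one replica at once and queued for the others until propagate() runs:

```python
class Replica:
    """One copy of a single data item plus the version that last wrote it."""

    def __init__(self):
        self.value = None
        self.version = 0


class EventuallyConsistentStore:
    """A write lands on one replica immediately and is queued for the others;
    when queued updates are delivered, the newest version wins."""

    def __init__(self, n_replicas):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # (replica index, value, version) not yet delivered
        self.clock = 0     # hypothetical logical clock used as the version

    def write(self, value, at=0):
        self.clock += 1
        target = self.replicas[at]
        target.value, target.version = value, self.clock
        for i in range(len(self.replicas)):
            if i != at:
                self.pending.append((i, value, self.clock))

    def propagate(self):
        """Deliver every queued update; stale versions are ignored."""
        for i, value, version in self.pending:
            replica = self.replicas[i]
            if version > replica.version:
                replica.value, replica.version = value, version
        self.pending.clear()

    def values(self):
        return [replica.value for replica in self.replicas]


store = EventuallyConsistentStore(n_replicas=3)
store.write("shipped", at=0)
print(store.values())  # ['shipped', None, None]  (replicas temporarily disagree)
store.propagate()
print(store.values())  # ['shipped', 'shipped', 'shipped']  (copies have converged)
```
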

BASE: Basically Available, Soft state, Eventual consistency

• Basically available means the database is available all the time, as per the CAP theorem

• Soft state means the system state may change over time, even without input

• Eventual consistency means that the system will become consistent over time

Advantages of NoSQL

• Can be used as Primary or Analytic Data Source

• Big Data Capability

• No Single Point of Failure

• Easy Replication

• No Need for Separate Caching Layer

• Provides fast performance and horizontal scalability

• Can handle structured, semi-structured, and unstructured data with equal effect

• Object-oriented programming, which is easy to use and flexible

• NoSQL databases don't need a dedicated high-performance server

• Support for Key Developer Languages and Platforms

• Simpler to implement than an RDBMS

• Can serve as the primary data source for online applications

• Handles big data, managing data velocity, variety, volume, and complexity

• Excels at distributed database and multi-data-center operations

• Eliminates the need for a specific caching layer to store data

• Offers a flexible schema design that can easily be altered without downtime or service disruption

Disadvantages of NoSQL

• No standardization rules

• Limited query capabilities

• RDBMS databases and tools are comparatively mature

• Does not offer some traditional database capabilities, such as consistency when multiple transactions are performed simultaneously

• As the volume of data increases, it becomes difficult to maintain unique values as keys

• Doesn't work as well with relational data

• The learning curve is steep for new developers

• Many options are open source and therefore not as popular with enterprises

Relational databases and NoSQL databases compared in tabular format:

Relational Database | NoSQL Database
Handles data coming in at low velocity | Handles data coming in at high velocity
Data arrive from one or a few locations | Data arrive from many locations
Manages structured data | Manages structured, unstructured, and semi-structured data
Supports complex transactions (with joins) | Supports simple transactions
Single point of failure with failover | No single point of failure
Handles data in moderate volume | Handles data in very high volume
Centralized deployments | Decentralized deployments
Transactions written in one location | Transactions written in many locations
Gives read scalability | Gives both read and write scalability
Deployed in vertical fashion | Deployed in horizontal fashion

Key highlights on SQL vs NoSQL:

SQL | NoSQL
Relational database management system (RDBMS) | Non-relational or distributed database system
Fixed, static, or predefined schema | Dynamic schema
Not suited for hierarchical data storage | Best suited for hierarchical data storage
Best suited for complex queries | Not so good for complex queries
Vertically scalable | Horizontally scalable
Follows ACID properties | Follows CAP (consistency, availability, partition tolerance) trade-offs

What is Apache Cassandra?

Cassandra is a distributed database management system designed to handle high volumes of structured data across commodity servers.
Cassandra handles huge amounts of data with its distributed architecture. Data is placed on different machines with a replication factor greater than one, which provides high availability and no single point of failure.
In the image below, circles are Cassandra nodes and the lines between the circles show the distributed architecture, while the client is sending data to a node.

Apache Cassandra Features
Cassandra provides the following features.
• Massively Scalable Architecture: Cassandra has a masterless design in which all nodes are at the same level, which provides operational simplicity and easy scale-out.
• Masterless Architecture: Data can be written and read on any node.
• Linear Scale Performance: As more nodes are added, the performance of Cassandra increases.
• No Single Point of Failure: Cassandra replicates data on different nodes, which ensures no single point of failure.
• Fault Detection and Recovery: Failed nodes can easily be restored and recovered.
• Flexible and Dynamic Data Model: Supports many data types, with fast writes and reads.
• Data Protection: Data is protected by the commit log design and by built-in security such as backup and restore mechanisms.
• Tunable Data Consistency: The consistency level can be chosen per request, up to strong consistency across the distributed architecture.
• Multi Data Center Replication: Cassandra can replicate data across multiple data centers.
• Data Compression: Cassandra can compress data by up to 80% with little overhead.
• Cassandra Query Language: Cassandra provides a query language similar to SQL, which makes it easy for relational database developers to move to Cassandra.
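
Tunable consistency is commonly explained with the overlap rule R + W > N. If the number of replicas that must answer a read (R) plus the number that must acknowledge a write (W) exceeds the replication factor (N), every read overlaps the latest write. The Python sketch below checks that rule for three of Cassandra's consistency levels: ONE, QUORUM, and ALL. It is plain arithmetic, not the Cassandra driver API, and it ignores other levels such as LOCAL_QUORUM:

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a few common consistency levels."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # a majority of the replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level: {level}")


def is_strongly_consistent(read_level, write_level, replication_factor):
    """Reads are guaranteed to see the latest write when the read and write
    replica sets must overlap, i.e. R + W > N."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_level, write_level, is_strongly_consistent(read_level, write_level, 3))
# ONE ONE False
# QUORUM QUORUM True
# ONE ALL True
```

Choosing ONE for both reads and writes favours availability and latency, while QUORUM for both gives strong consistency at the cost of waiting for more replicas.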

Cassandra Table: Create, Alter, Drop & Truncate (with Example)

The syntax of Cassandra Query Language (CQL) resembles SQL. The table commands are:
• Create Table
• Alter Table
• Drop Table
• Truncate Table
How to Create Table
A column family in Cassandra is similar to an RDBMS table and is used to store data. The command 'Create Table' is used to create a column family in Cassandra.
Syntax
Create table KeyspaceName.TableName
(
ColumnName DataType,
…
Primary key(ColumnName)
) with PropertyName=PropertyValue;

Cassandra Alter table

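The command 'Alter Table' is used to drop a column, add a new column, rename a column, change a column's type, and change the properties of a table. (Newer Cassandra versions restrict some of these operations; for example, changing a column's type is no longer supported.)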
Syntax
Following is the syntax of the command 'Alter Table':
Alter table KeyspaceName.TableName
Alter ColumnName TYPE ColumnDataType |
Add ColumnName ColumnDataType |
Drop ColumnName |
Rename ColumnName To NewColumnName |
With PropertyName=PropertyValue
Example
Here is the snapshot of the command 'Alter Table' that adds a new column to the table Student.

Drop Table
The command 'Drop Table' drops the specified table, including all its data, from the keyspace. Before dropping the table, Cassandra takes a snapshot of the data (not the schema) as a backup.
Syntax
Drop Table KeyspaceName.TableName
Example
Here is the executed command 'Drop Table' that drops the table Student from the keyspace University:

Drop Table University.Student;

After successful execution of the command 'Drop Table', table Student will be dropped from
the keyspace University.

Truncate Table
The command 'Truncate Table' removes all the data from the specified table. Before truncating the data, Cassandra takes a snapshot of the data as a backup.
Syntax
Truncate KeyspaceName.TableName
Example
Here is the snapshot of the executed command 'Truncate table' that will remove all the data
from the table Student.

After successful execution of the command 'Truncate Table', all the data will be removed from
the table Student.

Cassandra Query Language (CQL): Insert Into, Update (Example)

This section covers the following CQL commands, with examples:
• Insert Data
• Upsert Data
• Update Data
• Delete Data
• Cassandra Where Clause

Insert Data
The Cassandra INSERT statement writes data into Cassandra columns in row form. An INSERT query stores only the columns supplied by the user; only the primary key column(s) must be specified.
Columns whose values are not supplied take up no space. No results are returned after insertion.
Syntax
Insert into KeyspaceName.TableName(ColumnName1, ColumnName2, ColumnName3 . . . .)
values (Column1Value, Column2Value, Column3Value . . . .)
Example
Here is the executed Cassandra INSERT query that inserts one record into the Cassandra table Student:

Insert into University.Student(RollNo,Name,dept,Semester) values(2,'Michael','CS', 2);

After successful execution of the command Insert into Cassandra, one row will be inserted in
the Cassandra table Student with RollNo 2, Name Michael, dept CS and Semester 2.

Upsert Data
Cassandra performs upserts: it inserts a row if the primary key does not already exist; otherwise, if the primary key already exists, it updates that row.
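
Upsert semantics can be mimicked with a Python dictionary keyed by the primary key. This is a hypothetical stand-in for the Student table, not a Cassandra client. It shows that inserting an existing RollNo overwrites only the supplied columns:

```python
# Hypothetical in-memory stand-in for the Student table, keyed by primary key (RollNo).
student = {}


def insert(table, primary_key, **columns):
    """CQL-style INSERT behaves as an upsert: it creates the row if the primary
    key is new; otherwise it overwrites the supplied columns of the existing row."""
    row = table.setdefault(primary_key, {})
    row.update(columns)  # only the supplied columns are written


insert(student, 2, Name="Michael", dept="CS", Semester=2)  # row does not exist: insert
insert(student, 2, Semester=3)                             # row exists: update Semester
print(student)  # {2: {'Name': 'Michael', 'dept': 'CS', 'Semester': 3}}
```
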
Update Data
The Cassandra UPDATE query is used to update data in a Cassandra table. If no results are returned after the update, the data was updated successfully; otherwise, an error is returned. Column values are changed in the 'Set' clause, while rows are selected with the 'Where' clause.
Syntax
Update KeyspaceName.TableName
Set ColumnName1=new Column1Value,
… .
Where ColumnName=ColumnValue
Example
Here is the executed Cassandra UPDATE command that updates a record in the Student table:


Update University.Student
Set name='Hayden'
Where rollno=1;
After successful execution of this update query, the name of the student with rollno 1 is changed from 'Clark' to 'Hayden'.

Cassandra Delete Data

The command 'Delete' removes an entire row or some columns from a table. Deleted data is not removed from the table immediately; instead, it is marked with a tombstone and removed later, during compaction.
Syntax
Delete from KeyspaceName.TableName
Where ColumnName1=ColumnValue
The above Cassandra DELETE syntax deletes one or more rows, depending on the filter in the Where clause.
Delete ColumnNames from KeyspaceName.TableName
Where ColumnName1=ColumnValue
The above syntax will delete some columns from the table.
Example
Here is the command that removes one row from the table Student:

Delete from University.Student where rollno=1;
After successful execution of this CQL Delete command, the row with rollno 1 is deleted from the table Student.
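
The tombstone-and-compaction idea can be simulated in Python. The Table class is hypothetical, and timestamps are plain numbers supplied by the caller. The gc_grace_seconds name mirrors Cassandra's grace-period table property, but the purge logic is a simplification of what real compaction does:

```python
# Marker object meaning "this row was deleted".
TOMBSTONE = object()


class Table:
    """Sketch of delete-by-tombstone: DELETE writes a marker instead of removing
    the row, and compaction later purges tombstones older than a grace period."""

    def __init__(self, gc_grace_seconds):
        self.rows = {}  # primary key -> (value or TOMBSTONE, write timestamp)
        self.gc_grace_seconds = gc_grace_seconds

    def insert(self, key, value, ts):
        self.rows[key] = (value, ts)

    def delete(self, key, ts):
        self.rows[key] = (TOMBSTONE, ts)  # mark the row; the data is not removed yet

    def select(self, key):
        value, _ts = self.rows.get(key, (None, None))
        return None if value is TOMBSTONE else value  # tombstoned rows are hidden

    def compact(self, now):
        """Keep everything except tombstones whose grace period has expired."""
        self.rows = {
            key: (value, ts)
            for key, (value, ts) in self.rows.items()
            if not (value is TOMBSTONE and now - ts >= self.gc_grace_seconds)
        }


table = Table(gc_grace_seconds=10)
table.insert(1, {"name": "Hayden"}, ts=0)
table.delete(1, ts=5)
print(table.select(1), 1 in table.rows)  # None True (hidden, but still stored)
table.compact(now=8)
print(1 in table.rows)                   # True (8 - 5 < 10: grace period not over)
table.compact(now=20)
print(1 in table.rows)                   # False (tombstone purged)
```
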

What Cassandra does not support

CQL has the following limitations:
1. CQL does not support aggregation queries like max, min, and avg (Cassandra 2.2 and later add built-in aggregate functions such as min, max, avg, sum, and count).
2. CQL does not support GROUP BY or HAVING queries (Cassandra 3.10 added GROUP BY on primary key columns; HAVING is still not supported).
3. CQL does not support joins.
4. CQL does not support OR queries.
5. CQL does not support wildcard queries.
6. CQL does not support UNION or INTERSECT queries.
7. Non-primary-key columns cannot be filtered without creating an index (or explicitly using ALLOW FILTERING).
8. Range queries using greater than (>) and less than (<) are supported only on clustering columns.
Because of these limitations, CQL is not well suited to analytics.
Cassandra Where Clause
In Cassandra, data retrieval must be planned carefully. To filter on a non-primary-key column, you need to create an index on that column.
Syntax
Select ColumnNames from KeyspaceName.TableName
Where ColumnName1=Column1Value AND
ColumnName2=Column2Value AND
..
Example
The following query retrieves all the records from the Student table; two records are returned:

select * from University.Student;

The next query filters the data by the name column and retrieves all the records whose name is equal to Guru99; one record is returned:

select * from University.Student where name='Guru99';
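
Why a filter on a non-primary-key column needs an index can be illustrated in Python. The rows below are hypothetical, and the dictionary index only mimics what a secondary index provides: a lookup from column value to primary keys, instead of a scan over every row:

```python
# Hypothetical Student rows keyed by the primary key (rollno).
students = {
    1: {"name": "Hayden", "dept": "CS"},
    2: {"name": "Guru99", "dept": "IT"},
}

# Filtering on the primary key is a direct lookup.
print(students[2]["name"])  # Guru99

# Filtering on a non-primary-key column without an index means checking every row.
full_scan = [rollno for rollno, row in students.items() if row["name"] == "Guru99"]

# A secondary index maps each column value to the primary keys that hold it,
# so the same filter becomes a single lookup.
name_index = {}
for rollno, row in students.items():
    name_index.setdefault(row["name"], []).append(rollno)
indexed = name_index.get("Guru99", [])

print(full_scan, indexed)  # [2] [2]
```
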

