Fdocuments - in Nosql-Seminar

NOSQL databases provide an alternative to traditional relational databases by offering more flexible schemas, horizontal scaling, and higher performance. The document discusses several types of NOSQL databases including key-value stores, document stores, graph databases, and map-reduce frameworks. Popular NOSQL databases like MongoDB, Cassandra, HBase, CouchDB, and Redis are highlighted along with their uses at companies like Facebook, Twitter, and Google.

Uploaded by Irfan Pinjari

NOSQL

Agenda
Introduction to NOSQL
Objective
Examples of NOSQL databases
NOSQL vs SQL
Conclusion
Basic Concepts

Database – an organized collection of data.

Database Management System (DBMS) – a software package that controls the creation, maintenance and use of a database.
• To interact with a DBMS, we use a structured query language
• Ex. Oracle, IBM DB2, MS Access, MySQL, FoxPro, etc.
Relational DBMS - A relational database is a collection
of data items organized as a set of formally described
tables from which data can be accessed easily. A relational
database is created using the relational model. The
software used in a relational database is called a
relational database management system (RDBMS).
SQL

Structured Query Language

Special purpose programming language designed for managing
data in RDBMS.
Originally based on relational algebra and tuple relational calculus.
SQL’s scope includes data insert, update and delete, schema creation and modification, and data access control.
It is statically and strongly typed.
It is the most widely used database language.
The query is the most important operation in SQL.
Ex. SELECT *
FROM Book
WHERE price > 100.00
ORDER BY title;
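The example query can be tried locally. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the Book table's columns (title, price) and its rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database; the Book table layout and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Book (title, price) VALUES (?, ?)",
    [("Networks", 150.0), ("Algorithms", 120.5), ("Poems", 40.0)],
)

# The query from the slide: books priced above 100.00, sorted by title.
rows = conn.execute(
    "SELECT * FROM Book WHERE price > 100.00 ORDER BY title"
).fetchall()
for row in rows:
    print(row)
conn.close()
```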
NOSQL

Stands for Not Only SQL

Class of non-relational data storage systems
Usually do not require a fixed table schema nor do
they use the concept of joins
NOSQL offerings typically relax one or more of the ACID properties
• Atomicity, Consistency, Isolation, Durability (ACID)
“NOSQL” = “Not Only SQL” =
Not Only using traditional relational DBMS
NOSQL

• Alternative to traditional relational DBMS

• Flexible schema
• Quicker/cheaper to set up
• Massive scalability
• Relaxed consistency → higher performance & availability

* No declarative query language → more programming

* Relaxed consistency → fewer guarantees
Why NOSQL?

Not every problem can be solved by a traditional relational database system alone.
Handles huge databases.
Redundancy: data is pretty safe even on commodity hardware
Super flexible queries using map/reduce
Rapid development (no fixed schema, yeah!)
Very fast for common use cases
Contd..

Inspired by Distributed Data Storage problems

Scale easily by adding servers
Not suited to all problem types, but super-suited to
certain large problem types
High-write situations (e.g. activity tracking or timeline rendering for millions of users)
A lot of relational uses are really dumbed down (e.g. fetch by PK with update)
Architecture
How does it work?

Clients know how to:

Send items to servers (consistent hashing)
What to do when a server fails
How to fetch keys from servers
How to weight servers by capacity

Servers know how to:

Store items they receive
Expire them from the cache
No inter-server communication – servers are unaware of each other
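A minimal sketch of the client-side logic, assuming a simple consistent-hashing ring: each server is placed on the ring at several points (more points for a higher weight), a key goes to the first point clockwise from its hash, and removing a failed server only remaps the keys that server owned. The class, the server names and the points-per-weight value are hypothetical; real client libraries differ in details.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    # MD5 is used only to spread keys evenly, not for security; Python's
    # built-in hash() is randomized per process, so it is avoided here.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class ConsistentHashRing:
    """Client-side ring: servers only store items and never talk to each other."""

    def __init__(self, points_per_weight: int = 100):
        self.points_per_weight = points_per_weight
        self._ring = []  # sorted list of (hash, server) points

    def add_server(self, server: str, weight: int = 1) -> None:
        # A server with a higher weight gets more points, so it owns more keys.
        for i in range(weight * self.points_per_weight):
            bisect.insort(self._ring, (stable_hash(f"{server}#{i}"), server))

    def remove_server(self, server: str) -> None:
        # When a server fails, drop its points; its keys fall to the next point.
        self._ring = [point for point in self._ring if point[1] != server]

    def server_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("no servers available")
        # First point clockwise from the key's hash, wrapping around the ring.
        index = bisect.bisect(self._ring, (stable_hash(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = ConsistentHashRing()
ring.add_server("cache-a")
ring.add_server("cache-b", weight=2)
keys = [f"user:{n}" for n in range(1000)]
before = {key: ring.server_for(key) for key in keys}

ring.remove_server("cache-a")  # simulate a failed server
after = {key: ring.server_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Only the keys that cache-a owned move when it fails, so items cached on the other servers stay valid; a naive `hash(key) % number_of_servers` scheme would remap far more keys.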
Performance

An RDBMS does extra work (buffering, locking, logging) to ensure ACID properties

NoSQL systems usually do not guarantee full ACID and can therefore be much faster
We don’t need ACID everywhere!
Ex. A data-processing job (run every minute) was 4x faster with MongoDB, despite being a lot more detailed (thanks to much simpler development)
Why is NOSQL faster than SQL? – Scaling

Simple web application with not much traffic

• Application server and database server all on one machine
Scaling contd..

More traffic comes in

• Application server
• Database server

Even more traffic comes in

• Load balancer
• Application server x2
• Database server
Scaling contd..

Even more traffic comes in

• Load balancer x N → easy
• Application server x N → easy
• Database server x N → hard for SQL databases
SQL Slowdown

Not linear!
Scaling contd..

NoSQL Scaling –
Need more storage?
• Add more servers!
Need higher performance?
• Add more servers!
Need better reliability?
• Add more servers!
Scaling Summary

You can scale SQL databases (Oracle, MySQL, SQL Server…)
• This will cost you dearly
• If you don’t have a lot of money, you will reach limits quickly
You can scale NoSQL databases
• Very easy horizontal scaling
• Lots of open-source solutions
• Scaling is one of the basic incentives for the design, so it is well handled
• Scaling is also the cause of trade-offs, such as having to use map/reduce
Characteristics

Almost infinite horizontal scaling

Very fast
Performance doesn’t deteriorate with growth (much)
No fixed table schemas
No join operations
Ad-hoc queries difficult or impossible
Structured storage
Almost everything happens in RAM
NOSQL Types

Wide Column Store / Column Families

Document Store
Key Value / Tuple Store
Graph Databases
Object Databases
XML Databases
Multivalue Databases
Main types -

Key-Value Stores
Map Reduce Framework
Document Databases
Graph Databases
Key Value Stores

Lineage: Amazon's Dynamo paper and Distributed Hash Tables.
Data model: A global collection of key-value pairs
Example systems
• Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, …
Implementation: efficiency, scalability, fault-tolerance
Records distributed to nodes based on key
• Replication
• Single-record transactions, “eventual consistency”
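To illustrate single-record writes with “eventual consistency”, here is a sketch assuming last-write-wins reconciliation by timestamp, one simple strategy (Cassandra resolves conflicting values this way; the Dynamo paper instead uses vector clocks). Replicas accept writes independently, may disagree for a while, and converge after an anti-entropy exchange. `Replica` and `anti_entropy` are hypothetical names, and the exchange here copies every record, whereas real systems compare compact summaries such as Merkle trees.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding key -> (timestamp, value); hypothetical, for illustration."""
    name: str
    data: dict = field(default_factory=dict)

    def put(self, key, value, ts):
        # Single-record write, accepted locally with no coordination.
        # Last write wins: keep the value with the newer timestamp (ties keep the existing one).
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a: Replica, b: Replica) -> None:
    """Exchange all records in both directions; the newer timestamp wins everywhere."""
    for key, (ts, value) in list(a.data.items()):
        b.put(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.put(key, value, ts)


r1, r2 = Replica("node-1"), Replica("node-2")
r1.put("cart:42", ["book"], ts=1)
r2.put("cart:42", ["book", "pen"], ts=2)  # a later write lands on another replica
print(r1.get("cart:42"), r2.get("cart:42"))  # replicas disagree for now
anti_entropy(r1, r2)
print(r1.get("cart:42"), r2.get("cart:42"))  # both now hold the ts=2 value
```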
Document Databases

Lineage: Inspired by Lotus Notes.

Data model: Collections of documents, where each document is itself a collection of key-value pairs.
Example: CouchDB, MongoDB, Riak
Graph Database

Lineage: Draws from Euler and graph theory.

Data model: Nodes & relationships, both of which can hold key-value pairs
Example: AllegroGraph, InfoGrid, Neo4j
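A property graph can be sketched with plain Python structures: nodes and relationships both carry key-value pairs, and queries are traversals rather than joins. The data and helper names are hypothetical, and this is not Neo4j's API; a real graph database indexes adjacency so that a traversal does not scan every relationship, as this sketch does.

```python
from collections import deque

# Nodes and relationships both carry key-value properties (hypothetical data).
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob": {"label": "Person", "age": 25},
    "carol": {"label": "Person", "age": 35},
    "neo": {"label": "Database", "kind": "graph"},
}
relationships = [
    ("alice", "KNOWS", "bob", {"since": 2015}),
    ("bob", "KNOWS", "carol", {"since": 2018}),
    ("carol", "USES", "neo", {"hours_per_week": 10}),
]


def neighbours(node, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [dst for src, rtype, dst, _ in relationships if src == node and rtype == rel_type]


def reachable(start, rel_type):
    """Breadth-first traversal: every node reachable via rel_type relationships."""
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours(current, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


# Relationship properties are read like node properties.
since = next(p["since"] for s, t, d, p in relationships if (s, t, d) == ("alice", "KNOWS", "bob"))
print(sorted(reachable("alice", "KNOWS")), since)  # friends and friends-of-friends
```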
Map Reduce Framework

Google’s framework for processing highly distributable problems across huge datasets using a large number of computers
Let’s define large number of computers
• Cluster if all of them have the same hardware
• Grid otherwise (if !Cluster, for old-style programmers)
Process split into two phases
• Map
  • Take the input, partition it, and delegate to other machines
  • Other machines can repeat the process, leading to a tree structure
  • Each machine returns results to the machine that gave it the task
Map Reduce Framework contd..

• Reduce
  • Collect results from the machines you gave the tasks to
  • Combine results and return them to the requester
• Slower than sequential data processing, but massively parallel
• Sort a petabyte of data in a few hours
• Input, Map, Shuffle, Reduce, Output
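The Input, Map, Shuffle, Reduce, Output pipeline can be sketched with the classic word-count example. A thread pool on one machine stands in for the cluster here, which is an assumption for illustration only; frameworks such as Hadoop run map and reduce tasks on many machines and also parallelise the reduce step. The function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(split: str):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_outputs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(splits):
    # Input -> Map (in parallel) -> Shuffle -> Reduce -> Output
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, splits))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


splits = ["the quick brown fox", "the lazy dog", "The fox"]
print(word_count(splits))
```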
Popular NoSQL

Hadoop / HBase
Cassandra
Amazon SimpleDB
MongoDB
CouchDB
Redis
MemcacheDB
Voldemort
Hypertable
Cloudata
IBM Lotus/Domino
Real World Use

Cassandra
• Facebook (original developer, used it till late 2010)
• Twitter
• Digg
• Reddit
• Rackspace
• Cisco
BigTable
• Google (open-source version is HBase)
MongoDB
• Foursquare
• Craigslist
• Bit.ly
• SourceForge
• GitHub
MONGODB

• Document store
• Basic support for dynamic (ad hoc) queries
• Query by example (nice!)

• Conditional operators
  • <, <=, >, >=
  • $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $and, $size, $type
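Query by example means the query is itself a document: plain values must match exactly, and nested operators express conditions. The sketch below re-implements a handful of the operators listed above in pure Python to show the idea; it is not MongoDB's query engine and ignores details such as matching inside arrays. Against a real server, filter documents of this shape are what a driver such as PyMongo's `find()` accepts. The helper names and book documents are hypothetical.

```python
MISSING = object()  # marks a field that a document does not have

# A few MongoDB-style conditional operators, re-implemented for illustration.
OPERATORS = {
    "$gt": lambda value, arg: value is not MISSING and value > arg,
    "$gte": lambda value, arg: value is not MISSING and value >= arg,
    "$lt": lambda value, arg: value is not MISSING and value < arg,
    "$lte": lambda value, arg: value is not MISSING and value <= arg,
    "$ne": lambda value, arg: value != arg,
    "$in": lambda value, arg: value in arg,
    "$exists": lambda value, arg: (value is not MISSING) == arg,
}


def matches(document: dict, query: dict) -> bool:
    """True if the document satisfies every field of the example query."""
    for field, condition in query.items():
        value = document.get(field, MISSING)
        if isinstance(condition, dict):
            # Operator form, e.g. {"price": {"$lt": 40}}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain value: "query by example" means the field must equal it
            return False
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


# Hypothetical documents; with no fixed schema, fields vary per document.
books = [
    {"title": "NoSQL Distilled", "price": 30, "tags": ["nosql", "design"]},
    {"title": "Big Data", "price": 45, "tags": ["hadoop"]},
    {"title": "SQL Basics", "price": 20},
]

cheap = [b["title"] for b in find(books, {"price": {"$lt": 40}})]
tagged_pricey = [b["title"] for b in find(books, {"tags": {"$exists": True}, "price": {"$gte": 40}})]
by_example = [b["title"] for b in find(books, {"title": "Big Data"})]
print(cheap, tagged_pricey, by_example)
```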
MONGODB

Data is stored as BSON (binary JSON)
• Makes it very well suited for languages with native JSON support
Map/Reduce written in JavaScript
• Slow! There is a single thread of execution in JavaScript
Master/slave replication (auto failover with replica sets)
Sharding built-in
Uses memory-mapped files for data storage
Performance over features
On 32-bit systems, limited to ~2.5 GB
An empty database takes up 192 MB
GridFS to store big data + metadata (not actually an FS)
CASSANDRA

Written in: Java

Protocol: Custom, binary (Thrift)
Tunable trade-offs for distribution and replication (N, R, W)
Querying by column, range of keys
BigTable-like features: columns, column families
Writes are much faster than reads (!)
• Constant write time regardless of database size
Map/reduce possible with Apache Hadoop
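In the (N, R, W) notation, N is the number of replicas for a key, a write waits for W replicas to acknowledge it, and a read consults R replicas. When R + W > N, every possible read set overlaps every possible write set, so in this simple model a read reaches at least one replica holding the latest acknowledged write. The sketch checks that rule against a brute-force enumeration of replica subsets; the function names are hypothetical, and Cassandra exposes these choices as consistency levels such as ONE, QUORUM and ALL.

```python
import itertools


def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """Tunable-consistency rule: R + W > N forces read and write sets to overlap."""
    return r + w > n


def every_read_sees_write(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-replica read intersect every W-replica write?"""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for write in itertools.combinations(replicas, w)
        for read in itertools.combinations(replicas, r)
    )


for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (5, 2, 3), (5, 2, 2)]:
    print(f"N={n} R={r} W={w}: rule={quorum_overlaps(n, r, w)}, "
          f"brute force={every_read_sees_write(n, r, w)}")
```

Lowering R or W makes reads or writes faster at the cost of possibly stale reads, which is the trade-off the slide calls tunable.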
Some more info about Cassandra in Facebook

Cassandra is an open source DBMS from the Apache Software Foundation.
Cassandra provides a structured key-value store with
tunable consistency
Cassandra is a distributed storage system for managing
structured data that is designed to scale to a very large
size across many commodity servers, with no single
point of failure
It is a NoSQL solution that was initially developed by
Facebook and powered their Inbox Search feature until
late 2010
HBASE

Written in: Java

Main point: Billions of rows x millions of columns
Modeled after BigTable
Map/reduce with Hadoop
Query predicate push down via server side scan and get filters
Optimizations for real time queries
A high performance Thrift gateway
HTTP supports XML, Protobuf, and binary
Cascading, Hive, and Pig source and sink modules
No single point of failure
While Hadoop streams data efficiently, it has overhead for starting map/reduce jobs. HBase is a column-oriented key/value store and allows for low-latency reads and writes.
Random access performance is like MySQL
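A rough model of the data layout described above: rows sorted by row key, cells addressed as family:qualifier, random access by key, and range scans whose filter runs where the data lives, so rejected rows never reach the client (the idea behind predicate push-down). The class and sample data are hypothetical, not the HBase client API, and real HBase cells are also versioned by timestamp, which this sketch omits.

```python
import bisect


class TinyWideColumnTable:
    """Rows sorted by row key; each cell addressed by 'family:qualifier'.

    Hypothetical in-memory model for illustration, not the HBase client API.
    """

    def __init__(self):
        self._keys = []  # row keys kept in sorted order
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Random access by row key.
        return self._rows.get(row_key)

    def scan(self, start, stop, row_filter=None):
        """Yield rows with start <= key < stop; the filter is applied here,
        'server side', so rows it rejects are never returned to the caller."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            row = self._rows[key]
            if row_filter is None or row_filter(row):
                yield key, row
            i += 1


table = TinyWideColumnTable()
table.put("user#001", "info:name", "Asha")
table.put("user#001", "stats:logins", 12)
table.put("user#002", "info:name", "Ravi")
table.put("user#002", "stats:logins", 3)
table.put("user#105", "info:name", "Meera")

active = list(table.scan("user#000", "user#100",
                         row_filter=lambda row: row.get("stats:logins", 0) > 5))
print(active)
```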
COUCHDB

Written in: Erlang
Main point: DB consistency, ease of use
Bidirectional (!) replication, continuous or ad hoc, with conflict detection, thus master-master replication (!)
MVCC - write operations do not block reads
Previous versions of documents are available
Crash-only (reliable) design
Needs compacting from time to time
Views: embedded map/reduce
Formatting views: lists & shows
Server-side document validation possible
Authentication possible
Real-time updates via _changes (!)
Attachment handling
CouchApps (standalone JS apps)
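A simplified sketch of the MVCC behaviour listed above, assuming optimistic concurrency on a revision token: every write appends a new revision instead of overwriting, an update must name the latest revision or be rejected as a conflict, and older revisions stay readable until compaction removes them. The class and method names are hypothetical; the revision strings only loosely imitate CouchDB's `N-hash` `_rev` format, with a random suffix instead of a content hash.

```python
import uuid


class Conflict(Exception):
    """Raised when an update is based on an out-of-date revision."""


class RevisionedStore:
    """Simplified MVCC: each write appends a revision rather than overwriting,
    so a reader holding an older revision can still read it."""

    def __init__(self):
        self._history = {}  # doc id -> list of (rev, body), oldest first

    def get(self, doc_id, rev=None):
        # Latest revision by default, or a specific older one if asked for.
        for r, body in reversed(self._history[doc_id]):
            if rev is None or r == rev:
                return {"_id": doc_id, "_rev": r, **body}
        raise KeyError(rev)

    def put(self, doc_id, body, rev=None):
        history = self._history.setdefault(doc_id, [])
        current = history[-1][0] if history else None
        if rev != current:
            # Optimistic concurrency: the writer must name the latest revision.
            raise Conflict(f"expected rev {current}, got {rev}")
        new_rev = f"{len(history) + 1}-{uuid.uuid4().hex[:8]}"
        history.append((new_rev, dict(body)))
        return new_rev

    def compact(self, doc_id):
        # Drop old revisions, keeping only the latest one.
        self._history[doc_id] = self._history[doc_id][-1:]


store = RevisionedStore()
rev1 = store.put("note", {"text": "draft"})
rev2 = store.put("note", {"text": "final"}, rev=rev1)

stale_rejected = False
try:
    store.put("note", {"text": "stale edit"}, rev=rev1)  # based on an old revision
except Conflict as err:
    stale_rejected = True
    print("rejected:", err)

print(store.get("note", rev=rev1)["text"])  # old revision is still readable
print(store.get("note")["text"])            # latest revision
store.compact("note")                       # afterwards only rev2 remains
```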
HADOOP

Apache project
A framework that allows for the distributed processing of large
data sets across clusters of computers
Designed to scale up from single servers to thousands of machines
Designed to detect and handle failures at the application layer,
instead of relying on hardware for it
Created by Doug Cutting, who named it after his son's toy elephant
Hadoop-related projects
• Cassandra
• HBase
• Pig
Hive was a Hadoop subproject, but is now a top-level Apache project
HADOOP contd..

• Scales to hundreds or thousands of computers, each with several processor cores
• Designed to efficiently distribute large amounts of work across a set of machines
• Hundreds of gigabytes of data constitute the low end of Hadoop scale
• Built to process "web-scale" data on the order of hundreds of gigabytes to terabytes or petabytes
• Uses Java, but allows streaming so other languages can easily send and accept data items to/from Hadoop
HADOOP contd..

Uses a distributed file system (HDFS)
• Designed to hold very large amounts of data (terabytes or even petabytes)
• Files are stored in a redundant fashion across multiple machines to ensure their durability to failure and high availability to very parallel applications
• Data organized into directories and files
• Files are divided into blocks (64 MB by default in Hadoop 1.x; 128 MB from Hadoop 2 onward) and distributed across nodes
Design of HDFS is based on the design of the Google File System
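How a file becomes replicated blocks can be sketched as follows, assuming the 64 MB block size the slide quotes and HDFS's default replication factor of 3. The round-robin placement, node names and file size are illustrative assumptions; real HDFS placement is rack-aware and decided by the NameNode.

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB block size, as quoted on the slide
REPLICATION = 3                # HDFS's default replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE):
    """Return (offset, length) for each block; the last block may be shorter."""
    count = math.ceil(file_size / block_size)
    return [(i * block_size, min(block_size, file_size - i * block_size))
            for i in range(count)]


def place_replicas(blocks, nodes, replication: int = REPLICATION):
    """Round-robin placement: each block goes to `replication` distinct nodes
    (requires replication <= len(nodes)). Illustrative only; not HDFS's policy."""
    placement = {}
    for index, _ in enumerate(blocks):
        placement[index] = [nodes[(index + k) % len(nodes)] for k in range(replication)]
    return placement


file_size = 200 * 1024 * 1024  # a hypothetical 200 MB file
blocks = split_into_blocks(file_size)
layout = place_replicas(blocks, ["node-a", "node-b", "node-c", "node-d"])
for index, (offset, length) in enumerate(blocks):
    print(index, offset, length // (1024 * 1024), "MB ->", layout[index])
```

Losing any single node still leaves two copies of every block, which is how redundancy on commodity hardware keeps data available.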
HIVE

A petabyte-scale data warehouse system for Hadoop

Easy data summarization, ad-hoc queries
Query the data using a SQL-like language called
HiveQL
Hive compiler generates map-reduce jobs for most
queries
Conclusion

NoSQL is a great problem solver if you need it

Choose your NoSQL platform carefully as each is
designed for specific purpose
Get used to Map/Reduce
It’s not a sin to use NoSQL alongside a (yes)SQL database
THANK
YOU..!!
