NoSQL Databases
The complexity for minimum component costs has increased at a rate of roughly a
factor of two per year...Certainly over the short term this rate can be expected to
continue, if not to increase. Over the longer term, the rate of increase is a bit more
uncertain, although there is no reason to believe it will not remain nearly constant
for at least 10 years.
-- Gordon Moore, 1965
…Then you better start swimmin’…Or you’ll sink like a
stone…For the times they are a-changin’.
-- Bob Dylan
Definition of NoSQL
• NoSQL is a set of concepts that allows the rapid and efficient processing of data sets with a focus on performance, reliability, and agility.
Sounds great… What???
Two Categories of Data
Operational Data
• Read and written by applications to carry out their ordinary functions.
• Examples:
• Shopping cart data on Amazon.com
• Information about employees in a human resources system
• Buy/sell prices at Fidelity
• Posts made by Facebook users
• Travel itineraries for bookings made on Expedia
Analytical Data
• Used to provide business intelligence (BI).
• Often created by storing the operational data used by applications over time, and commonly read-only.
• Because analytical datasets provide a historical record, they’re usually much bigger than an application’s current operational data.
• Examples:
• An e-commerce company might record all of the purchase data from its web application, then analyze this data to learn about customer buying habits or market trends.
• Facebook might sell all the posts made by its users to other companies, which could analyze the posts to determine each user’s significant events and tailor offers to that user’s needs, likes, and dislikes.
The Problem called Big Data
Cracks are appearing in single-CPU RDBMS systems under pressure from the four business drivers of the current age: volume, velocity, variability, and agility.
Volume
• The need to query big data has long caused performance problems in RDBMSs.
• These performance problems were traditionally solved by purchasing faster processors.
• But chips hit the power wall: raising clock speed any further made them draw too much power and run too hot, so faster processors were no longer an option.
• System designers shifted their focus from increasing speed on a single chip (vertical scaling, or scaling up) to using more processors working together (horizontal scaling, or scaling out).
Velocity
• Many single-processor RDBMSs cannot keep up with the real-time inserts and online queries that public-facing websites send to the database.
• RDBMSs often index many columns of every new row, and maintaining all of those indexes slows down each insert.
• When a single-processor RDBMS is the back end of a web storefront, random bursts in web traffic slow down responses for everyone, and tuning the system is costly when both high read and high write throughput are needed.
• This was another reason for engineers to look for a scale-out solution.
Variability
• Companies that want to capture and report on exception data struggle with the rigid schemas that an RDBMS imposes. For example, if a business unit wants to capture a few custom fields for one particular customer, every customer row in the table must make room for those fields, even though they don’t apply to most customers.
• Adding new columns to an RDBMS table means running ALTER TABLE commands, which on some systems requires downtime or locks the table. When a database is large, this can hurt system availability, costing time and money.
• This was another reason engineers looked for a more viable solution.
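A minimal sketch of the difference, using Python’s built-in sqlite3 module as a stand-in for an RDBMS and plain dictionaries as a stand-in for a schema-free document store. The customer names and the loyalty_tier field are hypothetical.

```python
import sqlite3


def relational_customers():
    """Fixed schema: one customer's custom field becomes a column on every row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    # The schema change applies to the whole table, not just customer 1.
    conn.execute("ALTER TABLE customer ADD COLUMN loyalty_tier TEXT")
    conn.execute("UPDATE customer SET loyalty_tier = 'gold' WHERE id = 1")
    rows = conn.execute(
        "SELECT id, name, loyalty_tier FROM customer ORDER BY id").fetchall()
    conn.close()
    return rows


# Schema-free documents: each record carries only the fields it actually has.
customer_docs = {
    1: {"name": "Acme", "loyalty_tier": "gold"},
    2: {"name": "Globex"},
}


def customers_with(field):
    """Return the ids of documents that contain `field`."""
    return sorted(cid for cid, doc in customer_docs.items() if field in doc)


print(relational_customers())          # Globex now carries an empty loyalty_tier
print(customers_with("loyalty_tier"))  # only Acme's document has the field
```

In SQLite, ADD COLUMN is a quick metadata change; the point of the sketch is that the new column exists on every row, not the cost of the operation on any particular engine.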
Agility
• One of the most complex parts of building applications on an RDBMS is putting data into the database and getting it back out.
• If your data has nested and repeated subgroups, you need an object-relational mapping (ORM) layer. This layer generates the correct combination of INSERT, UPDATE, DELETE, and SELECT statements to move object data to and from the RDBMS persistence layer.
• This process isn’t simple, and it is often one of the largest barriers to rapid change when developing new applications or modifying existing ones.
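To make the mapping cost concrete, here is a sketch built around a hypothetical Order object with nested line items. Stored relationally, it needs one INSERT per table row and one SELECT per table to rebuild; stored as a document, it is serialized once. sqlite3 stands in for the RDBMS and a dict of JSON strings stands in for a document store; this is a hand-rolled mapping layer, not any particular ORM library.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class LineItem:
    sku: str
    qty: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: list = field(default_factory=list)  # nested, repeated subgroup


def create_schema(conn):
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute("CREATE TABLE line_items (order_id INTEGER, sku TEXT, qty INTEGER)")


def save_relational(conn, order):
    """Hand-written mapping layer: one object becomes 1 + len(items) INSERTs."""
    statements = [("INSERT INTO orders (order_id, customer) VALUES (?, ?)",
                   (order.order_id, order.customer))]
    statements += [("INSERT INTO line_items (order_id, sku, qty) VALUES (?, ?, ?)",
                    (order.order_id, item.sku, item.qty)) for item in order.items]
    with conn:  # run all INSERTs in one transaction
        for sql, params in statements:
            conn.execute(sql, params)
    return len(statements)


def load_relational(conn, order_id):
    """Rebuilding the object takes one SELECT per table (assumes the order exists)."""
    (customer,) = conn.execute(
        "SELECT customer FROM orders WHERE order_id = ?", (order_id,)).fetchone()
    items = [LineItem(sku, qty) for sku, qty in conn.execute(
        "SELECT sku, qty FROM line_items WHERE order_id = ? ORDER BY rowid",
        (order_id,))]
    return Order(order_id, customer, items)


def save_document(store, order):
    """Document-store stand-in: the whole nested object is one JSON value."""
    store[order.order_id] = json.dumps(asdict(order))


def load_document(store, order_id):
    data = json.loads(store[order_id])
    return Order(data["order_id"], data["customer"],
                 [LineItem(**item) for item in data["items"]])
```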
The Solution called NoSQL
• It’s more than rows in tables
• NoSQL systems store and retrieve data in many formats: key-value stores, graph databases, column-family stores, document stores, and even rows in tables.
• It’s free of joins
• NoSQL systems let you extract your data through simple interfaces, without joins.
• It’s schema-free
• NoSQL systems let you drag and drop your data into a folder and then query it, without first creating an entity-relationship model.
What else?
• It works on many processors
• NoSQL systems let you store your database across multiple processors while maintaining high-speed performance.
• It uses shared-nothing commodity computers
• Most NoSQL systems run on low-cost commodity servers, each with its own separate RAM and disk.
• It supports linear scalability
• When you add more processors, you get a consistent, proportional increase in performance.
• It’s innovative
• NoSQL offers alternatives to a single way of storing, retrieving, and manipulating data. NoSQL supporters (also known as NoSQLers) have an inclusive attitude and recognize SQL solutions as viable options. To the NoSQL community, NoSQL means “Not only SQL.”
What is NoSQL not…
• It’s not about avoiding the SQL language
• It’s not only open source
• It’s not only about volume
• It’s not about cloud computing
• It’s not just a clever use of RAM and SSD
• It’s not an elite group of products
• It’s not just Hadoop
NoSQL Concepts
Single Complex Component vs. Multiple Simple Components
• Removes Complexity
• Promotes Reuse
• Easier Maintenance
• Functions are distributed across many NoSQL (and SQL) databases built from simple tools with simpler interfaces and well-defined roles.
• NoSQL products take a “master of one thing” rather than a “jack of all trades” approach.
• Example: Memcached to share objects in RAM, MapReduce to run batch jobs, DynamoDB to store key-value items.
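A minimal sketch of the “one simple component per job” idea, assuming a cache-aside pattern: a RAM cache (an in-process dictionary with expiry, standing in for a Memcached server) sits in front of a slower source of truth (a hypothetical dict-backed database). The class and method names are illustrative; this is not the Memcached client API.

```python
import time


class InMemoryCache:
    """Stand-in for a Memcached-style RAM cache: get/set with a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expiry time)
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]          # expired entries behave like misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


class ProductService:
    """Application tier that uses two simple components: a cache for hot
    objects and a database (a hypothetical dict here) as the source of truth."""

    def __init__(self, cache, database, ttl_seconds=60):
        self.cache = cache
        self.database = database
        self.ttl_seconds = ttl_seconds
        self.db_reads = 0

    def get_product(self, product_id):
        key = f"product:{product_id}"
        value = self.cache.get(key)      # None means miss (a cached None would look the same)
        if value is None:
            self.db_reads += 1
            value = self.database[product_id]
            self.cache.set(key, value, self.ttl_seconds)
        return value
```

Each piece stays small: the cache knows nothing about products, and the service knows nothing about how the cache evicts entries, so either can be swapped out independently.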
Use application tiers to simplify design
Strategic Use of RAM, SSD and HDD using Consistent Hashing
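The slide’s diagram did not survive extraction, so this is only a generic sketch of consistent hashing, assuming its usual role of deciding which server (for example, a RAM cache node or an SSD- or HDD-backed storage node) owns each key. Node names and the virtual-node count are hypothetical, and MD5 is used only as a well-distributed hash, not for security.

```python
import bisect
import hashlib


def _position(key):
    """Map a string to a point on the ring. hashlib is used because Python's
    built-in hash() of strings varies between processes."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes      # virtual points per node, to even out load
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = _position(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        for i in range(self.vnodes):
            point = _position(f"{node}#{i}")
            del self._points[bisect.bisect_left(self._points, point)]
            del self._owner[point]

    def node_for(self, key):
        """Walk clockwise from the key's position to the next node point."""
        if not self._points:
            raise LookupError("the ring has no nodes")
        idx = bisect.bisect(self._points, _position(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The design choice that matters: when a node joins, only the keys that now fall just before its new points move to it, roughly a 1/N share, instead of nearly every key moving as it would with hash(key) % N.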
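Transaction Control Using ACID
• Atomicity: all of a transaction’s changes are applied, or none are.
• Consistency: each transaction moves the database from one valid state to another, respecting its constraints.
• Isolation: concurrent transactions are shielded from each other’s in-progress changes.
• Durability: once a transaction commits, its changes survive crashes and restarts.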
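A small illustration of atomicity and consistency using Python’s built-in sqlite3 module. The transfer credits one account and debits another inside a single transaction; a CHECK constraint (a hypothetical “no negative balances” rule) makes an oversized debit fail, so the credit that already ran is rolled back too. Account names and amounts are made up.

```python
import sqlite3


def make_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Both updates commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failing debit shows the rollback at work.
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```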
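Transaction Control Using BASE
• BAsic Availability: the system keeps accepting reads and writes, even when some replicas are unreachable.
• Soft State: replica state can change over time as updates propagate, even without new writes.
• Eventual Consistency: if writes stop, all replicas eventually converge on the same value.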
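A toy model of eventual consistency, assuming three in-memory replicas, a single global logical counter as the write timestamp, and last-write-wins conflict resolution. Real systems typically use vector clocks, hybrid clocks, or quorum reads instead; all names here are hypothetical.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def apply(self, key, ts, value):
        # Last-write-wins: keep whichever version has the newest timestamp.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    """Writes are accepted by one replica at once (never blocked) and shipped
    to the other replicas later, so reads elsewhere may be stale for a while."""

    def __init__(self, replica_names):
        self.replicas = {name: Replica() for name in replica_names}
        self.pending = deque()  # replication messages not yet delivered
        self.clock = 0          # global logical timestamp (a simplification)

    def write(self, replica_name, key, value):
        self.clock += 1
        self.replicas[replica_name].apply(key, self.clock, value)
        for other in self.replicas:
            if other != replica_name:
                self.pending.append((other, key, self.clock, value))

    def read(self, replica_name, key):
        return self.replicas[replica_name].read(key)

    def propagate(self):
        """Deliver every queued update; afterwards all replicas agree."""
        while self.pending:
            target, key, ts, value = self.pending.popleft()
            self.replicas[target].apply(key, ts, value)
```

Between a write and the next propagate() call, a read from another replica can return an older value or nothing at all; that window is the “soft state” described above.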
ACID vs. BASE
ACID:
• Get transaction details right
• Block any reports while you are working
• Be pessimistic: anything might go wrong!
• Detailed testing and failure mode analysis
• Lots of locks and unlocks
BASE:
• Never block a write
• Focus on throughput, not consistency
• Be optimistic: if one service fails, it will eventually catch up
• Some reports may be inconsistent for a while, but don’t worry
• Keep things simple and avoid locks
Automatic Sharding
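The original slide is only a title, so this is a generic sketch of one common form of automatic sharding: range partitioning with automatic splits, in which a shard that grows past a size limit is split at its median key without manual intervention. The size limit and key format are hypothetical, and in a real cluster each shard would live on a different server rather than in a Python list.

```python
import bisect


class RangeShardedStore:
    """Keys live in sorted, non-overlapping ranges (shards). Shard i holds
    keys >= lower_bounds[i] and < lower_bounds[i + 1]."""

    def __init__(self, max_keys=4):
        self.max_keys = max_keys
        self.lower_bounds = [""]  # "" sorts before every other string
        self.shards = [{}]

    def _index(self, key):
        # Find the last shard whose lower bound is <= key.
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def put(self, key, value):
        i = self._index(key)
        self.shards[i][key] = value
        if len(self.shards[i]) > self.max_keys:
            self._split(i)  # automatic: no operator has to re-shard

    def get(self, key):
        return self.shards[self._index(key)].get(key)

    def _split(self, i):
        shard = self.shards[i]
        keys = sorted(shard)
        mid = keys[len(keys) // 2]  # median key becomes the new boundary
        self.shards[i] = {k: shard[k] for k in keys if k < mid}
        self.shards.insert(i + 1, {k: shard[k] for k in keys if k >= mid})
        self.lower_bounds.insert(i + 1, mid)
```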
Eric Brewer’s CAP Theorem for Replication
Consistency: Having a single, up-to-date, readable version of your data available to all clients. Consistency here is concerned with multiple clients reading the same items from replicated partitions and getting consistent results.
High availability: Knowing that the distributed database will always allow database clients to update items without delay. Internal communication failures between replicas shouldn’t prevent updates.
Partition tolerance: The ability of the system to keep responding to client requests even if there’s a communication failure between database partitions. This is analogous to a person still having an intelligent conversation even after a link between parts of their brain stops working.
The theorem states that a replicated system cannot guarantee all three properties at once: when a network partition occurs, it must give up either consistency or availability until the partition heals.
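A toy model of the trade-off, assuming one value copied to two hypothetical replicas named east and west. While the link between them is down, a “CP” register refuses writes it cannot replicate, and an “AP” register accepts them locally and lets the replicas diverge. Real systems usually make this choice per operation (for example, with quorum settings) rather than per store.

```python
class ReplicatedRegister:
    """One value copied to two replicas, 'east' and 'west'. While the link
    between them is down, the system must pick C or A for writes."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"east": None, "west": None}
        self.partitioned = False

    def write(self, node, value):
        if not self.partitioned:
            for replica in self.values:   # link is up: replicate synchronously
                self.values[replica] = value
            return "ok"
        if self.mode == "CP":
            return "unavailable"          # refuse the write to stay consistent
        self.values[node] = value         # accept locally to stay available
        return "ok"

    def read(self, node):
        return self.values[node]

    def is_consistent(self):
        first, *rest = self.values.values()
        return all(v == first for v in rest)
```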
NoSQL Concepts in Action
Four Quadrants of Data Technologies
• Operational relational: SQL relational databases such as Oracle, SQL Server, and MySQL
• Relational analytics: Oracle, SQL Server, MySQL
• NoSQL:
• Key-value stores: DynamoDB, Azure Tables, Riak, etc.
• Column-family stores: Apache HBase, Apache Cassandra, Google Bigtable, etc.
• Document stores: MongoDB, DocumentDB, etc.
• Graph stores: Neo4j, AllegroGraph, etc.
• Big data analytics: Hadoop, HDInsight
Operational NoSQL
RDBMS
Key/Value Stores
Column Family Stores
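The slide’s diagram did not survive extraction. As a rough sketch of the data model used by column-family stores such as Apache HBase and Google Bigtable (row key, then column family, then column, then value; cell timestamps and versions are omitted), here is a toy in-memory table. The family names, columns, and row keys are hypothetical.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy column-family table: row key -> column family -> column -> value."""

    def __init__(self, families):
        # Column families are declared up front; columns inside them are not.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        """Rows are sparse: a missing family or column simply isn't stored."""
        return dict(self.rows.get(row_key, {}).get(family, {}))

    def scan(self, start, stop):
        """Yield rows with start <= row key < stop, in row-key order."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, {fam: dict(cols) for fam, cols in self.rows[key].items()}
```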
Document Stores
Graph Stores
Graph Store Example: Social Network
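The social-network diagram did not survive extraction. As a hedged sketch of the kind of query a graph store answers by following relationships rather than joining tables, here is a breadth-first “friends of friends” search over an in-memory adjacency map. The people and friendships are hypothetical; a graph database such as Neo4j would express this in its own query language instead.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets: person -> set of friends."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: everyone reachable in <= max_hops steps,
    mapped to their distance from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue                      # don't expand past the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def friends_of_friends(graph, person):
    """People exactly two hops away (not already direct friends)."""
    return sorted(p for p, d in within_hops(graph, person, 2).items() if d == 2)
```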
Graph Store Example: User’s Order History
Graph Store Example: Airport Terminal
Analytical NoSQL
Big Data Analytics using Hadoop
Hadoop Core Technologies
• Hadoop Distributed File System (HDFS)
• Provides a way to store and access very large binary files across a cluster of
commodity servers and disk drives.
• Hadoop MapReduce
• Supports the creation of applications that process large amounts of analytical data in
parallel. That data is commonly stored in HDFS.
• Hive
• A Hadoop-based framework for querying and analyzing data. Among other things, it
provides HiveQL, a SQL-like language that can generate MapReduce jobs.
• Pig
• Another Hadoop-based framework for working with data. It provides a language called
Pig Latin for creating MapReduce jobs.
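The MapReduce model described above can be sketched in a single process as three phases: map each record to key/value pairs, shuffle the pairs into groups by key, and reduce each group. This is a plain-Python word count under that assumption, not Hadoop’s API; real jobs are written against the Hadoop MapReduce Java API (or run through Hadoop Streaming), and the framework runs the phases in parallel across the cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: emit (word, 1) for each word in one input record."""
    for token in record.lower().split():
        word = token.strip(".,!?;:\"'")
        if word:
            yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all the counts emitted for one word."""
    return key, sum(values)


def word_count(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

In a real job the mappers run next to the HDFS blocks holding their input, and Hive or Pig can generate such jobs from a higher-level query instead of hand-written mappers and reducers.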
Quick Recap
• NoSQL really means Not Only SQL.
• Volume, velocity, variability, and agility are the main business drivers for NoSQL.
• Key NoSQL concepts: multiple simple components, application tiers with external services, strategic use of RAM, SSD, and HDD, BASE transaction control, automatic sharding, and replication using CAP.
• Popular NoSQL data stores: key-value, column family, document, and graph.
• Big data analytics using Hadoop
Q & A

