Databases in Computer World
DBMS / RDBMS / NoSQL
By Himanshu Patel
1/23/17
DBMS vs RDBMS
1. DBMS: Data is generally stored in either a hierarchical form or a navigational form.
   RDBMS: Data values are stored in the form of tables, and the tables have an identifier called the primary key.
2. DBMS: Normalization is not present.
   RDBMS: Normalization is present.
3. DBMS: Does not apply any security with regard to data manipulation.
   RDBMS: Defines integrity constraints for the purpose of the ACID (Atomicity, Consistency, Isolation, and Durability) properties.
4. DBMS: Uses the file system to store data, so there is no relation between the tables.
   RDBMS: Data values are stored in the form of tables, so a relationship between these data values is stored in the form of a table as well.
5. DBMS: Has to provide some uniform methods to access the stored information.
   RDBMS: Supports a tabular structure of the data and relationships between tables to access the stored information.
6. DBMS: Does not support distributed databases.
   RDBMS: Supports distributed databases.
7. DBMS: Meant for small organizations and small amounts of data. It supports a single user.
   RDBMS: Designed to handle large amounts of data. It supports multiple users.
8. DBMS: Examples are file systems, XML, etc.
   RDBMS: Examples are MySQL, PostgreSQL, SQL Server, Oracle, etc.
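As a rough illustration of the primary key and integrity constraint rows above, the sketch below uses Python's built-in sqlite3 module as a stand-in RDBMS. The table and column names (department, employee) are hypothetical and not part of the original slides.

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each table has a primary key; employee refers to department by that key.
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " department_id INTEGER NOT NULL REFERENCES department(id))"
)
conn.execute("INSERT INTO department VALUES (1, 'Engineering')")
conn.execute("INSERT INTO employee VALUES (10, 'Asha', 1)")

# The relationship between the two tables is followed with a JOIN.
rows = conn.execute(
    "SELECT e.name, d.name FROM employee e"
    " JOIN department d ON d.id = e.department_id"
).fetchall()

# An integrity constraint rejects a row that points at a missing department.
try:
    conn.execute("INSERT INTO employee VALUES (11, 'Ravi', 99)")
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True
```

The same data kept in flat files would need application code to perform both the lookup and the consistency check.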
SQL vs NoSQL
Relational database (SQL): vertically scalable, strict schema
NoSQL: horizontally scalable, flexible schema
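The difference between a strict and a flexible schema can be sketched as follows. This is only an illustration: sqlite3 stands in for a relational database, a list of Python dictionaries stands in for a document-style NoSQL store, and the person table and nickname field are made-up names.

```python
import sqlite3

# Strict schema: the table definition fixes the allowed columns up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Asha')")

try:
    # 'nickname' is not part of the schema, so the insert is rejected.
    conn.execute("INSERT INTO person (id, name, nickname) VALUES (2, 'Ravi', 'Rav')")
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Flexible schema: each record carries its own fields.
documents = [
    {"id": 1, "name": "Asha"},
    {"id": 2, "name": "Ravi", "nickname": "Rav"},
]
```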
ACID vs BASE
ACID (relational): strong consistency; isolation; transaction; robust database; scale-up (limited)
BASE (NoSQL): weak consistency; last write wins; program managed; simple database; complex code; scale-out (unlimited)
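"Last write wins" can be pictured as a small conflict-resolution rule: when replicas disagree, the version with the newest timestamp is kept. The sketch below is a minimal illustration under that assumption; the Versioned class and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # assumed to be a write timestamp supplied by the client or node


def last_write_wins(versions):
    """Return the version with the highest timestamp."""
    return max(versions, key=lambda v: v.timestamp)


# Two replicas received different writes for the same key.
replica_a = Versioned(value="blue", timestamp=100)
replica_b = Versioned(value="green", timestamp=105)

winner = last_write_wins([replica_a, replica_b])
```

The older write is silently discarded, which is part of why consistency handling moves into the program on the BASE side.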
CAP Theorem
Database System
Database Popularity
Database Model
Categories: key-value stores, column family, RDF stores
Systems: Amazon DynamoDB, Apache Jena, ArangoDB, MongoDB, Datomic, CouchDB, Bigtable, Redis, HBase, Riak, Hypertable, OrientDB, RethinkDB, Voldemort, Cassandra, FatDB, RavenDB, AlchemyDB, Terrastore, FoundationDB, Apache Accumulo, Sesame, BangDB, JasDB, KAI, RaptorDB, HamsterDB, djondb, Tarantool, EJDB, Maxtable, densoDB, HyperDex, Couchbase, InterSystems Caché, GT.M
Comparison of Databases
Feature: MSSQL / MongoDB / Cassandra
Database model: Relational DBMS / Document-oriented / -
JOINs: Yes / No / No
Transaction: ACID / No / No
Data schema: Fixed / Dynamic / Flexible
Scalability: Vertical / Horizontal / Horizontal
Replication: Yes
Query language: - / - / CQL
MapReduce: No / Yes / Yes
Triggers: Yes / No / Yes
Foreign keys: Yes / No / No
Concurrency: Yes / Yes / Yes
Company: Microsoft / MongoDB, Inc / Apache Software Foundation
Licence: Commercial / Open Source / Open Source
Implementation language: C++ / C++ / Java
OS support: Windows / Windows, Linux, OS X, Solaris / BSD, Linux, OS X, Windows
Supported programming languages:
  MSSQL: .NET, Java, PHP, Python, Ruby, Visual Basic
  MongoDB: Actionscript, C, C#, C++, Clojure, ColdFusion, D, Dart, Delphi, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Lua, MatLab, Perl, PHP, PowerShell, Prolog, Python, R, Ruby, Scala, Smalltalk
  Cassandra: C#, C++, Clojure, Erlang, Go, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
Applications: Applications requiring fast access to a large number of objects, such as caches or queues
Limitations: Cannot update a subset of a value
Databases: BerkeleyDB, MemcacheDB, Redis, DynamoDB
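The limitation above can be sketched with a toy key-value store in which each value is an opaque blob. This is only an illustration: the dictionary-backed store and the put, get, and update_field helpers are hypothetical and do not correspond to any particular product's API.

```python
import json

# Toy key-value store: keys map to opaque byte strings.
store = {}


def put(key, value):
    """Serialize the whole value and store it under the key."""
    store[key] = json.dumps(value).encode("utf-8")


def get(key):
    """Fetch and deserialize the whole value."""
    return json.loads(store[key].decode("utf-8"))


def update_field(key, field, new_value):
    """Changing one field means reading and rewriting the entire value."""
    value = get(key)          # read the full blob
    value[field] = new_value  # modify it in the application
    put(key, value)           # write the full blob back


put("user:1", {"name": "Asha", "city": "Pune"})
update_field("user:1", "city", "Delhi")
```

The store itself never sees the individual fields, so the read-modify-write cycle lives in application code.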
Applications: Applications that need to manage a large variety of objects that differ in structure
Limitations: No standard query syntax
Extension of the key-value model, where the value is a set of columns (column family)
Databases: Cassandra, Bigtable, HBase, Apache Accumulo
Databases: Neo4j, OrientDB, Apache Giraph, AllegroGraph
[Table: comparison of data store types (rows include Column Stores and Relational Databases) with ratings such as high, moderate, low, minimal, and variable; cell layout not recoverable.]
Database ranking (Dec 2016)
Source: http://db-engines.com/en/ranking
Rank  DBMS                  Database model     Score
1     Oracle                Relational DBMS    1417.1
2     MySQL                 Relational DBMS    1362.65
3     Microsoft SQL Server  Relational DBMS    1214.18
4     MongoDB               Document store     318.8
-     PostgreSQL            Relational DBMS    318.69
-     DB2                   Relational DBMS    180.56
-     Cassandra             Wide column store  135.06
-     Redis                 Key-value store    109.54
11    Elasticsearch         Search engine      99.12
14    Solr                  Search engine      66.57
15    HBase                 Wide column store  58.19
-     Splunk                Search engine      -
21    Neo4j                 Graph DBMS         36.45
22    Couchbase             Document store     29.3
23    Memcached             Key-value store    29.09
24    Amazon DynamoDB       Document store     28.98
27    CouchDB               Document store     22.18
32    Riak KV               Key-value store    10.88
33    MarkLogic             Multi-model        10.3
38    Hazelcast             Key-value store    7.53
39    Sphinx                Search engine      7.3
41    Ehcache               Key-value store    6.44
42    OrientDB              Multi-model        6.25
45    InfluxDB              Time Series DBMS   5.32
46    RethinkDB             Document store     5.23
47    Titan                 Graph DBMS         5.12
55    Adabas                Multivalue DBMS    3.89
57    Jackrabbit            Content store      3.58
Database ranking (Dec 2016)
Source: http://db-engines.com/en/ranking
Rank  DBMS                     Database model        Score
58    Accumulo                 Wide column store     -
67    Google Search Appliance  Search engine         -
68    Virtuoso                 Multi-model           -
72    Apache Drill             Multi-model           -
73    RRDtool                  Time Series DBMS      -
79    ArangoDB                 Multi-model           -
80    Caché                    Object oriented DBMS  -
85    Graphite                 Time Series DBMS      -
86    Jena                     RDF store             -
94    Db4o                     Object oriented DBMS  -
96    RDF4J                    RDF store             -
98    OpenTSDB                 Time Series DBMS      -
99    IMS                      Navigational DBMS     1.47
103   Versant Object Database  Object oriented DBMS  1.3
111   D3                       Multivalue DBMS       -
116   ObjectStore              Object oriented DBMS  -
117   Giraph                   Graph DBMS            -
119   BaseX                    Native XML DBMS       -
122   Matisse                  Object oriented DBMS  -
140   Druid                    Time Series DBMS      -
145   Hypertable               Wide column store     -
Database ranking (Dec 2016)
Source: http://db-engines.com/en/ranking
Rank  DBMS           Database model     Score
-     -              Wide column store  -
-     -              Event Store        0.34
192   4store         RDF store          0.22
197   eXist-db       Native XML DBMS    0.2
199   Redland        RDF store          0.2
201   InfiniteGraph  Graph DBMS         0.19
203   ModeShape      Content store      0.18
214   NEventStore    Event Store        0.15
221   Dgraph         Graph DBMS         0.13
Original author(s): Avinash Lakshman, Prashant Malik
Developer(s): Apache Software Foundation
Initial release: 2008
Stable release: 3.9 / Sep 29, 2016
Written in: Java
Operating system: Cross-platform
Website: cassandra.apache.org
Cassandra Architecture
Cassandra is a distributed, decentralized, fault-tolerant,
eventually consistent, linearly scalable, and
column-oriented data store.
Virtual nodes
There are several data structures that you need to know. The MemTable
is a hash table-like structure that stays in memory. It contains the
actual cell data. The SSTable is the on-disk version of a MemTable:
when MemTables are full, they are persisted to disk as SSTables.
The commit log is an append-only log of all the mutations that are
sent to the Cassandra cluster. It lives on disk and helps to replay
changes that have not yet been persisted. These three hold the core
data. Then there are bloom filters and indexes. The bloom filter is a
probabilistic data structure. Both live in memory and contain
information about the location of data in the SSTable. Each SSTable
has one bloom filter and one index associated with it. The bloom
filter helps Cassandra quickly detect which SSTables do not have the
requested data, while the index helps to find the exact location of
the data in the SSTable file.
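The write and read paths described above can be sketched in a few dozen lines. This is a toy model only, not Cassandra's implementation: the class names, the tiny memtable limit, and the in-memory "SSTables" are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, size=256, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SSTable:
    """Immutable, sorted snapshot of a memtable with its own filter and index."""

    def __init__(self, rows):
        self.data = sorted(rows.items())
        self.index = {key: pos for pos, (key, _) in enumerate(self.data)}
        self.bloom = BloomFilter()
        for key in rows:
            self.bloom.add(key)

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # definitely not in this SSTable
        pos = self.index.get(key)
        return None if pos is None else self.data[pos][1]


class TinyStore:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of unflushed mutations
        self.memtable = {}     # in-memory cell data
        self.sstables = []     # flushed, immutable tables
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        self.sstables.append(SSTable(self.memtable))
        self.memtable = {}
        self.commit_log.clear()  # flushed mutations no longer need replay

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            value = table.get(key)
            if value is not None:
                return value
        return None


store = TinyStore(memtable_limit=2)
store.write("a", 1)
store.write("b", 2)  # memtable is full, so it is flushed to an SSTable
store.write("c", 3)  # stays in the memtable and the commit log
```

After these writes, "a" and "b" are served from the SSTable while "c" is still only in the memtable and the commit log.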
Definition of Cassandra
Apache Cassandra is a
Distributed...
High performance...
Extremely scalable...
Fault tolerant (i.e. no single point of failure)...
Post-relational database solution. Cassandra can serve both as a real-time
datastore (the "system of record") for online/transactional applications
and as a read-intensive database for business intelligence systems.
Architecture Overview
Cassandra was designed with the understanding that
system/hardware failures can and do occur
Peer-to-peer, distributed system
All nodes are the same
Data is partitioned among all nodes in the cluster
Custom data replication to ensure fault tolerance
Read/write-anywhere design
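Partitioning and replication across identical peers can be sketched as a hash ring. The code below is a simplified illustration, not Cassandra's actual partitioner: the token function, the node names, and the replication factor of 2 are assumptions.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (illustrative hash choice)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor=2):
        # Every node is a peer; each owns the range ending at its token.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]
        self.replication_factor = min(replication_factor, len(self.ring))

    def replicas(self, key):
        """Return the nodes that store the key: the owner plus its successors."""
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c"], replication_factor=2)
owners = ring.replicas("user:42")
```

Because every key maps to more than one node, a single node failure does not make the key unavailable, and any node can work out where a key lives.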
Architecture Overview
The schema used in Cassandra is modeled after Google
Bigtable. It is a row-oriented, column structure
A keyspace is akin to a database in the RDBMS world
A column family is similar to an RDBMS table but is more
flexible/dynamic
A row in a column family is indexed by its key. Other
columns may be indexed as well
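One way to picture the keyspace, column family, and row relationship is as nested maps. The sketch below is only a mental model with made-up names (shop, users, orders), not how Cassandra stores data internally.

```python
# keyspace -> column family -> row key -> {column name: value}
keyspace_shop = {
    "users": {
        "alice": {"email": "alice@example.com", "city": "Pune"},
        "bob": {"email": "bob@example.com"},  # rows may have different columns
    },
    "orders": {
        "order-1": {"user": "alice", "total": "25.00"},
    },
}


def get_column(keyspace, column_family, row_key, column):
    """Look up one column of one row; returns None if the column is absent."""
    return keyspace[column_family][row_key].get(column)


alice_city = get_column(keyspace_shop, "users", "alice", "city")
bob_city = get_column(keyspace_shop, "users", "bob", "city")
```

The lookup by row key is direct, while finding rows by another column would need an additional index, which matches the note that other columns may be indexed as well.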
Flexible Schema
Dynamic schema design allows for much more flexible data
storage than rigid RDBMS
Handles structured, semi-structured, and unstructured data.
Counters also supported
No offline/downtime for schema changes
Supports primary and secondary indexes
Data Compression
Uses Google's Snappy data compression algorithm
Compresses data on a per column family level
Internal tests at DataStax show up to 80%+
compression of raw data
No performance penalty (and some increases in
overall performance due to less physical I/O)!
CQL Language
Very similar to RDBMS SQL syntax
Create objects via DDL (e.g. CREATE...)
Core DML commands supported: INSERT, UPDATE,
DELETE
Query data with SELECT
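To give a feel for the syntax, the sketch below collects a few CQL statements as Python strings. The keyspace and table names (demo, users) are hypothetical, and the statements are not executed here; running them would require a Cassandra cluster and a client driver, neither of which is assumed.

```python
# Illustrative CQL statements; names are made up for this example.
cql_statements = [
    # DDL: create a keyspace and a table
    "CREATE KEYSPACE demo WITH replication = "
    "{'class': 'SimpleStrategy', 'replication_factor': 1};",
    "CREATE TABLE demo.users (user_id int PRIMARY KEY, name text);",
    # DML: insert, update, and delete rows
    "INSERT INTO demo.users (user_id, name) VALUES (1, 'alice');",
    "UPDATE demo.users SET name = 'bob' WHERE user_id = 1;",
    "DELETE FROM demo.users WHERE user_id = 1;",
    # Query
    "SELECT name FROM demo.users WHERE user_id = 1;",
]

# First keyword of each statement, in order.
keywords = [statement.split()[0] for statement in cql_statements]
```

Apart from the keyspace definition, each statement would read almost the same in SQL.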
What is AMAZON S3
Disadvantages of using S3
Not user-friendly for beginner-level
computer users. S3 is basically UI-less.
Trust. Not all types of business or services might
be comfortable with storing their data in the
'cloud', especially those with extremely sensitive
and confidential data, e.g. banking.
Although it promises 99.9% uptime, in
2008 it had two major outages, in February and July,
bringing down Web 2.0 start-ups like Twitter.
Back in 2007, S3 had speed issues with reading
and writing data.
Requirements
To get started using S3, an AWS account
is needed. An AWS account is simply an
Amazon.com account that has AWS
services enabled.
Sign up at https://aws.amazon.com/
After creating the AWS account, you need to sign up for
S3 by clicking the "Sign Up for This Web Service" button.
A credit card needs to be associated with the account.
You will be given an Access Key ID and a Secret Access Key
on successful creation. (Note: they are not emailed to you.)
Pricing
Charges for using S3 are based on the location of your
buckets.
You are billed according to storage (average), data transfer
in and out, and the number of requests per month.
There is no minimum fee to use S3; you
pay only for what you use.
You can view your current charges almost
immediately on the S3 portal.
Detailed usage reports can also be downloaded in XML or
CSV format.
Pricing US usage
Storage
$0.15 per GB-month of storage used
Data Transfer
$0.100 per GB, all data transfer in
Data transfer out, tiered rates per GB: $0.170, $0.130, $0.110, $0.100
Requests
$0.01 per 1,000 PUT, POST, or LIST requests
$0.01 per 10,000 GET and all other requests*
* No charge for delete requests
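As a worked example of how these rates combine into a monthly bill, the sketch below adds up storage, transfer, and request charges. The usage figures are invented, and the transfer-out rate is assumed to be the first tiered rate ($0.170 per GB) because the tier boundaries are not given in the slide.

```python
from decimal import Decimal

# Rates taken from the slide (US usage).
STORAGE_PER_GB_MONTH = Decimal("0.15")
TRANSFER_IN_PER_GB = Decimal("0.100")
TRANSFER_OUT_PER_GB = Decimal("0.170")  # assumption: first tier applies to all outbound data
PUT_PER_1000 = Decimal("0.01")          # PUT, POST, and LIST requests
GET_PER_10000 = Decimal("0.01")         # GET and all other requests


def monthly_cost(storage_gb, transfer_in_gb, transfer_out_gb, put_requests, get_requests):
    """Return the estimated monthly charge in US dollars."""
    storage = STORAGE_PER_GB_MONTH * storage_gb
    transfer = TRANSFER_IN_PER_GB * transfer_in_gb + TRANSFER_OUT_PER_GB * transfer_out_gb
    requests = PUT_PER_1000 * put_requests / 1000 + GET_PER_10000 * get_requests / 10000
    return storage + transfer + requests


# Hypothetical month: 50 GB stored, 10 GB in, 20 GB out, 10,000 PUTs, 100,000 GETs.
# Storage 7.50 + transfer in 1.00 + transfer out 3.40 + PUTs 0.10 + GETs 0.10
total = monthly_cost(50, 10, 20, 10_000, 100_000)
```

With these inputs the total is $12.10, most of which is storage and outbound transfer rather than requests.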
Implementation
To start using S3, get hold of your S3 access
key ID and secret access key via the AWS
portal.
Next, get hold of an application capable of
managing S3. Here are a few resources:
Spaceblock: Windows Application
S3 Web interface: Web App/Interface
S3 Firefox Organizer: Firefox add-on
These applications make objects more
manageable because they provide a directory
structure similar to Windows Explorer.
Implementation
What can we use S3 for?
- HTML microsites
- Flash microsites
- Media storage
- Backups
For HTML and Flash microsites, custom URLs can be created
by using a CNAME record to create a DNS alias.
No server-side processing should be used in S3, as it will not
work without a web server (e.g. IIS, Apache).
Implementation
[Diagram: web hosting on AWS. Labels: Amazon Route 53 (DNS); www.MyWebSite.com (dynamic data); media.MyWebSite.com (static data); Elastic Load Balancer; Amazon CloudFront; Amazon EC2; Amazon RDS; Amazon S3; Availability Zone #1; Availability Zone #2.]