Intro-Databases For Big Data


DATS310

Databases for Big Data

DR. RICHA SHARMA


COMMONWEALTH UNIVERSITY

Introduction
• Architecture for databases:
  • Focuses on the storage and organization of information so that data can be easily accessed and modified (insert, update and delete operations).
• Database design and application development depend heavily on the database architecture!
• Database architectures vary, just as network topologies do.
• Understanding the architecture helps identify which database design is best suited to the problem at hand, i.e., the application to be developed!
Tools/Technologies for Big Data
• A few examples:
  • Apache Hadoop, Spark, Kafka, Hive, Storm
  • MongoDB and CouchDB
  • Redis, Cassandra and Neo4j
  • Druid and Google BigQuery
  • AWS DynamoDB
  • Tableau

Questions to explore
• Type of database – does the problem at hand require a relational, key-value pair, columnar, document-oriented or graph database?
• Nature of the problem and usage of the database – does the problem require flexibility, or does it require parallel processing?
• Communication interface of the database – will we interact with the database through an interactive command-line interface, or through an application that requires database connectivity and a programming-language interface?
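To make the second option concrete: a minimal sketch, assuming Python's built-in sqlite3 module (a DB-API 2.0 driver) with an in-memory database standing in for a real database server. The users table and its columns are hypothetical. Instead of a person typing commands into a shell, the application opens a connection, issues parameterized insert, update and delete statements, and reads the result back.

```python
import sqlite3


def main() -> list:
    # In-memory SQLite database: a stand-in for a real database server (assumption).
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE users (name TEXT PRIMARY KEY, age INTEGER)")
        # Parameterized statements: the driver binds the values (no string concatenation).
        cur.executemany(
            "INSERT INTO users (name, age) VALUES (?, ?)",
            [("alice", 34), ("bob", 27)],
        )
        cur.execute("UPDATE users SET age = ? WHERE name = ?", (35, "alice"))
        cur.execute("DELETE FROM users WHERE name = ?", ("bob",))
        conn.commit()
        cur.execute("SELECT name, age FROM users ORDER BY name")
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(main())  # [('alice', 35)]
```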

Questions to explore
• Unique characteristic of the database – Any database will support writing data and reading it back again, but what makes it unique? Some allow querying on arbitrary fields; some provide indexing for rapid lookup; some support ad hoc queries, while for others queries must be planned in advance.
• Performance – How does this database perform, and at what cost? What about replication? Is this database tuned for reading, writing or some other operation?
• Scalability – Scalability is closely related to performance; the point to explore is whether the database is geared more toward horizontal scaling (MongoDB, HBase, DynamoDB), traditional vertical scaling (Postgres, Neo4j, Redis) or something in between.
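Horizontal scaling means spreading data over more machines rather than buying a bigger one. The toy sketch below assumes simple hash-modulo partitioning, with Python dicts standing in for separate nodes; ShardedStore and its methods are hypothetical names, not any product's API.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedStore:
    """Toy horizontally scaled store: keys are spread over several shards."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for a separate machine (hypothetical simplification).
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> int:
        # A stable hash (unlike Python's per-process salted hash()) picks the shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key: str, value: str) -> None:
        self.shards[self._shard_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        # Only the one shard that owns the key is consulted.
        return self.shards[self._shard_for(key)].get(key)


if __name__ == "__main__":
    store = ShardedStore(num_shards=3)
    for i in range(9):
        store.put(f"user:{i}", f"name-{i}")
    print(store.get("user:4"))  # name-4
    print([len(s) for s in store.shards])  # split depends on the hash values
```

A drawback of hash-modulo placement is that changing num_shards moves most keys to a different shard; production systems typically use consistent hashing or range-based partitioning to limit that movement.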
RDBMS vs Big Databases

Key-Value Pair Database
• The simplest database model: data is stored as key-value (KV) pairs, much like a hash table.
• Some KV implementations provide a means of iterating through the keys, but not all!
• A file system can be considered a key-value store, with the file path as the key and the file contents as the value.
• Since this model doesn't require complex data structures for storage, it can be incredibly performant in many scenarios, but it generally won't help when we have complex query and aggregation requirements.
• Examples: Redis, DynamoDB, Voldemort, Riak etc.
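A minimal sketch of the hash-table model, assuming Python's dict as the backing hash table, plus a second class that treats a directory as a key-value store (file path as key, file contents as value). Both class names are hypothetical and illustrative only.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterator, Optional


class KeyValueStore:
    """In-memory KV store backed by a Python dict (a hash table)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value  # overwrites any existing value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)  # average O(1) lookup by key

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def keys(self) -> Iterator[str]:
        # Optional feature: not every KV store lets you iterate over keys.
        return iter(self._data)


class FileKeyValueStore:
    """The file-system analogy: file path = key, file contents = value."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, value: bytes) -> None:
        # Assumes keys are safe relative file names (no validation here).
        (self.root / key).write_bytes(value)

    def get(self, key: str) -> Optional[bytes]:
        path = self.root / key
        return path.read_bytes() if path.exists() else None


if __name__ == "__main__":
    kv = KeyValueStore()
    kv.put("session:42", b"alice")
    print(kv.get("session:42"))  # b'alice'
    with tempfile.TemporaryDirectory() as tmp:
        fkv = FileKeyValueStore(Path(tmp))
        fkv.put("session-42", b"alice")
        print(fkv.get("session-42"))  # b'alice'
```

Lookup by key is cheap, but a question such as "which values contain X?" would require scanning every entry, which is why KV stores struggle with complex queries and aggregations.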

Columnar Database
• Columnar, or column-oriented, databases are so named because they store the data from a given column (in the two-dimensional table sense) together, as opposed to row-oriented databases (RDBMS).
• These databases make adding columns to a table quite inexpensive, and this is done on a row-by-row basis.
• Each row can have a different set of columns, or none at all, allowing tables to remain sparse without incurring a storage cost for null values.
• With respect to structure, columnar databases sit about midway between relational and key-value databases. Examples: HBase, Cassandra etc.
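A toy in-memory illustration of the idea, assuming plain Python dicts; it is not how HBase or Cassandra actually lay data out on disk. The same sparse records are regrouped column by column, so a missing value simply has no entry, and adding a column touches only the rows that use it. The sample records and the to_columns helper are hypothetical.

```python
from typing import Any, Dict, List

# Row-oriented layout: one complete record per entry (hypothetical sample data).
rows: List[Dict[str, Any]] = [
    {"id": 1, "name": "alice", "city": "Boston"},
    {"id": 2, "name": "bob"},                               # no city
    {"id": 3, "name": "carol", "email": "c@example.com"},   # extra column
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, Dict[int, Any]]:
    """Regroup records by column: column name -> {row id -> value}.

    Only cells that exist are stored, so absent values cost nothing.
    """
    columns: Dict[str, Dict[int, Any]] = {}
    for record in records:
        row_id = record["id"]
        for col, value in record.items():
            if col != "id":
                columns.setdefault(col, {})[row_id] = value
    return columns


columns = to_columns(rows)

# Adding a column touches only the rows that actually have a value for it.
columns["age"] = {1: 34}

# Reading one column scans a single structure instead of every full record.
all_names = list(columns["name"].values())

if __name__ == "__main__":
    print(all_names)  # ['alice', 'bob', 'carol']
```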
Document Database
• Document databases are meant to store documents; a document is treated like a hash with a unique ID field and values that may be of a variety of types, including more hashes.
• Documents can contain nested structures, so they exhibit a high degree of flexibility, allowing for variable domains.
• The system imposes few restrictions on incoming data, as long as it meets the basic requirement of being expressible as a document.
• Different document databases take different approaches with respect to indexing, ad hoc querying, replication, consistency and other design decisions.
• Examples: MongoDB, CouchDB etc.
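A minimal sketch of nested documents and ad hoc querying on arbitrary fields, assuming plain Python dicts as documents. The find and get_path helpers are hypothetical; the dotted-path syntax loosely imitates MongoDB's dot notation but is not its query language. Note that the three documents have different shapes, and nothing rejects them.

```python
from typing import Any, Dict, List

# Hypothetical documents: each has a unique "_id"; values may be nested hashes or lists.
documents: List[Dict[str, Any]] = [
    {"_id": "u1", "name": "alice", "address": {"city": "Boston", "zip": "02101"}},
    {"_id": "u2", "name": "bob", "tags": ["admin", "ops"]},
    {"_id": "u3", "name": "carol", "address": {"city": "Denver"}},
]

_MISSING = object()  # sentinel: field absent (distinct from a stored None)


def get_path(doc: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as 'address.city' into nested hashes."""
    current: Any = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def find(docs: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Ad hoc query: documents whose fields equal every value in `query`.

    This is a full scan; a real document database could use an index instead.
    """
    return [d for d in docs if all(get_path(d, k) == v for k, v in query.items())]


if __name__ == "__main__":
    print([d["_id"] for d in find(documents, {"address.city": "Boston"})])  # ['u1']
```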
Graph Database
• Graph databases are a less commonly used style, but they are best suited to working with highly interconnected data.
• A graph database consists of nodes and relationships between nodes.
• Both nodes and relationships can have properties (key-value pairs) that store data.
• The real strength of graph databases is traversing through the nodes by following relationships.
• Examples: Neo4j etc.
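A minimal sketch of a property graph and a traversal, assuming an in-memory adjacency list. The Graph class and its methods are hypothetical; real graph databases such as Neo4j express traversals in a query language (Cypher) rather than through a Python class like this.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Set, Tuple


class Graph:
    """Toy property graph: nodes and relationships both carry key-value properties."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        # Adjacency list: node id -> [(relationship type, target id, properties)].
        self.edges: Dict[str, List[Tuple[str, str, Dict[str, Any]]]] = {}

    def add_node(self, node_id: str, **props: Any) -> None:
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def add_relationship(self, src: str, rel_type: str, dst: str, **props: Any) -> None:
        self.edges.setdefault(src, []).append((rel_type, dst, props))

    def reachable(self, start: str, rel_type: str, max_hops: int) -> Set[str]:
        """Breadth-first traversal following only relationships of `rel_type`."""
        seen = {start}
        found: Set[str] = set()
        frontier: Deque[Tuple[str, int]] = deque([(start, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue  # do not expand beyond the hop limit
            for rtype, target, _props in self.edges.get(node, []):
                if rtype == rel_type and target not in seen:
                    seen.add(target)
                    found.add(target)
                    frontier.append((target, depth + 1))
        return found


if __name__ == "__main__":
    g = Graph()
    for name in ("alice", "bob", "carol", "dave"):
        g.add_node(name, kind="person")
    g.add_relationship("alice", "KNOWS", "bob", since=2019)
    g.add_relationship("bob", "KNOWS", "carol")
    g.add_relationship("carol", "KNOWS", "dave")
    print(sorted(g.reachable("alice", "KNOWS", 2)))  # ['bob', 'carol']
```

Here the traversal answers a friends-of-friends question by following KNOWS relationships hop by hop, with no joins involved.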
