Intro-Databases For Big Data
Introduction
Architecture for databases:
Focuses on storage and organization of information to
allow easy access and modification (insert, update, and delete
operations) of data.
AWS DynamoDB
Tableau
Questions to explore
Type of database – Does the problem at hand require a
relational database, key-value pair database, columnar
database, document-oriented database, or graph
database?
Unique characteristic of database – Any database will support
writing data and reading it back again, but what makes it
unique? Some allow querying on arbitrary fields; some
provide indexing for rapid lookup; some support ad hoc
queries, while others require queries to be planned in advance.
Performance – How does this database function and at what
cost? How about replication? Is this database tuned for
reading, writing, or some other operation?
Scalability – Scalability is closely related to performance. The
point to explore is whether the database is geared more toward
horizontal scaling (MongoDB, HBase, DynamoDB),
traditional vertical scaling (Postgres, Neo4j, Redis), or
something in between.
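The difference between querying on an arbitrary field and using an index for rapid lookup can be sketched in a few lines of Python. This is only an illustration: the sample records and the helper names `scan` and `build_index` are invented here and do not come from any particular database.

```python
from collections import defaultdict

# Hypothetical sample data, invented for illustration.
records = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Linus", "city": "Helsinki"},
    {"id": 3, "name": "Grace", "city": "London"},
]

def scan(records, field, value):
    """Query on an arbitrary field by checking every record (linear time)."""
    return [r["id"] for r in records if r.get(field) == value]

def build_index(records, field):
    """Precompute a mapping from field value to record ids for rapid lookup."""
    index = defaultdict(list)
    for r in records:
        index[r[field]].append(r["id"])
    return index

# The index must be planned and built ahead of time for a chosen field.
city_index = build_index(records, "city")

london_by_scan = scan(records, "city", "London")
london_by_index = city_index["London"]
```

Both approaches return the same ids; the scan works for any field without preparation, while the index answers faster but only for the field it was built on.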
RDBMS vs Big Databases
Key-Value Pair Database
The simplest database model, storing data as key-value (KV) pairs,
much like a hash table.
Some KV implementations provide a means of iterating
through the keys, but not all!
A file system can be considered a key-value store if we treat
the file path as the key and the file contents as the value.
Since this database model doesn’t require complex data
structures for storage, it can be incredibly performant in a
number of scenarios but generally won’t be helpful when we
have complex query and aggregation requirements.
Examples: Redis, DynamoDB, Voldemort, Riak, etc.
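As a minimal sketch of the key-value model described above, the following Python class wraps a dictionary (a hash table). The class name `KeyValueStore` and its methods are hypothetical and are not the API of Redis, DynamoDB, or any other product; the file paths used as keys are made up to mirror the file-system analogy.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a Python dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Insert a new pair or overwrite the value of an existing key.
        self._data[key] = value

    def get(self, key, default=None):
        # Look up a value by its exact key.
        return self._data.get(key, default)

    def delete(self, key):
        # Remove the key if present; do nothing otherwise.
        self._data.pop(key, None)

    def keys(self):
        # Key iteration: offered here, but not by every KV implementation.
        return iter(self._data)


store = KeyValueStore()
# File-system analogy: the path is the key, the contents are the value.
store.put("/home/user/notes.txt", b"buy milk")
store.put("/home/user/todo.txt", b"write slides")
```

Note that the only way to find a value is by its key: a question such as "which files contain the word milk?" would require reading every value, which is why this model suits simple lookups better than complex queries and aggregations.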
Columnar Database
Columnar, or column-oriented, databases are so named
because they store the data from a given column
(in the two-dimensional table sense) together, as opposed to
row-oriented databases (RDBMS).
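The contrast between the two layouts can be sketched with plain Python structures. This is an illustration only: the sample table and the helper `to_columns` are invented here, and real columnar databases add compression, on-disk formats, and much more.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Linus", "age": 54},
    {"id": 3, "name": "Grace", "age": 85},
]

def to_columns(rows):
    """Regroup row-oriented records so that each column is stored together."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns

# Column-oriented layout: one list of values per column.
columns = to_columns(rows)

# An aggregate over one column reads only that column's values.
average_age = sum(columns["age"]) / len(columns["age"])
```

In the column layout, computing the average age touches only the `age` list, whereas the row layout would have to walk through every full record to pick out one field.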