Module 6
Databases and data warehouses are both systems that store data. But they serve very
different purposes. In this article, we’ll explain what they do, the key differences
between them, and why using them effectively is essential for you to grow your
business.
We’ll start with some high-level definitions before giving you more detailed
explanations.
What is a Database?
A database stores real-time information about one particular part of your business: its
main job is to process the daily transactions that your company makes, e.g., recording
which items have sold. Databases handle a massive volume of simple queries very
quickly.
What is a Data Warehouse?
A data warehouse stores historical data about your business so that you can analyze
and extract insights from it. It is typically loaded in periodic batches rather than
updated with every transaction, so it does not always hold the most current information.
The most significant difference between databases and data warehouses is how they
process data.
Databases use OnLine Transactional Processing (OLTP) to process large numbers of
short online transactions (deletes, inserts, replacements, and updates) quickly. This
type of processing responds immediately to user requests, and so is used to run the
day-to-day operations of a business in real-time. For example, if a user wants to reserve
a hotel room using an online booking form, the reservation is executed with OLTP.
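The hotel booking example can be sketched as one short transaction. This is a minimal illustration using Python's built-in sqlite3 module; the rooms table and the reserve_room function are hypothetical names chosen for this sketch, not part of any particular booking system.

```python
import sqlite3

def reserve_room(conn, room_id, guest):
    """Reserve a room in one short transaction; return True if it was free."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE rooms SET guest = ? WHERE id = ? AND guest IS NULL",
            (guest, room_id),
        )
        # rowcount is 1 only when the room existed and had no guest yet
        return cur.rowcount == 1

# Hypothetical in-memory table standing in for the booking database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (id INTEGER PRIMARY KEY, guest TEXT)")
conn.executemany("INSERT INTO rooms (id, guest) VALUES (?, NULL)", [(101,), (102,)])
conn.commit()

first = reserve_room(conn, 101, "Ada")     # room is free, so the update applies
second = reserve_room(conn, 101, "Grace")  # room already taken, nothing changes
```

Each call touches a single row and finishes immediately, which is the shape of workload OLTP systems are built for.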
Optimization
A database is optimized to update (add, modify, or delete) data with maximum speed
and efficiency. Response times from databases need to be extremely quick for
efficient transaction processing. The most important aspect of a database is that it
records every write operation in the system; a company won’t be in business very long if
its database doesn’t make a record of every purchase!
Data warehouses are optimized to rapidly execute a low number of complex queries
on large multi-dimensional datasets.
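To make "complex queries on multi-dimensional datasets" concrete, here is a small sketch of an aggregate query that rolls sales up along one dimension at a time. The sales table, its columns, and the revenue_by helper are assumptions made for illustration; a real warehouse would hold far more rows and dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", "2024-01", "book", 120.0),
        ("North", "2024-02", "book", 80.0),
        ("South", "2024-01", "pen", 30.0),
        ("South", "2024-01", "book", 50.0),
    ],
)

DIMENSIONS = {"region", "month", "product"}

def revenue_by(conn, dimension):
    """Total revenue grouped by one dimension of the sales data."""
    if dimension not in DIMENSIONS:  # whitelist before building the SQL string
        raise ValueError(f"unknown dimension: {dimension}")
    rows = conn.execute(
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return dict(rows.fetchall())

by_region = revenue_by(conn, "region")
by_month = revenue_by(conn, "month")
```

Unlike a transactional query, each call here scans every row in the table, which is why warehouses are tuned for a small number of heavy reads.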
Data Structure
The data in databases are normalized. The goal of normalization is to reduce and even
eliminate data redundancy, i.e., storing the same piece of data more than once. This
reduction of duplicate data leads to increased consistency and, thus, more accurate
data as the database stores it in only one place.
Normalizing data splits it into many different tables. Each table represents a separate
entity of the data. For example, a database recording BOOK SALES may have three
tables to denote BOOK information, the SUBJECT covered in the book, and the
PUBLISHER.
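The BOOK, SUBJECT, and PUBLISHER split might look like the following sketch. The column names and sample rows are assumptions for illustration; only the three entities come from the text above.

```python
import sqlite3

# Each entity gets its own table; books point at the others by id.
SCHEMA = """
CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE subject   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE book (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    subject_id   INTEGER NOT NULL REFERENCES subject(id),
    publisher_id INTEGER NOT NULL REFERENCES publisher(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO publisher VALUES (1, 'Example Press')")
conn.execute("INSERT INTO subject VALUES (1, 'Databases')")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, 1, 1)",
    [(1, "SQL Basics"), (2, "Indexing")],
)

# The publisher name is stored once, so a rename is a single-row update.
conn.execute("UPDATE publisher SET name = 'Example Press Ltd' WHERE id = 1")

titles_and_publishers = conn.execute(
    "SELECT b.title, p.name FROM book b "
    "JOIN publisher p ON p.id = b.publisher_id ORDER BY b.id"
).fetchall()
```

Because the name lives in exactly one row, every book sees the updated publisher name without any risk of inconsistent copies.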
Normalizing data ensures the database takes up minimal disk space, so it is storage
efficient. However, it is not query efficient: answering a question often means joining
many tables, which can be slow and cumbersome. Since businesses want to perform
complex queries on the data in their data warehouse, that data is often denormalized
and contains repeated data for easier access.
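As a rough sketch of denormalization, the snippet below copies three normalized tables into one wide table so that later analysis needs no joins. All table and column names (sale, sales_wide, and so on) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized source tables (hypothetical sample data)
CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, publisher_id INTEGER);
CREATE TABLE sale (id INTEGER PRIMARY KEY, book_id INTEGER, sold_on TEXT, amount REAL);
INSERT INTO publisher VALUES (1, 'Example Press');
INSERT INTO book VALUES (1, 'SQL Basics', 1), (2, 'Indexing', 1);
INSERT INTO sale VALUES
    (1, 1, '2024-01-05', 20.0),
    (2, 2, '2024-01-06', 35.0),
    (3, 1, '2024-02-01', 20.0);

-- Denormalized copy: title and publisher are repeated on every sale row
CREATE TABLE sales_wide AS
SELECT s.id AS sale_id, s.sold_on, s.amount, b.title, p.name AS publisher
FROM sale s
JOIN book b ON b.id = s.book_id
JOIN publisher p ON p.id = b.publisher_id;
""")

# Analysis now reads one table and needs no joins.
revenue = conn.execute(
    "SELECT publisher, SUM(amount) FROM sales_wide GROUP BY publisher"
).fetchall()
wide_rows = conn.execute("SELECT COUNT(*) FROM sales_wide").fetchone()[0]
```

The trade-off is visible in sales_wide: the publisher name is stored once per sale instead of once overall, costing space in exchange for simpler queries.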
Data Analysis
Databases usually just process transactions, but it is also possible to perform data
analysis with them. However, in-depth exploration is challenging for both the user and
computer due to the normalized data structure and the large number of table joins you
need to perform. It requires a skilled developer or analyst to create and execute
complex queries on a DataBase Management System (DBMS), which takes up a lot of
time and computing resources. Moreover, the analysis does not go deep: the best you
can get is a one-time static report, as databases just give a snapshot of data at a specific
time.
Data warehouses are designed to perform complex analytical queries on large multi-
dimensional datasets in a straightforward manner. There is no need to learn advanced
theory or how to use sophisticated DBMS software. Not only is the analysis simpler to
perform, but the results are much more useful; you can dive deep and see how your
data changes over time, rather than the snapshot that databases provide.
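The difference between a snapshot and a view over time can be sketched with two tables: one that is overwritten on every change, as an operational database would do, and one that only ever appends, as a warehouse-style history would. The table names and the record_stock helper are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock_current (item TEXT PRIMARY KEY, quantity INTEGER);
CREATE TABLE stock_history (item TEXT, recorded_on TEXT, quantity INTEGER);
""")

def record_stock(conn, item, day, quantity):
    # Operational view: the previous value is overwritten.
    conn.execute(
        "INSERT OR REPLACE INTO stock_current VALUES (?, ?)", (item, quantity)
    )
    # Historical view: every observation is kept with its date.
    conn.execute(
        "INSERT INTO stock_history VALUES (?, ?, ?)", (item, day, quantity)
    )

for day, qty in [("2024-01-01", 100), ("2024-01-02", 70), ("2024-01-03", 40)]:
    record_stock(conn, "widget", day, qty)

snapshot = conn.execute(
    "SELECT quantity FROM stock_current WHERE item = 'widget'"
).fetchone()[0]
trend = conn.execute(
    "SELECT recorded_on, quantity FROM stock_history "
    "WHERE item = 'widget' ORDER BY recorded_on"
).fetchall()
```

Here snapshot holds only the latest quantity, while trend keeps all three daily readings and so can show how stock fell over time.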
Data Timeline
Databases process the day-to-day transactions for one aspect of the business.
Therefore, they typically contain current, rather than historical, data about one
business process.
Data warehouses are used for analytical purposes and business reporting. Data
warehouses typically store historical data by integrating copies of transaction data
from disparate sources. Data warehouses can also use real-time data feeds for reports
that use the most current, integrated information.
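Integrating copies of transaction data from disparate sources usually means mapping each source's own format onto one shared layout. The sketch below assumes two made-up sources, a web shop that records cents and ISO date strings and a store till that records decimal amounts and date objects; every field name is hypothetical.

```python
from datetime import date

# Two hypothetical source systems with different shapes
web_orders = [{"order_id": "W-1", "total_cents": 1999, "placed": "2024-03-01"}]
store_sales = [{"receipt": 17, "amount": 5.50, "day": date(2024, 3, 2)}]

def from_web(row):
    """Map a web order onto the shared warehouse layout."""
    return {
        "source": "web",
        "source_id": row["order_id"],
        "amount": row["total_cents"] / 100,  # cents to currency units
        "sold_on": date.fromisoformat(row["placed"]),
    }

def from_store(row):
    """Map a store receipt onto the shared warehouse layout."""
    return {
        "source": "store",
        "source_id": str(row["receipt"]),
        "amount": float(row["amount"]),
        "sold_on": row["day"],
    }

def load_warehouse(web, store):
    """Combine both sources into one list ordered by sale date."""
    rows = [from_web(r) for r in web] + [from_store(r) for r in store]
    return sorted(rows, key=lambda r: r["sold_on"])

warehouse = load_warehouse(web_orders, store_sales)
```

Once both sources share the same keys and types, a single query can report across them, which is what the warehouse adds over the individual databases.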
Concurrent Users
In a database, only one user can modify a given piece of data at a time: it would be
disastrous if two users overwrote the same information in different ways at the same time!
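A small sketch of this rule, using SQLite from Python's standard library: while one connection holds the write lock, a second writer is refused. Note the assumption here: SQLite locks the whole database file for writing, whereas many server databases lock individual rows, so this shows the principle rather than how any particular production system behaves. The function name and the stock table are hypothetical.

```python
import os
import sqlite3
import tempfile

def second_writer_is_blocked():
    """Return True if a second connection cannot write while the first does."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shop.db")
        # isolation_level=None: we issue BEGIN/COMMIT ourselves.
        # timeout=0: fail immediately instead of waiting for the lock.
        user_a = sqlite3.connect(path, timeout=0, isolation_level=None)
        user_b = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            user_a.execute(
                "CREATE TABLE stock (item TEXT PRIMARY KEY, quantity INTEGER)"
            )
            user_a.execute("INSERT INTO stock VALUES ('widget', 10)")

            user_a.execute("BEGIN IMMEDIATE")  # user A takes the write lock
            user_a.execute("UPDATE stock SET quantity = 9 WHERE item = 'widget'")
            try:
                user_b.execute("BEGIN IMMEDIATE")  # user B tries to write too
                user_b.execute("ROLLBACK")
                blocked = False
            except sqlite3.OperationalError:  # "database is locked"
                blocked = True
            user_a.execute("COMMIT")
            return blocked
        finally:
            user_a.close()
            user_b.close()

blocked = second_writer_is_blocked()
```

In a real application the second writer would normally wait and retry rather than fail outright; the zero timeout is only there to make the conflict visible.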
ACID Compliance
Since data warehouses focus on reading, rather than modifying, historical data from
many different sources, ACID compliance is less strictly enforced. However, leading
cloud data warehouses such as Amazon Redshift and Panoply do ensure that their
queries are ACID compliant where possible. For instance, this is the case when using
PostgreSQL or MySQL with its default InnoDB storage engine.
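The "A" in ACID, atomicity, can be illustrated with a transfer that either applies both of its updates or neither. This is a minimal sketch with SQLite; the account table, the CHECK constraint, and the transfer function are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 50), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected an overdraft
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 100)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the failed transfer the credit to bob has already run when the debit is rejected, yet the rollback undoes it, so no money appears from nowhere.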
Service Level Agreements (SLAs)
Most SLAs for databases state that they must meet 99.99% uptime because any
system failure could result in lost revenue and lawsuits.
SLAs for some very large data warehouses have downtime built in to
accommodate periodic uploads of new data. This is less common for modern data
warehouses.
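To put the 99.99% figure in perspective, the short calculation below converts an uptime percentage into the downtime it permits. The function name is made up for this sketch, and it assumes a 365-day year with downtime measured in minutes.

```python
def allowed_downtime_minutes(uptime_percent, period_days=365):
    """Minutes of downtime permitted by an uptime percentage over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100

per_year = allowed_downtime_minutes(99.99)
per_30_days = allowed_downtime_minutes(99.99, period_days=30)
```

Under these assumptions, 99.99% uptime leaves roughly 52.6 minutes of downtime per year, or a little over 4 minutes in a 30-day month, which is far too little for a long scheduled data load.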
Database Use Cases
Databases process the day-to-day transactions in an organization. Some examples of
database applications include:
                Database                          Data Warehouse
Uptime          99.99% uptime                     Downtime is built in to accommodate
                                                  periodic uploads of new data
Storage         Limited to a single data source   All data sources from all business
                from a particular business        functions
                function
Query type      Simple transactional queries      Complex queries for in-depth analysis
Data summary    Highly granular and precise       As granular and precise as you want
                                                  it to be