Unit 4

Uploaded by

jovialdarwin8
Copyright: © All Rights Reserved

Introduction to OLTP and OLAP

• Data drives nearly every business today; a company’s ability to harness
the value of its data is crucial to delivering customer experiences and
products/services that keep it relevant and competitive.
• There are two approaches to data processing systems: one focuses on
operations, and the other focuses on analytics for business
intelligence. Both are essential to leverage the full power of data.
• These two systems are Online Transaction Processing
(OLTP) and Online Analytical Processing (OLAP).
• Online transaction processing (OLTP) captures, stores, and processes
data from transactions in real time. Online analytical processing
(OLAP) uses complex queries to analyze aggregated historical data
from OLTP systems.
OLTP
• Online transactional processing (OLTP) is used for real-time execution of large
volumes of database transactions by large numbers of people. OLTP systems are
used for everyday transactions like ATMs, ecommerce purchases, online banking,
text messages, and account changes, among many other day-to-day transactions.
• OLTP systems use a relational (SQL) database to handle large
volumes of simple transactions, enable multi-user access to the same data, process
data quickly, provide indexed datasets for fast searches, and remain continuously available.
• An OLTP system captures and maintains transaction data in a database. Each
transaction involves individual database records made up of multiple fields or
columns. This process can be challenging without the right tools.
• In OLTP, the emphasis is on fast processing, because OLTP databases are read,
written, and updated frequently. If a transaction fails, built-in system logic ensures
data integrity.
• OLTP systems can be used to provide data for their OLAP systems, as the two work
together to optimize the value of data.
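
The failure handling mentioned above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the accounts table, the balances, and the transfer helper are hypothetical and do not come from the slides. If any statement in the transaction fails, the whole transaction is rolled back, so the data stays consistent.

```python
import sqlite3

# Hypothetical OLTP table: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as a single atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the credit to dst was rolled back too.
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is undone
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After both calls, only the first transfer is visible in the balances; the failed one leaves no partial update behind.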
OLAP
• Data analysts and data engineers use online analytical processing (OLAP) for data
mining, analytics, and business intelligence. OLAP is used to perform
multidimensional analysis on large volumes of data at high speed.
An OLTP system often processes and stores data in repositories, which
OLAP then sources for analysis. Many businesses use OLAP for financial analysis,
forecasting, budgeting, reporting, marketing and sales optimization, and decision
making.
• OLAP applies complex queries to large amounts of historical data aggregated from
OLTP databases and other sources. In OLAP, the emphasis is on response time to
these complex queries. Each query involves one or more columns of data aggregated
from many rows.
• Examples include year-over-year financial performance or marketing lead generation
trends. OLAP databases and data warehouses give analysts and decision-makers the
ability to use custom reporting tools to turn data into information. Query failure in
OLAP does not interrupt or delay transaction processing for customers, but it can
delay or impact the accuracy of business intelligence insights.
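
A year-over-year query of the kind described above might look like the following sketch. The sales table, its columns, and the sample rows are assumptions made up for illustration. The query aggregates one column (amount) over many rows, which is the typical OLAP access pattern.

```python
import sqlite3

# Hypothetical historical sales data, as it might arrive from OLTP systems.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2022-03-01", "north", 100.0),
        ("2022-09-15", "south", 150.0),
        ("2023-02-10", "north", 200.0),
        ("2023-11-05", "south", 175.0),
    ],
)

# Aggregate many rows into one revenue figure per year.
yearly = conn.execute(
    """
    SELECT substr(sale_date, 1, 4) AS year, SUM(amount) AS revenue
    FROM sales
    GROUP BY year
    ORDER BY year
    """
).fetchall()

# Year-over-year growth: compare each year with the one before it.
growth = {}
for (_, prev_revenue), (year, revenue) in zip(yearly, yearly[1:]):
    growth[year] = (revenue - prev_revenue) / prev_revenue
```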
OLTP vs. OLAP: Key differences
Data Warehouse Architecture
• A data warehouse is a heterogeneous collection of different data
sources organized under a unified schema.

• There are two approaches to constructing a data warehouse:

a) Top-down approach
b) Bottom-up approach

The essential components are discussed below:


Top-Down Approach
External Sources:

An external source is any source from which data is collected, irrespective of
the type of data. The data can be structured, semi-structured, or
unstructured.

Staging Area:

Since the data extracted from the external sources does not follow a
particular format, it must be validated before it is loaded into the
data warehouse. For this purpose, an ETL tool is recommended.
• E (Extract): Data is extracted from the external data sources.

• T (Transform): Data is transformed into the standard format.

• L (Load): Data is loaded into the data warehouse after it has been transformed into the standard
format.
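
The three ETL steps can be sketched in a few lines of Python. This is a minimal illustration, not a real ETL tool: the CSV content, the orders table, and the function names extract, transform, and load are all hypothetical. Rows that fail validation are set aside instead of being loaded.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an external source, with inconsistent formatting.
RAW_CSV = (
    "order_id,customer,amount\n"
    "1, alice ,10.50\n"
    "2,BOB,not-a-number\n"
    "3,Carol,7.25\n"
)


def extract(text):
    """E: read raw rows from the external source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """T: validate each row and convert it to the standard format."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append(
                (
                    int(row["order_id"]),
                    row["customer"].strip().title(),
                    float(row["amount"]),
                )
            )
        except ValueError:
            rejected.append(row)  # keep invalid rows for later inspection
    return clean, rejected


def load(conn, rows):
    """L: write the standardized rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
clean, rejected = transform(extract(RAW_CSV))
load(conn, clean)
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
```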

Data Warehouse:

After cleansing, the data is stored in the data warehouse, which acts as the central repository. It
holds the metadata together with the detailed data, while the data marts hold subject-specific subsets of that data.
• Note that the data warehouse stores the data in its purest form in this
top-down approach.
• Data Marts:
• A data mart is also part of the storage component. It stores the information of a
particular function of an organisation that is handled by a single authority.
• An organisation can have as many data marts as it has functions.
A data mart contains a subset of the data
stored in the data warehouse.

• Data Mining:

• Data mining is the practice of analysing the large volumes of data held in the data warehouse.
It is used to find hidden patterns in the database or
data warehouse with the help of data mining algorithms.
• Inmon defines the top-down approach as follows: the data warehouse is a central repository
for the complete organisation, and data marts are created from it after the
complete data warehouse has been created.
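
The top-down flow (warehouse first, data marts derived from it) can be illustrated with a toy example. The warehouse_sales table, the department names, and the build_mart helper are assumptions made up for this sketch. Each mart is filled from the central table, so it contains only a subset of the warehouse data.

```python
import sqlite3

# The central warehouse is built first (hypothetical table and rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "ads", 120.0),
        ("finance", "audit", 300.0),
        ("marketing", "events", 80.0),
    ],
)


def build_mart(conn, department):
    """Create a data mart for one business function from the warehouse."""
    table = f"mart_{department}"  # department comes from a fixed, trusted list
    conn.execute(f"CREATE TABLE {table} (product TEXT, amount REAL)")
    conn.execute(
        f"INSERT INTO {table} "
        "SELECT product, amount FROM warehouse_sales WHERE department = ?",
        (department,),
    )
    return table


# The marts are created only after the warehouse exists.
marts = [build_mart(conn, d) for d in ("marketing", "finance")]
marketing_rows = conn.execute(
    "SELECT product, amount FROM mart_marketing ORDER BY product"
).fetchall()
finance_rows = conn.execute("SELECT product, amount FROM mart_finance").fetchall()
```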
Advantages of Top-Down Approach
1. Since the data marts are created from the data warehouse, they provide a consistent dimensional
view of the data.
2. Improved data consistency: The top-down approach promotes data consistency by ensuring
that all data marts are sourced from a common data warehouse. This ensures that all data is
standardized, reducing the risk of errors and inconsistencies in reporting.
3. Easier maintenance: Since all data marts are sourced from a central data warehouse, it is
easier to maintain and update the data in a top-down approach. Changes can be made to
the data warehouse, and those changes will automatically propagate to all the data marts
that rely on it.
4. Better scalability: The top-down approach is highly scalable, allowing organizations to add
new data marts as needed without disrupting the existing infrastructure. This is particularly
important for organizations that are experiencing rapid growth or have evolving business
needs.
5. Improved governance: The top-down approach facilitates better governance by enabling
centralized control of data access, security, and quality. This ensures that all data is managed
consistently and that it meets the organization’s standards for quality and compliance.
6. Reduced duplication: The top-down approach reduces data
duplication by ensuring that data is stored only once in the data
warehouse. This saves storage space and reduces the risk of data
inconsistencies.

7. Better reporting: The top-down approach enables better reporting by
providing a consistent view of data across all data marts. This makes it
easier to create accurate and timely reports, which can improve
decision-making and drive better business outcomes.
Disadvantages of Top-Down Approach
1. The cost and time required for design and maintenance are very high.
2. Complexity: The top-down approach can be complex to implement and maintain, particularly for
large organizations with complex data needs. The design and implementation of the data warehouse
and data marts can be time-consuming and costly.
3. Lack of flexibility: The top-down approach may not be suitable for organizations that require a high
degree of flexibility in their data reporting and analysis. Since the design of the data warehouse and
data marts is pre-determined, it may not be possible to adapt to new or changing business
requirements.
4. Limited user involvement: The top-down approach can be dominated by IT departments, which may
lead to limited user involvement in the design and implementation process. This can result in data
marts that do not meet the specific needs of business users.
5. Data latency: The top-down approach may result in data latency, particularly when data is sourced
from multiple systems. This can impact the accuracy and timeliness of reporting and analysis.
6. Data ownership: The top-down approach can create challenges around data ownership and control.
Since data is centralized in the data warehouse, it may not be clear who is responsible for
maintaining and updating the data.
Bottom-Up Approach
1. First, the data is extracted from external sources (the same as in the
top-down approach).

2. Then, the data goes through the staging area (as explained above) and
is loaded into data marts instead of the data warehouse. The data marts are
created first and provide reporting capability. Each data mart addresses a single
business area.

3. These data marts are then integrated into the data warehouse.
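
The bottom-up steps can be sketched as follows. The mart tables, the warehouse layout, and the sample figures are hypothetical and chosen only for illustration. Each mart can answer questions about its own business area right away; the warehouse is assembled from the marts afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 2: each business area gets its own data mart first.
conn.execute("CREATE TABLE mart_sales (region TEXT, revenue REAL)")
conn.execute("CREATE TABLE mart_marketing (region TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO mart_sales VALUES (?, ?)", [("north", 500.0), ("south", 300.0)]
)
conn.executemany(
    "INSERT INTO mart_marketing VALUES (?, ?)", [("north", 50.0), ("south", 40.0)]
)

# A mart already supports reporting for its single business area.
sales_report = conn.execute("SELECT SUM(revenue) FROM mart_sales").fetchone()[0]

# Step 3: the marts are integrated into one warehouse table.
conn.execute(
    "CREATE TABLE warehouse "
    "(source_mart TEXT, region TEXT, measure TEXT, value REAL)"
)
conn.execute(
    "INSERT INTO warehouse SELECT 'sales', region, 'revenue', revenue FROM mart_sales"
)
conn.execute(
    "INSERT INTO warehouse "
    "SELECT 'marketing', region, 'spend', spend FROM mart_marketing"
)

# Cross-area questions become possible only after integration.
by_region = conn.execute(
    """
    SELECT region,
           SUM(CASE WHEN measure = 'revenue' THEN value ELSE 0 END),
           SUM(CASE WHEN measure = 'spend' THEN value ELSE 0 END)
    FROM warehouse
    GROUP BY region
    ORDER BY region
    """
).fetchall()
```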


Advantages of Bottom-Up Approach
1. As the data marts are created first, reports can be generated quickly.

2. More data marts can be added over time, and in this way the data warehouse can
be extended.

3. The cost and time taken to design this model are comparatively low.
4. Incremental development: The bottom-up approach supports incremental development,
allowing for the creation of data marts one at a time. This allows for quick wins and
incremental improvements in data reporting and analysis.
5. User involvement: The bottom-up approach encourages user involvement in the design and
implementation process. Business users can provide feedback on the data marts and
reports, helping to ensure that the data marts meet their specific needs.
6. Flexibility: The bottom-up approach is more flexible than the top-down approach, as it
allows for the creation of data marts based on specific business needs. This approach can be
particularly useful for organizations that require a high degree of flexibility in their reporting
and analysis.
7. Faster time to value: The bottom-up approach can deliver faster
time to value, as the data marts can be created more quickly than a
centralized data warehouse. This can be particularly useful for smaller
organizations with limited resources.

8. Reduced risk: The bottom-up approach reduces the risk of failure, as
data marts can be tested and refined before being incorporated into a
larger data warehouse. This approach can also help to identify and
address potential data quality issues early in the process.
Disadvantages of Bottom-Up Approach
1. This model is not as strong as the top-down approach, because the dimensional view of the data marts is not
as consistent as it is in that approach.
2. Data silos: The bottom-up approach can lead to the creation of data silos, where different business
units create their own data marts without considering the needs of other parts of the organization.
This can lead to inconsistencies and redundancies in the data, as well as difficulties in integrating
data across the organization.
3. Integration challenges: Because the bottom-up approach relies on the integration of multiple data
marts, it can be more difficult to integrate data from different sources and ensure consistency
across the organization. This can lead to issues with data quality and accuracy.
4. Duplication of effort: In a bottom-up approach, different business units may duplicate effort by
creating their own data marts with similar or overlapping data. This can lead to inefficiencies and
higher costs in data management.
5. Lack of enterprise-wide view: The bottom-up approach can result in a lack of enterprise-wide view,
as data marts are typically designed to meet the needs of specific business units rather than the
organization as a whole. This can make it difficult to gain a comprehensive understanding of the
organization’s data and business processes.
6. Complexity: The bottom-up approach can be more complex than the top-down approach, as it
involves the integration of multiple data marts with varying levels of complexity and granularity. This
can make it more difficult to manage and maintain the data warehouse over time.
Characteristics of a Data Warehouse
• Subject-oriented – A data warehouse is subject-oriented: it
delivers information about a particular theme rather than about the
organization’s ongoing operations.
• The data warehousing process is therefore designed around
well-defined themes. These themes can be
sales, distribution, marketing, etc.
• A data warehouse does not focus on current operations.
Instead, it focuses on modelling and analysing data to support
decision making.
• Integrated – Integration is closely related to subject orientation: the data
is held in a consistent format. Integration means establishing a common
standard for all similar data from the different databases.
• The data must also be stored in the data warehouse in a
common and generally accepted manner.
• A data warehouse is built by integrating data from various sources,
such as a mainframe and a relational database. In addition, it
must have consistent naming conventions, formats, and codes. Integration
makes effective analysis of the data possible.
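
Integration of naming conventions and codes can be illustrated with a small sketch. The two source layouts, the field names, and the code values ("Y"/"N" versus 1/0) are assumptions invented for this example. Both sources are mapped to one shared schema before the rows are stored.

```python
# Hypothetical records from two sources that describe the same attribute
# with different field names and different codes.
mainframe_rows = [{"CUST_NO": "0017", "ACTIVE": "Y"}]
relational_rows = [{"customer_id": 42, "is_active": 0}]

# One agreed meaning for every source-specific code.
ACTIVE_CODES = {"Y": True, "N": False, 1: True, 0: False}


def integrate(rows, id_field, active_field, source):
    """Map source-specific names and codes to the shared warehouse schema."""
    return [
        {
            "customer_id": int(row[id_field]),
            "is_active": ACTIVE_CODES[row[active_field]],
            "source": source,
        }
        for row in rows
    ]


warehouse_rows = integrate(
    mainframe_rows, "CUST_NO", "ACTIVE", "mainframe"
) + integrate(relational_rows, "customer_id", "is_active", "relational_db")
```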
• Time-Variant – Data is maintained over different intervals of time,
such as weekly, monthly, or annually. The warehouse keeps data for
various time periods across large datasets that originate in
online transaction processing (OLTP) systems.
• The time horizon of a data warehouse is much wider than that of
operational systems. Data in the data warehouse is identified
with a specific interval of time and delivers information from a
historical perspective.
• Non-Volatile – As the name suggests, the data in a data warehouse is
permanent. Data is not erased or deleted when new
data is inserted, so the warehouse accumulates a very large quantity of data
over time.
• Once stored in the data warehouse, data is
not updated, so that the historical record is
preserved.
• Data is read-only and refreshed at particular intervals. This is
beneficial for analysing historical data and understanding how the
business has behaved. The warehouse does not need transaction processing, recovery, or
concurrency control mechanisms.
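
The time-variant and non-volatile properties can be sketched as an append-only table of periodic snapshots. The price_history table, the load_snapshot helper, and the sample values are hypothetical. New snapshots are only ever inserted; earlier rows are never updated or deleted, so the history stays available for analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_history (product TEXT, price REAL, snapshot_date TEXT)"
)


def load_snapshot(conn, snapshot_date, prices):
    """Append one periodic snapshot; existing rows are left untouched."""
    conn.executemany(
        "INSERT INTO price_history VALUES (?, ?, ?)",
        [(product, price, snapshot_date) for product, price in prices.items()],
    )
    conn.commit()


# Monthly refreshes: each load adds rows tagged with their time interval.
load_snapshot(conn, "2023-01-31", {"widget": 9.99})
load_snapshot(conn, "2023-02-28", {"widget": 10.49})

# The full history of a product can be read back at any time.
history = conn.execute(
    "SELECT snapshot_date, price FROM price_history "
    "WHERE product = 'widget' ORDER BY snapshot_date"
).fetchall()
```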
Background
• A Database Management System (DBMS) stores data in the form of
tables and is typically designed using the ER model. For example, a college DBMS has tables for
students, faculty, etc.
• A data warehouse is separate from the DBMS. It stores a huge amount of
data, which is typically collected from multiple heterogeneous sources
such as files, DBMSs, etc.
• The goal is to produce statistical results that may help in decision
making. For example, a college might want to quickly see
how the placement of CS students has improved over the
last 10 years, in terms of salaries, counts, etc.
Need for Data Warehouse
• An ordinary database typically stores MBs to GBs of data, and does so for a
specific purpose. For data at TB scale, storage shifts to a
data warehouse.
• Besides this, a transactional database doesn’t lend itself to analytics.
• To perform analytics effectively, an organization keeps a central data
warehouse to closely study its business by organizing, understanding,
and using its historical data for making strategic decisions and analyzing
trends.
Benefits of Data Warehouse
• Better business analytics: A data warehouse stores all of a company’s past
data and records and makes them available for analysis, which improves
the company’s understanding of its data.
• Faster queries: A data warehouse is designed to handle large analytical queries,
so it typically runs them faster than an operational database.
• Improved data quality: The data warehouse stores and analyses the data gathered from
different sources without altering or adding to it,
so data quality is maintained. If a data quality issue does arise,
the data warehouse team can resolve it.
• Historical insight: The warehouse stores all of the business’s historical data,
so it can be analysed at any time
to extract insights.
Front Room and Back Room in Metadata
• Data warehouse metadata systems are sometimes separated into two
sections:
• back room metadata, which is used by extract, transform, load (ETL)
functions to get OLTP data into a data warehouse
• front room metadata, which is used to label screens and create
reports
• Metadata management involves storing information about other
information. Because different types of media are in use, references to
the location of the data allow diverse repositories to be
managed.
• Back Room: "Closer to the data..."
Related to programs, data models & databases
Related to ETL
Useful to DBAs, modelers, developers, programmers etc.

• Front Room: "Closer to the user"
Descriptive & informative - includes semantic details
Related to queries and reports
Useful to anyone who writes queries & reports