Data Warehouse - Introduction
LTI-Mindtree@GL_DE-C2-206
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
This file is meant for personal use by [email protected] only.
Sharing or publishing the contents in part or full is liable for legal action.
Agenda
In this session, we will discuss the following topics:
● What is a Data Warehouse?
● Applications of Data Warehouse
● Evolution of Data Warehousing
● Data Warehouses vs. Data Marts
● Operational Data Stores
● Warehouse Components
● Data Warehouse Architectures
● Data Staging
What is a Data Warehouse?
● A data warehouse is a centralized repository used to store and manage large amounts of data.
● It is designed for querying and analysis and is optimized for read-mostly access rather than frequent updates.
● It typically contains historical data that has been extracted from various sources and transformed to
fit a consistent data model.
● It provides a single source of truth for an organization’s data.
● It helps business users make data-driven decisions.
● A well-designed data warehouse helps organizations identify opportunities and challenges and guide
strategic planning and resource allocation.
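As a rough illustration of "transformed to fit a consistent data model", the sketch below maps rows from two hypothetical source systems onto one shared shape. The source names (`crm_rows`, `shop_rows`), their fields, and the target model are assumptions made up for this example, not part of the original material.

```python
from datetime import date

# Hypothetical rows from two source systems with different field names and formats.
crm_rows = [{"cust": "C1", "amt_usd": "120.50", "day": "2023-01-15"}]
shop_rows = [{"customer_id": "C2", "amount_cents": 9900, "order_date": "15/02/2023"}]


def from_crm(row):
    """Map a CRM row onto the shared (customer_id, amount, order_date) model."""
    return {
        "customer_id": row["cust"],
        "amount": float(row["amt_usd"]),
        "order_date": date.fromisoformat(row["day"]),
    }


def from_shop(row):
    """Map a web-shop row onto the same model (cents to dollars, DD/MM/YYYY to date)."""
    day, month, year = (int(part) for part in row["order_date"].split("/"))
    return {
        "customer_id": row["customer_id"],
        "amount": row["amount_cents"] / 100,
        "order_date": date(year, month, day),
    }


warehouse_rows = [from_crm(r) for r in crm_rows] + [from_shop(r) for r in shop_rows]

# Once the rows share one model, a single read-only query covers both sources.
total_sales = sum(r["amount"] for r in warehouse_rows)
print(total_sales)
```

The point of the design is that analysis code only ever sees the shared model, so it does not need to know how each source system formats its data.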
Applications of a Data Warehouse
● Business Intelligence: A single, integrated view of data for decision making.
● Trend Analysis: Detecting patterns and trends in historical data for forecasting.
● Customer Relationship Management (CRM): 360-degree view of customers, providing a better
understanding of customer behavior and preferences.
● Risk Management: Identify and monitor risks, including operational risks, credit risks, and market risks.
● Regulatory Compliance: Monitor and report on regulatory compliance, such as financial reporting,
Evolution of a Data Warehouse
• 1970s – Edgar Codd proposed the “relational model,” which gave rise to relational
databases, making it easier to store and manage large amounts of data.
• Mid-1980s – One of the earliest examples of a data warehouse was built by IBM for a supermarket chain.
• Early 1990s – Traditional operational and transactional databases did not satisfy the requirements for
data analysis, as they were designed and optimized to support daily business operations with a primary
focus on concurrency, recovery, and consistency. The concept of data warehousing has become more
• Late 1990s: The emergence of data mining and online analytical processing (OLAP) allowed businesses to
extract insights from their data more easily.
• Early 2000s: The rise of the internet and e-commerce led to the development of web-based data
warehousing solutions.
• Mid-2000s: The advent of big data and the rise of cloud computing brought new challenges and
opportunities for storing and analyzing large amounts of data.
• 2010s: The rise of machine learning and artificial intelligence led to the development of new data
warehousing tools and techniques for processing and analyzing large data sets.
• Present: Data warehousing continues to evolve. New technologies and approaches are emerging to help
businesses store and analyze their data more effectively.
Data Warehouse vs Data Marts
Data Warehouse
● Contains all the data of the business organization (all the business units) in one single centralized repository
● Large-scale, enterprise-wide
● Optimised for complex queries and analysis
● In the long run, having a data warehouse can help ensure the consistency and accuracy of data

Data Marts
● Contains a subset of data (typically in the data warehouse) in separate repositories relevant to a specific business unit
● Smaller-scale
● Optimised for fast access and quick decision-making
● Data marts can exist without a data warehouse.
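A minimal sketch of the subset relationship, assuming a hypothetical `warehouse_sales` table and a marketing business unit. For brevity the mart is a separate table in the same in-memory SQLite database; in practice a data mart would often live in its own repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Warehouse table (hypothetical): data for all business units in one place.
conn.execute("CREATE TABLE warehouse_sales (business_unit TEXT, product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "campaign_a", 500.0),
        ("finance", "audit", 300.0),
        ("marketing", "campaign_b", 250.0),
    ],
)

# Data mart: a smaller table holding only the rows one business unit needs.
conn.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT product, revenue FROM warehouse_sales WHERE business_unit = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT product, revenue FROM marketing_mart ORDER BY product"
).fetchall()
print(mart_rows)
```

Queries against `marketing_mart` scan fewer rows and columns than queries against the full warehouse table, which is the "fast access" trade-off listed above.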
Operational Data Stores (ODS)
● Definition: An ODS is a centralized data repository that supports “operational” business processes by
enabling real-time, data-driven decisions for business operations.
● Examples of operational processes include order, inventory, and customer management.
● It is designed to integrate data, which might not be fully cleansed or transformed, from multiple
sources such as transactional databases, messaging systems, and external data feeds.
● Organisations can have multiple ODS, and it is possible for an ODS to have two or more business
Warehouse Components
● Source Systems: Operational systems that collect and store transactional data.
● ETL: Process of extracting data from source systems, transforming it into a suitable format for the data
warehouse, and loading it into the data warehouse.
● Data Warehouse Database: Central repository that holds historical and aggregated data.
● Metadata: Provides information about the data stored inside the data warehouse (data definitions,
data lineage, data quality).
● Access Tools: Software tools used to access, query, analyze, and report on the data inside the data
warehouse.
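The components above can be sketched end to end in a few lines. This is an illustrative toy, not a production pipeline: the `source_orders` records, the `fact_orders` table, and the `metadata` dictionary are hypothetical names chosen for the example, and an in-memory SQLite database stands in for the data warehouse database.

```python
import sqlite3

# Source system (hypothetical): raw transactional records.
source_orders = [
    {"order_id": 1, "amount": "19.99", "country": "in"},
    {"order_id": 2, "amount": "5.00", "country": "US"},
]


def extract():
    """Extract: read records from the source system."""
    return list(source_orders)


def transform(rows):
    """Transform: cast amounts to numbers and normalize country codes."""
    return [(r["order_id"], float(r["amount"]), r["country"].upper()) for r in rows]


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


# Data warehouse database stand-in.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (order_id INTEGER, amount REAL, country TEXT)")
load(warehouse, transform(extract()))

# Metadata: a simple description of what the table holds and where it came from.
metadata = {
    "fact_orders": {"source": "source_orders", "columns": ["order_id", "amount", "country"]}
}

# Access tool stand-in: a query against the warehouse.
countries = [
    row[0] for row in warehouse.execute("SELECT country FROM fact_orders ORDER BY order_id")
]
print(countries)
```

Keeping extract, transform, and load as separate functions mirrors the ETL definition above and makes each step testable on its own.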
Data Warehouse Architecture
● A data warehouse architecture refers to the overall design and structure of a data warehouse system.
● An effective data warehouse architecture provides a framework for organizing and integrating data
from disparate sources into a single, consistent, and easily accessible repository for business analysis
and decision making.
● It includes the underlying data models, data storage, data integration processes, and data access
methods required to support the analytics and reporting needs of the organization.
Data Warehouse Architecture - Layers
● Data Layer: The bottom tier. Data is extracted from various sources, transformed into a suitable format,
and loaded into this layer.
● Semantic Layer: The middle tier, where online analytical processing (OLAP) servers restructure the data
for quicker execution of complex queries.
● Analytics Layer: The client-facing top tier. It holds the data warehouse access tools that let users
interact with the data, build dashboards, monitor and report on KPIs, run data mining, and more.
Data flow through the layers:
● Data Collection: The ETL process collects and transforms the data.
● Data Layer: Contains databases and data marts. Metadata is created.
● Semantic Layer: Data is restructured for fast, complex queries and analytics.
● Analytics Layer: Users can run queries and analysis for business decisions.
Data Staging
● Data staging involves moving data from its original source into a “staging area”.
● Here, data can be extracted, transformed, cleansed, and organized into staging (temporary) tables.
● The staging area acts as a buffer between the source systems and the data warehouse.
● This approach allows data to be properly prepared before being loaded into the data warehouse.
● The process helps maintain data integrity and minimizes errors in the data warehouse.
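The buffer role of the staging area can be sketched with two tables in an in-memory SQLite database. The table names (`stg_customers`, `dim_customers`) and the cleansing rules are assumptions chosen for illustration; real pipelines usually apply many more checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_customers (customer_id TEXT, email TEXT)")
conn.execute("CREATE TABLE dim_customers (customer_id TEXT PRIMARY KEY, email TEXT)")

# Raw rows land in the staging table exactly as received from the source.
raw_rows = [("C1", " Ann@Example.com "), ("C2", None), ("C1", "ann@example.com")]
conn.executemany("INSERT INTO stg_customers VALUES (?, ?)", raw_rows)

# Cleanse inside staging: drop rows without an email, then trim and lowercase the rest.
conn.execute("DELETE FROM stg_customers WHERE email IS NULL")
conn.execute("UPDATE stg_customers SET email = LOWER(TRIM(email))")

# Load only distinct, cleansed rows into the warehouse table, then clear staging.
conn.execute(
    "INSERT INTO dim_customers SELECT DISTINCT customer_id, email FROM stg_customers"
)
conn.execute("DELETE FROM stg_customers")
conn.commit()

loaded = conn.execute("SELECT customer_id, email FROM dim_customers").fetchall()
print(loaded)
```

Because the bad and duplicate rows are handled in `stg_customers`, the warehouse table never sees them, which is how staging helps protect data integrity.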
Summary
● Data warehousing has evolved from simple data storage to an important business intelligence tool that
helps organizations make data-driven decisions.
● Data marts are small, departmental data warehouses designed for specific business units, while data
warehouses are centralized repositories that store all enterprise data.
● An operational data store (ODS) is a centralized repository that integrates near-real-time data from
multiple sources to support operational decisions. It can also hold operational systems data before it is
loaded into the data warehouse.
● The components of a data warehouse include the source systems, ETL process, data warehouse
database, metadata, and access (BI) tools.
● The data warehouse architecture is organized into layers (data, semantic, and analytics) and relies on
technologies such as RDBMS, OLAP, and data mining tools.
● Data staging is the process of collecting and preparing data from various sources to be loaded into the
data warehouse and includes tasks such as data extraction, data transformation, and data loading.