Data Warehouse Administration

The document discusses key concepts related to data warehousing and analytical processing. It defines a data warehouse as a central repository that supports decision making by integrating data from different sources. A data mart is a subset of a data warehouse focused on a specific subject area. The document also describes the processes involved in data warehousing like extraction, transformation, loading and administration. It distinguishes analytical processing which involves complex queries from transaction processing which handles individual transactions.

Uploaded by

Joe Han

Data Warehouse Administration
By Jhun Jhun M. Sanut
Data Warehouse

 A data warehouse serves as a central repository of high-quality, consistent, and
reliable data that supports informed decision-making and strategic
planning within an organization.
 A data warehouse is designed to support business intelligence
activities by providing a consolidated view of data across different
domains and time periods. Data warehouses typically cover a wide
range of subject areas and store historical data.
 It provides a scalable and efficient solution for storing and retrieving
large volumes of data. It is typically built by extracting data from
different operational systems, transforming it into a consistent
format, and loading it into the data warehouse.
Data Mart

 A data mart is a subset of a data warehouse. It focuses on a
specific subject area or department within an organization.
 Data marts are designed to meet the specific analytical needs
of a particular user group, such as sales, marketing, or finance.
 They contain a subset of data from the data warehouse and are
often optimized for querying and reporting within their specific
domain.
Data Warehouse vs Data Mart

 Both data warehouses and data marts aim to support decision
support system (DSS) functions and provide a subject-oriented
approach to data management. However, data warehouses are
broader in scope, encompassing multiple subject areas and
integrating data from various sources, while data marts are
more specialized and tailored to specific user requirements.
 A data warehouse is a comprehensive repository of integrated
data across the organization, whereas a data mart is a focused
subset of a data warehouse catering to the needs of a specific
user group or department.
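
One way to picture the relationship, as a minimal sketch: the data mart below is simply a filtered view over a warehouse table. The table name warehouse_facts, its columns, and the sample rows are hypothetical, and SQLite's in-memory database stands in for a real warehouse platform.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "revenue", 1200.0),
        ("sales", "orders", 35.0),
        ("finance", "expenses", 800.0),
        ("marketing", "leads", 140.0),
    ],
)

# The "data mart" exposes only the sales subject area of the warehouse.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT metric, value FROM warehouse_facts WHERE department = 'sales'"
)

sales_rows = conn.execute(
    "SELECT metric, value FROM sales_mart ORDER BY metric"
).fetchall()
print(sales_rows)
```

In practice a data mart is often a separately stored and tuned copy rather than a view; the view is used here only because it makes the "subset" relationship explicit.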
Data warehousing

 Data warehousing is the process of designing, implementing,
and maintaining a data warehouse.
 It involves collecting, organizing, and integrating data from
various sources to create a consolidated and unified view of an
organization's data.
Processes of Data Warehousing

1. Data Extraction: Data is extracted from different operational systems, such as
transactional databases, spreadsheets, and external sources. This involves identifying
the relevant data sources and extracting the necessary data in a structured format.
2. Data Transformation: Extracted data is transformed and cleansed to ensure
consistency, quality, and compatibility. This includes tasks like data cleaning, data
integration, data validation, and data standardization. The transformed data is often
stored in a staging area before being loaded into the data warehouse.
3. Data Loading: The transformed data is loaded into the data warehouse, which involves
populating the data warehouse tables. This process can be done through various
methods, such as batch loading or real-time data integration. The data loading process
may include tasks like indexing, partitioning, and aggregating data for efficient storage
and retrieval.
4. Data Modeling: Data modeling involves designing the structure and relationships of
the data within the data warehouse. This includes defining dimensions, hierarchies,
measures, and relationships between different entities. Common data modeling
techniques used in data warehousing include star schema and snowflake schema.
5. Querying and Analysis: Once the data is loaded into the data warehouse, users can
query and analyze the data using business intelligence tools or SQL queries. Data
warehouse systems are optimized for complex queries and analytical processing,
allowing users to gain insights, generate reports, and perform advanced data analysis.
6. Maintenance and Administration: This involves monitoring the data warehouse's
performance, managing security and access controls, ensuring data integrity and
consistency, and making updates or additions to the data warehouse schema as
business needs evolve.
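
The extract, transform, load, model, and query steps above can be sketched end to end on a toy scale. This is an illustration under stated assumptions, not a production pipeline: the raw_orders records, the transform and load helpers, and the dim_product, dim_region, and fact_sales tables are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw records, as if extracted from operational sources.
raw_orders = [
    {"order_id": "1", "product": " Widget ", "region": "north", "amount": "19.99"},
    {"order_id": "2", "product": "gadget", "region": "SOUTH", "amount": "5.00"},
    {"order_id": "3", "product": "widget", "region": "North", "amount": "not-a-number"},
]


def transform(rows):
    """Cleanse and standardize rows; drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # validation: skip rows with a non-numeric amount
        clean.append(
            {
                "order_id": int(row["order_id"]),
                "product": row["product"].strip().lower(),
                "region": row["region"].strip().lower(),
                "amount": amount,
            }
        )
    return clean


def load(conn, rows):
    """Load rows into a tiny star schema: one fact table, two dimensions."""
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_sales (
            order_id INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES dim_product(product_id),
            region_id INTEGER REFERENCES dim_region(region_id),
            amount REAL
        );
        """
    )
    for row in rows:
        conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (row["product"],))
        conn.execute("INSERT OR IGNORE INTO dim_region (name) VALUES (?)", (row["region"],))
        # Look up the dimension keys by name and insert the fact row.
        conn.execute(
            "INSERT INTO fact_sales "
            "SELECT ?, p.product_id, r.region_id, ? "
            "FROM dim_product p, dim_region r WHERE p.name = ? AND r.name = ?",
            (row["order_id"], row["amount"], row["product"], row["region"]),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(raw_orders))

# Querying and analysis: total sales per region via the dimension table.
totals = conn.execute(
    "SELECT r.name, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_region r ON r.region_id = f.region_id "
    "GROUP BY r.name ORDER BY r.name"
).fetchall()
print(totals)
```

The third record is rejected during transformation, so only two fact rows reach the warehouse. A real pipeline would usually log or quarantine rejected rows in a staging area instead of silently dropping them.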
Transaction Processing
 Transaction processing involves handling individual, operational tasks or
transactions within a database. These transactions are typically short,
discrete operations that involve data manipulation, such as inserting,
updating, or deleting records. Transaction processing systems (TPS) are
designed to ensure data integrity, consistency, and concurrency control.
 ACID properties (Atomicity, Consistency, Isolation, Durability): these ensure
that transactions are treated as indivisible units, preserve data consistency,
are isolated from other transactions, and remain durably stored once committed.
 Transaction processing systems are optimized for high transaction throughput
and low response times. They are designed to handle a large number of
concurrent transactions efficiently.
 Concurrency control ensures that multiple transactions can execute
concurrently without interfering with each other.
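
Atomicity can be illustrated with a short sketch using Python's built-in sqlite3 module. The accounts table and the transfer helper are hypothetical. The credit is deliberately applied before the debit so that a failed debit shows the earlier update being rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move funds in one transaction; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)
print(ok, failed, balances(conn))
```

The second transfer fails on the debit, and the credit already applied to the target account is undone with it, so the balances reflect only the first transfer.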
Analytical Processing

 Analytical processing involves complex querying and analysis of large volumes of
data to derive insights and support decision-making. Analytical processing
systems are optimized for processing and aggregating large datasets efficiently.
 Complex queries and aggregations: analytical processing executes complex queries
that join multiple tables, filter data, and perform aggregations such as grouping
and summarizing.
 Historical and summarized data: it focuses on historical and summarized data
stored in a data warehouse or data mart. This enables trend analysis, data mining,
and business intelligence activities by providing a consolidated view of data from
multiple sources.
 Read-heavy workloads: these systems are optimized for read-heavy workloads, where
data is read and analyzed frequently but updated or inserted less often.
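
A typical analytical query combines the three operations named above: a join, a filter, and an aggregation. The sketch below runs one against hypothetical dim_date and fact_sales tables with made-up figures, again using in-memory SQLite as a stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE fact_sales (date_id INTEGER, category TEXT, amount REAL);
    INSERT INTO dim_date VALUES (1, 2022, 1), (2, 2022, 2), (3, 2023, 1), (4, 2023, 2);
    INSERT INTO fact_sales VALUES
        (1, 'books', 100.0), (1, 'games', 40.0),
        (2, 'books', 120.0), (3, 'books', 150.0),
        (3, 'games', 60.0),  (4, 'books', 170.0);
    """
)

yearly_books = conn.execute(
    """
    SELECT d.year, COUNT(*) AS n_sales, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_id = f.date_id   -- join fact to dimension
    WHERE f.category = 'books'                    -- filter
    GROUP BY d.year                               -- aggregate per year
    ORDER BY d.year
    """
).fetchall()
print(yearly_books)
```

Comparing the yearly totals is a very small instance of trend analysis over historical data.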
Decision Support System (DSS)

 A decision support system (DSS) is an information system that supports
decision-making activities within an organization. It typically involves
analyzing data, generating reports, and providing interactive tools to
facilitate decision-making processes. The goal of a DSS is to assist users
in making informed and effective decisions based on the available data and
analytical capabilities.
 The database used by a DSS can be a combination of read-only and read-write
databases, depending on the nature of the decision-making tasks and
the data involved. The database may include operational data from
transactional systems, historical data from data warehouses or data
marts, as well as external data sources.
Key Characteristics Of A DSS Database
1. Data Integration: The DSS database integrates data from various sources to provide a
comprehensive view for decision-making. This may involve data extraction,
transformation, and loading processes to consolidate and cleanse data.
2. Analytical Processing: The database supports analytical processing capabilities,
enabling users to perform complex queries, generate reports, and conduct data analysis
to gain insights and support decision-making.
3. Data Manipulation: While a DSS primarily focuses on analysis, it may also involve
write operations for scenario planning, what-if analysis, and simulations. Users may
input data or modify existing data to explore different decision scenarios.
4. Performance Optimization: The database is designed to optimize query
performance, as DSS often deals with large datasets and complex queries. Indexing,
partitioning, and caching techniques may be employed to enhance query response
times.
5. Data Presentation: The database provides tools for presenting data in a meaningful
and user-friendly way, such as visualizations, dashboards, and interactive reports.
These aid in understanding and interpreting the analyzed data for decision-making
purposes.
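
The what-if analysis mentioned under Data Manipulation can be sketched as follows. The baseline figures, the projected_profit function, and the scenario names are hypothetical, and the profit model (units times margin) is a deliberate simplification.

```python
# Hypothetical baseline figures a DSS might read from the warehouse.
baseline = {"units_sold": 1000, "unit_price": 20.0, "unit_cost": 12.0}


def projected_profit(figures, price_change_pct=0.0, demand_change_pct=0.0):
    """Return profit under a scenario without modifying the baseline data."""
    price = figures["unit_price"] * (1 + price_change_pct / 100)
    units = figures["units_sold"] * (1 + demand_change_pct / 100)
    return units * (price - figures["unit_cost"])


scenarios = {
    "baseline": {},
    "raise price 10%, lose 5% demand": {"price_change_pct": 10, "demand_change_pct": -5},
    "cut price 10%, gain 20% demand": {"price_change_pct": -10, "demand_change_pct": 20},
}

for name, changes in scenarios.items():
    print(f"{name}: {projected_profit(baseline, **changes):.2f}")
```

This sketch keeps scenario inputs separate from the stored baseline. A DSS that lets users modify data directly would instead write scenario values to a read-write area of the database.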
Online Analytical Processing (OLAP)

 Online analytical processing (OLAP) refers to a category of software tools
and technologies that enable users to perform complex analysis and
interactive reporting on large volumes of data in real time. OLAP systems
are specifically designed for analytical processing and decision support.
 OLAP provides a multidimensional view of data, allowing users to
analyze information from different dimensions, such as time,
geography, product, and customer. It enables users to explore data
using various operations, including drill-down, roll-up, slice-and-dice,
and pivot, to gain insights and answer ad-hoc business questions.
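
The OLAP operations named above can be sketched in plain Python on a tiny fact list. The facts rows, the DIMS tuple, and the aggregate and slice_ helpers are hypothetical. Real OLAP engines implement these operations over cubes far more efficiently, and a dice usually selects ranges or sets of values rather than single values as it does here.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales).
facts = [
    (2023, "Q1", "north", "widget", 100),
    (2023, "Q1", "south", "widget", 80),
    (2023, "Q2", "north", "gadget", 50),
    (2023, "Q2", "south", "widget", 70),
    (2024, "Q1", "north", "widget", 120),
    (2024, "Q1", "south", "gadget", 60),
]
DIMS = ("year", "quarter", "region", "product")


def aggregate(rows, by):
    """Sum sales grouped by the named dimensions."""
    idx = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(rows, **fixed):
    """Keep rows where each named dimension equals the given value."""
    return [r for r in rows if all(r[DIMS.index(d)] == v for d, v in fixed.items())]


roll_up = aggregate(facts, by=("year",))                # coarser: totals per year
drill_down = aggregate(facts, by=("year", "quarter"))   # finer: per year and quarter
north_only = slice_(facts, region="north")              # slice: fix one dimension
dice = slice_(facts, region="north", product="widget")  # dice: fix several dimensions

# Pivot: regions as rows, years as columns.
pivot = defaultdict(dict)
for (region, year), total in aggregate(facts, by=("region", "year")).items():
    pivot[region][year] = total

print(roll_up)
print(drill_down)
print(dict(pivot))
```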
Key Characteristics Of OLAP
 Multidimensional data model: OLAP systems use a multidimensional data model that
organizes data into dimensions, hierarchies, and measures. This allows users to
navigate and analyze data across different dimensions, facilitating flexible and
intuitive analysis.
 Aggregation: OLAP systems support pre-aggregation of data to improve query
performance. Aggregations summarize data at various levels of detail, allowing faster
retrieval of results for commonly used queries.
 Interactive analysis: OLAP systems provide a user-friendly interface that enables
interactive analysis. Users can drill down from summarized data to detailed data,
perform calculations, apply filters, and visualize data in charts and graphs.
 Real-time data: OLAP systems can support real-time or near-real-time data updates,
allowing users to analyze the latest information and make informed decisions.
 Scalability and performance: OLAP systems are designed to handle large volumes of
data and complex queries efficiently. They use optimization techniques like indexing,
caching, and parallel processing to deliver fast query response times.
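
Pre-aggregation, as described above, trades storage and refresh work for faster reads. A minimal sketch, with hypothetical detail rows and helper names: summaries are computed once, and common queries are then answered from the summaries instead of rescanning the detail rows.

```python
from collections import defaultdict

# Hypothetical detail rows: (month, region, sales).
detail = [
    ("2024-01", "north", 100), ("2024-01", "south", 80),
    ("2024-02", "north", 90),  ("2024-02", "south", 110),
    ("2024-03", "north", 130), ("2024-03", "south", 70),
]


def build_summary(rows):
    """Pre-aggregate once: total sales per region and per month."""
    by_region, by_month = defaultdict(int), defaultdict(int)
    for month, region, sales in rows:
        by_region[region] += sales
        by_month[month] += sales
    return dict(by_region), dict(by_month)


by_region, by_month = build_summary(detail)


def region_total(region):
    """Answer a common query from the summary, not from the detail rows."""
    return by_region[region]


print(region_total("north"), by_month["2024-02"])
```

The summaries must be rebuilt or incrementally updated whenever the detail data changes, which is the main cost of this technique.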
Administering A Data Warehouse
 Administering a data warehouse involves managing and maintaining its operations
to ensure that it functions efficiently and reliably. Key aspects include:
1. Performance Monitoring
2. Data Security and Access Control
3. Backup and Recovery
4. Schema Evolution and Maintenance
5. Capacity Planning and Scalability
6. User Support and Training
7. Collaboration with Stakeholders
 Administering a data warehouse requires a proactive and holistic approach to
ensure its availability, performance, and usability. By addressing performance,
security, data quality, scalability, and user support, administrators can effectively
manage and optimize the data warehouse's operations to meet the organization's
analytical needs.
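
Two of the aspects listed above, performance monitoring and backup, can be sketched with Python's built-in sqlite3 module. The timed_query and backup_database helpers, the one-second threshold, and the fact_sales table are hypothetical, and real warehouse platforms ship their own monitoring and backup tooling.

```python
import sqlite3
import time


def timed_query(conn, sql, slow_threshold_s=1.0):
    """Run a query and report how long it took and whether it counts as slow."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed, elapsed > slow_threshold_s


def backup_database(source):
    """Copy the database into a fresh in-memory connection as a backup."""
    target = sqlite3.connect(":memory:")
    source.backup(target)
    return target


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_sales (amount REAL)")
warehouse.executemany("INSERT INTO fact_sales VALUES (?)", [(10.0,), (15.5,)])
warehouse.commit()

rows, elapsed, is_slow = timed_query(warehouse, "SELECT SUM(amount) FROM fact_sales")
print(rows, f"{elapsed:.6f}s", "SLOW" if is_slow else "ok")

copy = backup_database(warehouse)
print(copy.execute("SELECT COUNT(*) FROM fact_sales").fetchone())
```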
