
Data Warehousing and DSS

Uploaded by

Poorva
Copyright
© All Rights Reserved

Data Warehousing & Decision Support Systems
UNIT II
What is Data Warehousing?
• Data warehousing is the process of collecting, storing, and managing
large volumes of data from various sources in a centralized repository.
• This repository, known as a data warehouse, is designed to support
business intelligence (BI) activities, such as reporting, analysis, and
decision-making.
• Data warehousing is a critical component of modern business
intelligence strategies, helping organizations to efficiently manage,
analyze, and leverage their data for better decision-making.
Key Features of Data Warehousing:
• Centralized Repository: Data from different sources (e.g., databases, spreadsheets,
external data) is consolidated into a single location, making it easier to manage and
analyze.
• Historical Data Storage: Data warehouses typically store historical data, allowing
organizations to analyze trends over time.
• Data Integration: Data from various sources is often inconsistent in format or structure.
The data warehousing process involves transforming this data into a consistent format.
• Query and Analysis: Data warehouses are optimized for complex queries and data
analysis, rather than just transaction processing. This enables efficient and fast retrieval
of data for reports and analysis.
• Support for Decision-Making: By providing access to integrated and historical data,
data warehouses help organizations make informed decisions.
Components of a Data Warehouse:
• ETL Process (Extract, Transform, Load): The process of extracting data
from various sources, transforming it into a consistent format, and
loading it into the data warehouse.
• Data Storage: The actual storage of data in the warehouse, usually in a
format optimized for query performance.
• Metadata: Information about the data stored in the warehouse, such as
data definitions, mappings, and relationships, which helps in
understanding and managing the data.
• Data Access Tools: Tools used by business analysts and decision-makers
to access and analyze the data, such as SQL queries, reporting tools, and
dashboards.
Benefits of Data Warehousing:
• Improved Data Quality and Consistency: Consolidating data from
multiple sources, together with cleansing and validation during ETL,
helps make data more accurate, complete, and consistent.
• Enhanced Business Intelligence: Data warehouses provide a
foundation for advanced data analysis and reporting, enabling
organizations to gain deeper insights.
• Scalability: Data warehouses are designed to handle large volumes of
data, making them suitable for growing organizations.
Designing and Developing Data Warehouses
• Designing and developing a data warehouse involves a series of
structured steps that ensure the data warehouse meets the specific
needs of the organization. Below is an overview of the key phases in
the process:
Key Phases of Designing and Developing a Data Warehouse
1. Requirements Gathering
• Identify Business Requirements: Understand the goals of the data
warehouse by engaging with stakeholders to determine the types of
reports, analyses, and business insights they need.
• Define Data Sources: Identify all the data sources (e.g., databases,
CRM systems, external data feeds) that will feed into the data
warehouse.
2. Data Modeling
• Conceptual Data Model: Create a high-level model that outlines the
major entities, relationships, and key data elements.
• Logical Data Model: Develop a more detailed model, often using ER
(Entity-Relationship) diagrams, to define how data elements relate to
each other without considering the physical storage.
• Physical Data Model: Define how the data will be physically stored in
the database, including table structures, indexing strategies,
partitioning, and storage optimizations.
3. Architecture Design
• Data Warehouse Architecture: Choose the schema design (e.g.,
star schema, snowflake schema, data vault) based on the specific
needs and complexities of the data.
• ETL (Extract, Transform, Load) Design: Design the ETL processes that
will extract data from source systems, transform it into a suitable
format, and load it into the data warehouse.
• Data Storage and Management: Decide on the storage technologies
(e.g., relational databases, columnar storage, cloud storage) and
design strategies for data partitioning, indexing, and data retention.
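As an illustration of the star schema mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The table names (dim_date, dim_product, fact_sales), the columns, and the sample rows are all hypothetical; a real warehouse would use its own platform and naming.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimension tables.
DDL = """
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL,
    month     INTEGER NOT NULL,
    year      INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,
    amount      REAL NOT NULL
);
CREATE INDEX idx_fact_sales_date ON fact_sales(date_key);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Made-up sample data for the sketch.
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
    (20240115, "2024-01-15", 1, 2024),
    (20240210, "2024-02-10", 2, 2024),
])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Laptop", "Electronics"),
    (2, "Desk", "Furniture"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (20240115, 1, 2, 2000.0),
    (20240115, 2, 1, 300.0),
    (20240210, 1, 1, 1000.0),
])

# A typical star join: the fact table joined to its dimensions, then aggregated.
sales_by_month_category = conn.execute("""
    SELECT d.month, p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
```

A snowflake schema would further split dim_product into, for example, separate product and category tables; the star form keeps dimensions flat to reduce the number of joins.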
4. ETL Process Development
• Data Extraction: Develop scripts or use ETL tools to extract data from
the identified sources. Ensure the extraction process is efficient and
minimizes the impact on source systems.
• Data Transformation: Implement transformation rules to clean,
normalize, aggregate, and format the data to meet the requirements
of the data warehouse schema.
• Data Loading: Load the transformed data into the data warehouse,
often implementing processes for incremental loading, data
validation, and error handling.
5. Data Warehouse Implementation
• Database Setup: Set up the physical database environment according
to the physical data model, including creating tables, indexes, and
partitions.
• ETL Job Scheduling: Schedule ETL jobs to run at appropriate times
(e.g., nightly, hourly) to ensure the data warehouse is updated with
fresh data.
• Data Integration and Testing: Perform integration testing to ensure
data from different sources is correctly combined, and validate that
the data warehouse meets all functional and non-functional
requirements.
6. Data Access and Reporting Layer
• User Access Management: Implement security measures to control
access to the data warehouse, including user roles and permissions.
• Query and Reporting Tools: Integrate with business intelligence tools
(e.g., Power BI, Tableau, SQL queries) to allow users to generate
reports, dashboards, and perform ad-hoc analysis.
• Performance Optimization: Continuously monitor and optimize query
performance, including refining indexes, partitioning strategies, and
query execution plans.
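The effect of an index on a query execution plan can be inspected directly. The sketch below uses SQLite (via Python's sqlite3 module) purely as a stand-in for a warehouse engine; the fact_sales table, the idx_sales_date index, and the generated rows are assumptions for illustration, and other platforms expose plans through their own commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER, amount REAL)"
)
# Made-up rows spread over 28 date keys.
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(20240100 + (i % 28) + 1, i % 5, float(i)) for i in range(1000)],
)

QUERY = "SELECT SUM(amount) FROM fact_sales WHERE date_key = ?"

def query_plan(connection, sql, params):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # the 4th column is the detail text

plan_before = query_plan(conn, QUERY, (20240115,))
total_before = conn.execute(QUERY, (20240115,)).fetchone()[0]

# Add an index on the filter column and look at the plan again.
conn.execute("CREATE INDEX idx_sales_date ON fact_sales(date_key)")
plan_after = query_plan(conn, QUERY, (20240115,))
total_after = conn.execute(QUERY, (20240115,)).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
```

The query result is unchanged; only the access path differs, moving from a full table scan to an index search.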
7. Maintenance and Evolution
• Data Quality Management: Implement ongoing processes to monitor
and ensure data quality, including data validation checks and anomaly
detection.
• Performance Monitoring: Continuously monitor system performance
and scalability, making adjustments as data volumes grow.
• Change Management: Manage changes to the data warehouse, such
as adding new data sources, modifying the schema, or updating ETL
processes.
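A minimal sketch of the data validation checks and anomaly detection mentioned above. The field names (order_id, amount), the z-score threshold, and the sample values are hypothetical; production systems typically use richer rules and more robust statistics.

```python
from statistics import mean, stdev

def validate_rows(rows):
    """Return (row_index, problem) pairs for rows that fail basic checks."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        order_id = row.get("order_id")
        if order_id is None:
            problems.append((i, "missing order_id"))
        elif order_id in seen_ids:
            problems.append((i, "duplicate order_id"))
        else:
            seen_ids.add(order_id)
        amount = row.get("amount")
        if amount is None or amount < 0:
            problems.append((i, "invalid amount"))
    return problems

def flag_anomalies(values, threshold=2.0):
    """Return indexes of values whose z-score magnitude exceeds the threshold."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Made-up batch with one duplicate key and one bad row.
batch = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 1, "amount": 5.0},
    {"order_id": None, "amount": -3.0},
]
problems = validate_rows(batch)

# Made-up daily row counts from a nightly load; the last day is unusual.
daily_row_counts = [100, 102, 98, 101, 99, 100, 103, 97, 100, 300]
anomalous_days = flag_anomalies(daily_row_counts)
```

Checks like these can run after each load, with flagged rows or days routed to a review queue rather than silently dropped.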
8. Documentation and Training
• Documentation: Create detailed documentation covering data
models, ETL processes, data flows, and access protocols.
• User Training: Provide training sessions and materials for end-users
and administrators to ensure they can effectively use the data
warehouse and its associated tools.
9. Evaluation and Continuous Improvement
• User Feedback: Regularly gather feedback from end-users to identify
areas for improvement.
• Iterative Enhancements: Based on feedback and changing business
needs, make iterative enhancements to the data warehouse to ensure
it continues to meet organizational goals.
Tools and Technologies:
• ETL Tools: Informatica, Talend, Apache NiFi, Microsoft SQL Server
Integration Services (SSIS)
• Data Warehousing Platforms: Amazon Redshift, Google BigQuery,
Snowflake, Microsoft Azure Synapse, Teradata
• BI Tools: Tableau, Power BI, Looker, QlikView
• Database Management Systems: Oracle, Microsoft SQL Server,
PostgreSQL, MySQL
Extract, Transform, Load Process in Data Warehousing
ETL Process
ETL (Extract, Transform, Load)
• The Extract, Transform, Load (ETL) process is a critical component of
data warehousing, enabling the movement and transformation of
data from various sources into a centralized data warehouse.
• The ETL process is fundamental to ensuring that data is accurately,
efficiently, and reliably moved from various sources into the data
warehouse, ready for analysis and decision-making.
• ETL is broken down into three phases:
1. Extract (E)
• Purpose: The extraction phase involves collecting data from various source systems. These
sources can include databases, flat files, web services, APIs, CRM systems, and more.
• Key Steps:
• Identify Data Sources: Determine where the data resides, such as relational databases (e.g., Oracle,
SQL Server), NoSQL databases, cloud-based storage, or third-party APIs.
• Data Extraction: Extract the relevant data from the source systems. This might involve querying
databases, reading flat files, or pulling data from APIs.
• Data Filtering: During extraction, some data might be filtered out if it’s irrelevant to the data
warehouse’s goals. This helps in reducing the data volume and focusing on valuable data.
• Challenges:
• Data Consistency: Ensuring that data remains consistent across different sources.
• System Load: Minimizing the impact on source systems, especially during peak business hours.
• Data Synchronization: Ensuring that data extracted reflects the most current state of the source
systems.
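The extraction steps above can be sketched as follows. This is a minimal illustration using only the Python standard library; the CSV content, the orders table, the status filter, and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

def extract_csv(text, keep):
    """Read CSV text and keep only the rows for which keep(row) is true."""
    return [row for row in csv.DictReader(io.StringIO(text)) if keep(row)]

def extract_table(conn, since):
    """Query a source table for rows changed after the given timestamp."""
    cur = conn.execute(
        "SELECT id, customer, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY id",
        (since,),
    )
    columns = [c[0] for c in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]

# Flat-file source: filter out irrelevant (test) records during extraction.
CSV_TEXT = "id,customer,status\n1,Alice,active\n2,Bob,test\n3,Carol,active\n"
csv_rows = extract_csv(CSV_TEXT, keep=lambda r: r["status"] != "test")

# Database source: an in-memory SQLite database stands in for a source system.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, updated_at TEXT)"
)
source.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "Alice", 10.0, "2024-01-01"),
    (2, "Bob", 20.0, "2024-01-05"),
])
db_rows = extract_table(source, since="2024-01-03")
```

Filtering on a change timestamp keeps the query narrow, which is one way to limit load on the source system.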
2. Transform (T)
• Purpose: Transformation is the process of converting the extracted data into a format that is suitable for
analysis and storage in the data warehouse. This stage involves data cleansing, normalization, aggregation,
and enrichment.
• Key Steps:
• Data Cleaning: Correct errors in the data (e.g., removing duplicates, correcting inconsistencies, handling missing values)
to ensure quality.
• Data Mapping: Align data fields from different sources to a unified format, ensuring consistency in data types and
structures.
• Data Aggregation: Summarize detailed data to higher levels, such as calculating totals, averages, or counts, often based
on business logic.
• Data Normalization/Denormalization: Standardize data to ensure uniformity across different datasets, or denormalize it
to optimize query performance.
• Business Logic Implementation: Apply specific business rules to the data to transform it into a format useful for analysis.
• Challenges:
• Complex Transformations: Handling complex data transformations that require deep understanding of business rules.
• Data Quality: Ensuring that transformations do not introduce errors or distort data.
• Performance: Optimizing transformations to handle large volumes of data efficiently.
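A minimal sketch of the mapping, cleaning, and aggregation steps described above. The source field names, the FIELD_MAP, and the rule that rows without an amount are dropped are assumptions chosen for illustration; real business rules would differ.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific field names to unified names.
FIELD_MAP = {
    "cust": "customer",
    "Customer Name": "customer",
    "amt": "amount",
    "Amount": "amount",
}

def map_fields(record):
    """Rename source fields to the unified names; unknown fields pass through."""
    return {FIELD_MAP.get(key, key): value for key, value in record.items()}

def clean(records):
    """Normalize names, drop rows with missing values, and remove duplicates."""
    seen = set()
    cleaned = []
    for r in records:
        customer = (r.get("customer") or "").strip().title()
        raw_amount = r.get("amount")
        if not customer or raw_amount in (None, ""):
            continue  # assumed rule: incomplete rows are dropped
        key = (r.get("order_id"), customer)
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        cleaned.append({
            "order_id": r.get("order_id"),
            "customer": customer,
            "amount": float(raw_amount),
        })
    return cleaned

def aggregate(records):
    """Summarize detail rows into a total amount per customer."""
    totals = defaultdict(float)
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)

# Made-up rows from two sources with different field names.
raw = [
    {"order_id": 1, "cust": " alice ", "amt": "10.5"},
    {"order_id": 1, "cust": "Alice", "amt": "10.5"},
    {"order_id": 2, "Customer Name": "BOB", "Amount": "4"},
    {"order_id": 3, "cust": "carol", "amt": ""},
]
cleaned = clean([map_fields(r) for r in raw])
totals = aggregate(cleaned)
```

Keeping each step as a separate function makes it easier to test transformations individually, which helps with the data quality challenge noted above.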
3. Load (L)
• Purpose: The load phase involves loading the transformed data into the final target, usually the data
warehouse. The data can be loaded in full or incrementally, depending on the system’s requirements.
• Key Steps:
• Full Load: Loading the entire dataset into the data warehouse, typically done during the initial load or when major
changes occur.
• Incremental Load: Loading only the new or updated data since the last load. This method is more efficient and is often
used in ongoing ETL processes.
• Data Validation: Ensure that the data has been accurately loaded into the warehouse, with integrity checks to confirm
that all data is accounted for.
• Indexing and Partitioning: After loading, the data might be indexed and partitioned to improve query performance and
manageability.
• Challenges:
• Data Volume: Managing large volumes of data during the load process, ensuring that the data warehouse can handle
the load efficiently.
• Downtime Management: Minimizing downtime during the load process, especially in environments where data is
continuously updated.
• Data Consistency: Ensuring that data is consistently loaded and that there are no discrepancies between the source and
target systems.
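One common way to implement an incremental load is a stored "watermark" that records how far the previous load got. The sketch below assumes this approach; the orders, fact_orders, and etl_watermark tables are hypothetical, and SQLite stands in for both the source and the warehouse.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
source.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 10.0, "2024-01-01"),
    (2, 20.0, "2024-01-02"),
])

target = sqlite3.connect(":memory:")
target.executescript("""
CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
CREATE TABLE etl_watermark (last_loaded TEXT NOT NULL);
INSERT INTO etl_watermark VALUES ('');
""")

def incremental_load(src, tgt):
    """Copy rows changed since the stored watermark; return the number loaded."""
    (watermark,) = tgt.execute("SELECT last_loaded FROM etl_watermark").fetchone()
    changed = src.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    if not changed:
        return 0
    with tgt:  # rows and watermark are committed together or not at all
        tgt.executemany(
            "INSERT OR REPLACE INTO fact_orders (id, amount, updated_at) "
            "VALUES (?, ?, ?)",
            changed,
        )
        tgt.execute("UPDATE etl_watermark SET last_loaded = ?", (changed[-1][2],))
    return len(changed)

def totals(conn, table):
    """Row count and amount sum, used as a simple post-load validation check."""
    return conn.execute(f"SELECT COUNT(*), SUM(amount) FROM {table}").fetchone()

first = incremental_load(source, target)   # initial run behaves like a full load

# Simulate changes in the source: one update and one new row.
source.execute(
    "UPDATE orders SET amount = 25.0, updated_at = '2024-01-03' WHERE id = 2"
)
source.execute("INSERT INTO orders VALUES (3, 30.0, '2024-01-04')")

second = incremental_load(source, target)  # only the changed rows are loaded
third = incremental_load(source, target)   # nothing new since the last run
```

A strict "greater than" comparison on the watermark can miss rows that arrive later with the same timestamp as the last loaded row, so real pipelines often add an overlap window or use a change-data-capture mechanism instead.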
ETL Tools and Technologies:
• Informatica: A powerful ETL tool known for its broad connectivity and
advanced transformation capabilities.
• Talend: A data integration platform, historically known for its open-source
Talend Open Studio edition, that offers robust data integration and
transformation features.
• Apache NiFi: A data integration tool that supports real-time data flow
management and transformation.
• Microsoft SQL Server Integration Services (SSIS): A platform for data
integration and workflow applications, commonly used in the Microsoft
ecosystem.
• Pentaho Data Integration (Kettle): Another open-source ETL tool that offers
a user-friendly interface and supports a wide range of data sources.
Best Practices for ETL Processes:
• Start with Clear Requirements: Clearly define the business requirements
before designing the ETL process.
• Data Quality Management: Implement robust data quality checks at
every stage to ensure accuracy.
• Performance Optimization: Optimize the ETL process for performance by
using techniques like parallel processing, indexing, and efficient querying.
• Monitor and Log: Implement logging and monitoring to track the
performance of ETL processes and quickly identify any issues.
• Scalability Consideration: Design the ETL process to be scalable,
accommodating increasing data volumes over time.
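The "Monitor and Log" practice can be sketched with Python's standard logging module. The run_step helper and the step names are hypothetical; dedicated ETL tools usually provide their own monitoring.

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("etl")

def run_step(name, func, *args):
    """Run one ETL step, logging its duration and row count, or its failure."""
    start = time.perf_counter()
    try:
        result = func(*args)
    except Exception:
        log.exception("step %s failed", name)
        raise  # let the scheduler or caller decide how to recover
    elapsed = time.perf_counter() - start
    rows = len(result) if hasattr(result, "__len__") else None
    log.info("step %s ok rows=%s seconds=%.3f", name, rows, elapsed)
    return result

# Trivial stand-in steps to show the wrapper in use.
extracted = run_step("extract", lambda: [1, 2, 3])
transformed = run_step("transform", lambda xs: [x * 2 for x in xs], extracted)
```

Logging row counts per step makes it easier to spot a stage that silently drops or duplicates data.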
Decision Support Systems and Business Intelligence
Decision Support Systems (DSS) and Business Intelligence (BI) are closely
related concepts in the realm of data-driven decision-making. Both play
vital roles in helping organizations analyze data, generate insights, and
make informed business decisions.
Decision Support Systems (DSS)
• Definition:
• A Decision Support System (DSS) is a computerized system that
supports the process of decision-making in an organization. It is
designed to assist managers and decision-makers by providing them
with relevant information, analytical tools, and models to solve
complex problems and make decisions more effectively.
Key Components of DSS:
• Data Management Component:
• Involves collecting and storing data from internal and external sources.
• Data is often organized in a database or data warehouse that the DSS can query.
• Model Management Component:
• Contains mathematical and analytical models that help in decision-making.
• These models can include statistical models, simulation models, optimization models, and what-if
analysis tools.
• User Interface (UI):
• The interface through which users interact with the DSS.
• It is designed to be user-friendly, allowing non-technical users to perform complex analyses and
generate reports.
• Knowledge Management Component (optional):
• Integrates knowledge-based systems or expert systems that provide expert advice or
recommendations.
Types of DSS:
• Data-Driven DSS: Focuses on the retrieval and analysis of data, often using a
data warehouse or online analytical processing (OLAP) tools.
• Model-Driven DSS: Relies on complex models to process data and provide
decision-making support, such as financial planning systems or supply chain
management systems.
• Knowledge-Driven DSS: Provides specialized problem-solving expertise,
often through the integration of expert systems.
• Document-Driven DSS: Manages, retrieves, and manipulates unstructured
information in a variety of electronic formats.
• Communication-Driven DSS: Supports more than one person working on a
shared task, often through collaboration and communication tools.
Applications of DSS:
• Financial Planning: Helping organizations forecast financial outcomes
based on different scenarios.
• Supply Chain Management: Optimizing inventory levels, logistics, and
supplier relationships.
• Healthcare: Assisting in the diagnosis and treatment planning by
analyzing patient data.
• Customer Relationship Management (CRM): Analyzing customer data
to improve service, retention, and marketing strategies.
Business Intelligence (BI)
• Definition:
• Business Intelligence (BI) refers to the technologies, applications, and
practices for the collection, integration, analysis, and presentation of
business information. The goal of BI is to support better business
decision-making by transforming raw data into actionable insights.
• Business intelligence (BI) is a set of technologies and strategies that businesses use to analyze and manage
their data. Many modern BI tools also use artificial intelligence (AI) and machine learning to help collect,
analyze, and present business data in a way that supports informed decisions.
• BI can help businesses:
• Identify trends
• BI can help businesses quickly identify patterns and trends in their data. For example, retail companies can use
BI to analyze sales data and identify bestsellers and underperformers.
• Resolve problems
• BI can help businesses identify problems and bottlenecks, and streamline processes to eliminate them.
• Grow revenue
• BI can help businesses make better, faster decisions that can lead to increased revenue. For example, service
companies can use BI to understand customer preferences and improve their experiences, which can increase
loyalty.
• Plan for the future
• BI can help businesses predict future trends by using statistical models and forecasting methods.
BI typically involves four steps:
1. Data collection
Raw data is collected from the business's activities.
2. Analysis
The data is processed and stored in data warehouses, where users can access it to
start analyzing it. Modern BI tools can automate analysis, which can save time and
effort.
3. Visualization
The data is transformed into insights that are easy for everyone in the organization
to use.
4. Decision-making
The insights are used to make decisions and track performance against goals.
Key Components of BI
• Data Sources:
• Internal Data: Information from internal business systems like ERP, CRM, and HR systems.
• External Data: Data from outside the organization, such as market data, competitor analysis, and social media.
• Data Warehousing:
• A central repository where data from different sources is stored and managed. This forms the backbone of any BI system.
• ETL (Extract, Transform, Load):
• The process of extracting data from various sources, transforming it into a consistent format, and loading it into the data
warehouse.
• Data Analysis and Reporting:
• OLAP (Online Analytical Processing): Allows for the multidimensional analysis of data, enabling complex queries and
analyses.
• Data Mining: Identifies patterns, correlations, and trends in large datasets.
• Reporting Tools: Tools like dashboards and scorecards provide visual representations of data to help users understand
trends and performance metrics.
• Data Visualization:
• The graphical representation of data through charts, graphs, maps, and dashboards. It helps users quickly grasp complex
information and insights.
Applications of BI
• Performance Management: Monitoring and managing the
performance of an organization using key performance indicators
(KPIs).
• Sales Analysis: Analyzing sales data to understand trends, customer
preferences, and market demands.
• Marketing Analysis: Assessing the effectiveness of marketing
campaigns and strategies.
• Risk Management: Identifying, assessing, and mitigating risks in
various areas of the business.
Benefits of BI
• Improved Decision-Making: Provides accurate and timely information
to decision-makers, enabling more informed decisions.
• Increased Efficiency: Automates the process of data collection and
analysis, reducing the time and effort required to generate insights.
• Competitive Advantage: By leveraging data, organizations can identify
market trends and opportunities faster than competitors.
• Enhanced Customer Insights: BI tools help in understanding customer
behavior and preferences, leading to better customer service and
retention.
Relationship Between DSS and BI
• Complementary Roles: DSS and BI are complementary; while DSS
focuses on providing decision-making tools and models to solve
specific problems, BI focuses on data gathering, integration, and
analysis to provide insights that inform broader business decisions.
• Integration: Modern BI tools often include DSS capabilities, allowing
organizations to use a single platform to both analyze data and make
informed decisions. Similarly, a DSS may utilize BI data to enhance its
decision-making models.
Example Scenario
• Consider a retail chain that wants to optimize its inventory
management. The organization can use BI to analyze sales data,
identify trends, and forecast demand. This data, when integrated into
a DSS, can help the management make decisions about inventory
levels, reorder points, and supplier selection based on different
scenarios, such as seasonal demand spikes or supplier delays.
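To make the scenario concrete, the sketch below applies a common textbook reorder-point formula (expected demand during the lead time plus safety stock) under the scenarios named above. The formula choice and every number are assumptions for illustration, not figures from the example.

```python
def reorder_point(daily_demand, lead_time_days, safety_stock):
    """Stock level at which a new order should be placed."""
    return daily_demand * lead_time_days + safety_stock

# Hypothetical baseline, e.g. a demand forecast produced by BI analysis.
BASELINE = {"daily_demand": 40, "lead_time_days": 5, "safety_stock": 50}

# Each scenario overrides part of the baseline.
SCENARIOS = {
    "baseline": {},
    "seasonal_spike": {"daily_demand": 60},   # demand rises by 50%
    "supplier_delay": {"lead_time_days": 8},  # deliveries take 3 days longer
}

results = {
    name: reorder_point(**{**BASELINE, **overrides})
    for name, overrides in SCENARIOS.items()
}

for name, units in results.items():
    print(f"{name}: reorder when stock falls to {units} units")
```

Here BI supplies the inputs (observed demand and lead times), while the DSS part is the model that turns those inputs into a recommended action per scenario.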
Summary
• DSS and BI are essential tools in modern business environments,
providing the information and analytical capabilities needed to make
data-driven decisions that enhance business performance.
DSS Concepts, Methods, and Technologies
Decision Support Systems (DSS) encompass a broad range of concepts,
methods, and technologies designed to aid decision-making in
organizations. Below is a detailed overview of these elements.
DSS Concepts
• Support for Decision-Making:
• Objective: The primary goal of a DSS is to improve the effectiveness of decision-making by providing
relevant data, models, and analysis tools to decision-makers.
• Types of Decisions: DSS supports various types of decisions, including strategic, tactical, and operational.
These can range from unstructured (complex, non-routine) to structured (routine, well-defined) decisions.
• Human-Computer Interaction (HCI):
• User-Centric Design: DSS is designed to enhance the interaction between the user and the system. The
user interface must be intuitive and facilitate easy access to data and models.
• Feedback Loop: Effective DSS includes mechanisms for feedback, allowing users to refine inputs, review
outcomes, and iterate on decisions.
• Customization and Flexibility:
• Tailored Solutions: DSS can be customized to meet the specific needs of an organization or individual user,
supporting different types of analysis, reporting, and decision-making processes.
• Adaptability: A good DSS is adaptable to changes in the business environment, such as new data sources,
business rules, or external factors.
DSS Methods
• Model-Driven Methods:
• Optimization Models: These involve finding the best solution from a set of
alternatives, such as linear programming models used for resource allocation.
• Simulation Models: Used to replicate complex systems and analyze different
scenarios, often applied in logistics, manufacturing, and risk management.
• What-If Analysis: Allows decision-makers to explore the outcomes of different
scenarios by adjusting variables in the models.
• Forecasting Models: These use historical data and statistical methods to
predict future trends, commonly used in sales, finance, and supply chain
management.
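A small sketch of an optimization model combined with what-if analysis. The product-mix problem, its profit figures, and its resource limits are invented for illustration. For simplicity the code enumerates all integer production plans rather than using a linear programming solver, which would be the usual choice for larger problems.

```python
def best_product_mix(machine_hours, labor_hours):
    """Find the most profitable integer mix of two products by enumeration.

    Assumed model: product X earns 30 and needs 2 machine hours and 1 labor
    hour per unit; product Y earns 20 and needs 1 machine hour and 1 labor hour.
    Returns (profit, units_of_x, units_of_y).
    """
    best = (0, 0, 0)
    for x in range(labor_hours + 1):
        for y in range(labor_hours + 1 - x):   # enforces x + y <= labor_hours
            if 2 * x + y <= machine_hours:     # machine capacity constraint
                profit = 30 * x + 20 * y
                if profit > best[0]:
                    best = (profit, x, y)
    return best

# Optimization: best plan under current capacity.
current_plan = best_product_mix(machine_hours=100, labor_hours=80)

# What-if analysis: how does the plan change with 20 more machine hours?
expanded_plan = best_product_mix(machine_hours=120, labor_hours=80)

print("current: ", current_plan)
print("expanded:", expanded_plan)
```

Comparing the two results shows the value of the extra capacity, which is the kind of question a model-driven DSS is meant to answer.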
• Data-Driven Methods:
• Data Mining: Involves extracting patterns and knowledge from large
datasets, often used for customer segmentation, fraud detection, and
market analysis.
• OLAP (Online Analytical Processing): Enables multidimensional
analysis of data, allowing users to slice and dice data along different
dimensions, such as time, geography, and product categories.
• Data Visualization: Techniques such as dashboards, heatmaps, and
charts help in presenting complex data in an understandable format,
aiding quick decision-making.
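The slice, dice, and roll-up operations of OLAP can be illustrated on a tiny in-memory fact table. This is only a sketch of the idea in plain Python; the dimensions (year, region, category), the function names, and the figures are hypothetical, and real OLAP tools work on far larger, pre-aggregated cubes.

```python
from collections import defaultdict

# Made-up sales facts with three dimensions and one measure.
FACTS = [
    {"year": 2023, "region": "East", "category": "Toys", "amount": 100},
    {"year": 2023, "region": "West", "category": "Toys", "amount": 150},
    {"year": 2024, "region": "East", "category": "Toys", "amount": 120},
    {"year": 2024, "region": "East", "category": "Books", "amount": 80},
    {"year": 2024, "region": "West", "category": "Books", "amount": 60},
]

def dice(facts, **allowed):
    """Keep facts whose value for each named dimension is in the allowed set.

    With a single dimension and a single value this is a slice.
    """
    return [
        f for f in facts
        if all(f[dim] in values for dim, values in allowed.items())
    ]

def roll_up(facts, *dims):
    """Total the amount measure for each combination of the given dimensions."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f["amount"]
    return dict(totals)

by_year = roll_up(FACTS, "year")
by_year_region = roll_up(FACTS, "year", "region")
east_by_category = roll_up(dice(FACTS, region={"East"}), "category")
east_2024_by_category = roll_up(
    dice(FACTS, year={2024}, region={"East"}), "category"
)
```

Moving from by_year to by_year_region is a drill-down; the reverse direction is a roll-up.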
• Knowledge-Driven Methods:
• Expert Systems: Use a knowledge base and inference engine to mimic
human decision-making, providing advice or recommendations in
specific domains like medical diagnosis or financial planning.
• Case-Based Reasoning: Solves new problems by adapting solutions
that were used to solve similar past problems, useful in areas like legal
reasoning or customer service.
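The knowledge base and inference engine of an expert system can be sketched with a simple forward-chaining loop. The rules and fact names below are toy examples invented for illustration and do not represent real financial guidance.

```python
def infer(facts, rules):
    """Forward chaining: apply rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# Knowledge base: (set of conditions, conclusion). Hypothetical rules.
RULES = [
    (frozenset({"low_credit_risk", "long_horizon"}), "suggest_growth_portfolio"),
    (frozenset({"stable_income", "low_debt"}), "low_credit_risk"),
    (frozenset({"high_debt"}), "suggest_debt_reduction"),
]

client_facts = {"stable_income", "low_debt", "long_horizon"}
conclusions = infer(client_facts, RULES) - client_facts
```

The first rule depends on a fact produced by the second, so the engine needs a second pass over the rules; this chaining is what separates an inference engine from a flat list of if-statements.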
• Group Decision Support Methods:
• Collaborative Tools: Support group decision-making processes
through shared data access, brainstorming tools, voting mechanisms,
and discussion forums.
• Delphi Method: Involves gathering expert opinions in multiple rounds
to converge on a decision, often used in strategic planning and
forecasting.
DSS Technologies
• Data Warehousing and ETL:
• Data Warehouses: Central repositories that store and manage data from multiple sources, structured for query and analysis
rather than transaction processing.
• ETL (Extract, Transform, Load): Tools and processes that extract data from different sources, transform it into a consistent
format, and load it into the data warehouse.
• Database Management Systems (DBMS):
• Relational Databases: Use tables to store data in a structured format, allowing for complex queries using SQL (Structured
Query Language).
• NoSQL Databases: Designed for unstructured or semi-structured data, offering flexibility and scalability, often used in big
data applications.
• Analytical Tools:
• OLAP Tools: Support multidimensional analysis, enabling users to perform complex queries and drill down into data.
• Statistical Analysis Tools: Software like SAS, SPSS, or R, used for advanced statistical modeling and analysis.
• Artificial Intelligence (AI) and Machine Learning (ML):
• AI Techniques: Include natural language processing, pattern recognition, and neural networks, enhancing DSS capabilities in
areas like predictive analytics and automated decision-making.
• Machine Learning Models: Algorithms that learn from data to make predictions or decisions without explicit programming,
applicable in fraud detection, recommendation systems, and more.
• Business Intelligence (BI) Tools:
• Dashboards: Provide a visual summary of key performance indicators (KPIs) and metrics, enabling quick assessment of
business health.
• Reporting Tools: Generate detailed reports based on queries, often customizable to meet specific user needs.
• Cloud Computing and SaaS (Software as a Service):
• Cloud-Based DSS: Leverage cloud platforms to provide scalable, on-demand DSS capabilities, reducing the need for
on-premises infrastructure.
• SaaS Solutions: Offer DSS functionality as a service, enabling easy access and deployment, often with subscription-based
pricing.
• Big Data Technologies:
• Hadoop and Spark: Frameworks for processing and analyzing large datasets in distributed computing environments, often
used in conjunction with DSS for big data analytics.
• Data Lakes: Storage repositories that hold vast amounts of raw data in its native format until needed for analysis.
• Web-Based DSS:
• Web Portals: Provide users with access to DSS tools and data through web interfaces, often supporting remote and mobile
access.
• APIs and Integration: Allow DSS to integrate with other systems and data sources, enhancing its capabilities and reach.
Emerging Trends in DSS
• Real-Time DSS: Systems that provide decision support in real-time,
often using streaming data and IoT (Internet of Things) devices.
• Cognitive DSS: Integrate AI and machine learning to create systems
that can learn and adapt, providing more nuanced and context-aware
decision support.
• Mobile DSS: Increasing use of mobile devices for accessing DSS tools,
making decision support available anytime and anywhere.
• Self-Service BI/DSS: Tools that empower non-technical users to create
their own reports and analyses, reducing dependency on IT
departments.
Applications of DSS
• Healthcare: Assisting in diagnosis, treatment planning, and resource
allocation.
• Finance: Portfolio management, risk assessment, and fraud detection.
• Manufacturing: Production scheduling, inventory management, and
quality control.
• Retail: Customer segmentation, demand forecasting, and sales
analysis.
• Public Sector: Urban planning, disaster management, and policy
analysis.
In Summary
• DSS encompasses a wide array of concepts, methods, and
technologies designed to support decision-making across various
domains.
• By integrating data, analytical models, and user-friendly interfaces,
DSS provides organizations with the tools they need to make
informed, effective decisions.
Further Reading
• Data Warehousing, Decision Support & OLAP:
https://redbook.cs.berkeley.edu/redbook3/lec28.html
• Overview of data warehousing:
https://www.ibm.com/docs/en/informix-servers/12.10?topic=databases-overview-data-warehousing
• How I helped this brand with a simple dashboard:
https://www.youtube.com/watch?v=qB0vspslPn4
• Case Study: Clinical Decision Support Systems:
https://www.youtube.com/watch?v=YSriO71a4Ac
