TOPIC 1 - Intro N Trends To DW
TOPIC 1
INTRODUCTION AND TRENDS TO DATA WAREHOUSING
1.1 The need for data warehousing
1.2 Data Warehousing: The Building Blocks
1.3 Trends in Data Warehouse
INTRODUCTION
As an enterprise grows larger, hundreds of computer applications are needed to support various business processes.
Without these computer systems, no modern business can survive. Companies started building and using these systems in the 1960s and have become completely dependent on them.
Computer systems help to gather, store, and process all the data needed to successfully perform the daily operations. They provide online information and produce a variety of reports to monitor and run the business.
A data warehouse is a new paradigm specially intended to provide vital strategic information.
In the 1990s, organizations began to achieve competitive advantage by building data warehouse systems.
THE NEED FOR DATA WAREHOUSING
• Understand the desperate need for strategic information
• Recognize the information crisis at every enterprise
• Distinguish between operational and informational systems
• Learn why all past attempts to provide strategic information failed
• Clearly see why data warehousing is the viable solution
• Understand business intelligence for enterprise data warehousing
ORGANIZATIONS’ USE OF STRATEGIC INFORMATION
• For making decisions, executives and managers need information for the following purposes:
– To get in-depth knowledge of their company’s operations
– To review and monitor key performance indicators and note how these affect one another
– To keep track of how business factors change over time
– To compare their company’s performance relative to the competition and to industry benchmarks
• Critical business decisions depend on the availability of proper strategic information in an enterprise.
ESCALATING NEED FOR STRATEGIC INFORMATION
Characteristics of strategic information
ESCALATING NEED FOR STRATEGIC INFORMATION
The Information Crisis
• Organizations face two major facts:
– Organizations have lots of data
– Information technology resources and systems are not effective at turning all that data into useful strategic information
• The large quantities of data are very useful for running the business operations, BUT they are hardly amenable to use in making decisions about business strategies and objectives.
• Data needed for strategic decision making must be in a format that is easy to analyze, allowing managers to review the data from different business viewpoints.
ESCALATING NEED FOR STRATEGIC INFORMATION
Technology Trends
ESCALATING NEED FOR STRATEGIC INFORMATION
Failures of Past Decision-Support Systems
• Most past decision-support systems failed because the users could not clearly define what they wanted in the first place.
• Information needed for strategic decision making has to be available in an interactive manner.
• The user must be able to query online, get results, and query some more.
• The information must be in a format suitable for analysis.
INABILITY TO PROVIDE INFORMATION
Factors relating to the inability to provide strategic information:
• IT receives too many ad hoc requests, resulting in a large overload. With limited resources, IT is unable to respond to the numerous requests in a timely fashion.
• Requests are not only too numerous, they also keep changing all the time. The users need more reports to expand on and understand the earlier reports.
• The users find that they get into a spiral of asking for more and more supplementary reports, so they sometimes adapt by asking for every possible combination, which only increases the IT load even further.
• The users have to depend on IT to provide the information. They are not able to access the information themselves interactively.
• The information environment ideally suited for strategic decision making has to be very flexible and conducive to analysis. IT has been unable to provide such an environment.
OPERATIONAL vs. INFORMATIONAL SYSTEMS
Figure: Operational Systems versus Decision Support Systems
Data Warehouse - The Only Viable Solution
A New Type of System Environment
• The desired features of the new type of system environment are:
– Database designed for analytical tasks
– Data from multiple applications
– Easy to use and conducive to long interactive sessions by users
– Read-intensive data usage
– Direct interaction with the system by the users without IT assistance
– Content updated periodically and stable
– Content to include current and historical data
– Ability for users to run queries and get results online
– Ability for users to initiate reports
Data Warehouse - The Only Viable Solution
Processing Requirements in the New Environment
• Four levels of analytical processing requirements:
1. Running simple queries and reports against current and historical data
2. Ability to perform “what if” analysis in many different ways
3. Ability to query, step back, analyze, and then continue the process to any desired length
4. Spot historical trends and apply them to future results
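Requirement 4 can be illustrated with the simplest trend model there is. The sketch below assumes an ordinary least-squares straight line fitted over equally spaced periods; the function names and the quarterly figures are hypothetical, and in practice this kind of projection would normally be left to an analytics or OLAP tool rather than hand-written code.

```python
def fit_linear_trend(values):
    """Least-squares slope and intercept for values observed at periods 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two historical periods")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def project_trend(history, periods_ahead):
    """Extend the fitted historical trend into the next few periods."""
    slope, intercept = fit_linear_trend(history)
    start = len(history)
    return [intercept + slope * x for x in range(start, start + periods_ahead)]

# Hypothetical quarterly sales history pulled from the warehouse.
quarterly_sales = [100.0, 110.0, 120.0, 130.0]
print(project_trend(quarterly_sales, 2))
```

Because the fit uses only historical values already held in the warehouse, the same call can simply be rerun after each periodic load to refresh the projection.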
GENERAL OVERVIEW OF DATA WAREHOUSE
A data warehouse is an informational environment that:
• Provides an integrated and total view of the enterprise
• Makes the enterprise’s current and historical information easily available for decision making
• Makes decision-support transactions possible without hindering operational systems
• Renders the organization’s information consistent
• Presents a flexible and interactive source of strategic information
DATA WAREHOUSE
• An Environment, Not a Product
– A data warehouse is not a single software or hardware product you purchase to provide strategic information. It is, rather, a computing environment where users can find strategic information, an environment where users are put directly in touch with the data they need to make better decisions. It is a user-centric environment.
• A Blend of Many Technologies
– Take all the data from the operational systems
– Where necessary, include relevant data from outside, such as industry benchmark indicators
– Integrate all the data from the various sources
– Remove inconsistencies and transform the data
– Store the data in formats suitable for easy access for decision making
THE DATA WAREHOUSE: A BLEND OF TECHNOLOGIES
EVOLUTION OF BUSINESS INTELLIGENCE
• Business Intelligence
– The systems and technologies for gathering, cleansing, consolidating, and storing corporate data
– Relates to the tools, techniques, and applications for analyzing the stored data
– Composed of two environments:
• Data to Information
• Information to Knowledge
DATA WAREHOUSING: THE BUILDING BLOCKS
• … warehouse
• Discuss the defining features
• Distinguish between data warehouses and data marts
• Review the evolved architectural types
• Study each component or building block that makes up a data warehouse
• Introduce metadata and highlight its significance
DATA WAREHOUSING
“A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management’s decisions” (Bill Inmon, 1996)
“… available, integrated, time-stamped, subject-oriented, nonvolatile, and accessible” (Kelly)
DATA WAREHOUSING
“The process of transforming data into real-time information for the decision-making process. It includes techniques, methodologies, or tools for storing data in an electronic repository. Time-variant, nonvolatile, and subject-oriented data are collected from multiple sources and converted into homogeneous data, which can be retrieved for analysis and reports.”
FEATURES OF DATA WAREHOUSING
• Subject-Oriented Data
• Integrated Data
• Time-Variant Data
• Nonvolatile Data
• Data Granularity
Subject-Oriented Data
In a data warehouse, data is stored not by operational application but by business subject.
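To make the contrast concrete, here is a minimal sketch, assuming two hypothetical operational applications (order entry and billing) that each keep customer-related rows in their own layout. The helper `organize_by_subject` (also hypothetical) regroups those rows under the business subject CUSTOMER, which is the way a data warehouse organizes them.

```python
from collections import defaultdict

# Hypothetical rows as they sit in two application-oriented systems.
order_entry_app = [
    {"customer_id": "C1", "order_no": 501, "amount": 250.0},
    {"customer_id": "C2", "order_no": 502, "amount": 90.0},
]
billing_app = [
    {"customer_id": "C1", "invoice_no": 9001, "paid": True},
]

def organize_by_subject(sources, subject_key):
    """Regroup rows from several applications under one business subject."""
    subject = defaultdict(list)
    for app_name, rows in sources.items():
        for row in rows:
            # Keep the originating application alongside each row.
            subject[row[subject_key]].append((app_name, row))
    return dict(subject)

customer_subject = organize_by_subject(
    {"order_entry": order_entry_app, "billing": billing_app}, "customer_id"
)
print(customer_subject["C1"])
```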
Integrated Data
• … operational systems. Source data reside in different databases, files, and data segments.
• Data inconsistencies are removed; data from diverse operational applications is transformed, consolidated, and integrated.
• The items that need to be standardized and made consistent:
– Naming conventions, codes, data attributes, and measurements
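A minimal sketch of the standardization step, assuming two hypothetical source applications that disagree on attribute names, gender codes, and units of measure. The code tables and the `standardize` function are illustrative only; a real staging area would typically drive such mappings from metadata rather than hard-coded dictionaries.

```python
# Hypothetical source conventions: each application encodes gender and
# height differently and names the customer attribute differently.
GENDER_CODES = {
    "savings_app": {"m": "M", "f": "F"},
    "loans_app": {"1": "M", "0": "F"},
}
LENGTH_TO_CM = {"inches": 2.54, "cm": 1.0}

def standardize(app, record, length_unit):
    """Map one source record onto a single set of warehouse conventions."""
    return {
        # Naming convention: one attribute name regardless of source.
        "customer_name": record.get("cust_name") or record.get("name"),
        # Codes: translate each source's gender code to one standard code.
        "gender": GENDER_CODES[app][str(record["gender"]).lower()],
        # Measurements: convert every length to centimetres.
        "height_cm": round(record["height"] * LENGTH_TO_CM[length_unit], 2),
    }

rows = [
    standardize("savings_app", {"cust_name": "Ana", "gender": "F", "height": 65}, "inches"),
    standardize("loans_app", {"name": "Ben", "gender": 1, "height": 180}, "cm"),
]
print(rows)
```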
Time-Variant Data
• A data warehouse contains historical data, not just current values.
• The time-variant nature of the data in a data warehouse:
– Allows for analysis of the past
– Relates information to the present
– Enables forecasts for the future
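One common way to make data time-variant is to store dated snapshots instead of overwriting values. The sketch below assumes a hypothetical monthly account-balance snapshot; the helper `balance_as_of` answers the same question for any point in time, which is what makes analysis of the past possible alongside the present.

```python
from datetime import date

# Hypothetical warehouse rows: each load adds a dated snapshot rather than
# overwriting the previous value, so the date is part of every row.
balance_history = [
    {"account": "A1", "as_of": date(2023, 1, 31), "balance": 500.0},
    {"account": "A1", "as_of": date(2023, 2, 28), "balance": 650.0},
    {"account": "A1", "as_of": date(2023, 3, 31), "balance": 400.0},
]

def balance_as_of(history, account, when):
    """Return the latest snapshot value on or before `when` (None if none)."""
    candidates = [r for r in history if r["account"] == account and r["as_of"] <= when]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["as_of"])["balance"]

print(balance_as_of(balance_history, "A1", date(2023, 3, 15)))  # a past view
print(balance_as_of(balance_history, "A1", date.today()))       # the latest view
```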
Nonvolatile Data
In a data warehouse, data is not updated or deleted; new data is added through periodic loads, and existing data is read for analysis.
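The nonvolatile property can be sketched as a store that only supports loading and reading. This is a toy class under that assumption (the name `NonvolatileStore` is hypothetical); real warehouse DBMSs enforce the property through their load processes and access rights rather than through a wrapper like this.

```python
class NonvolatileStore:
    """Toy warehouse table: rows can be loaded and read, never changed or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Periodic loads only ever append copies of the incoming rows.
        self._rows.extend(dict(r) for r in rows)

    def read(self, predicate=lambda row: True):
        # Return copies so callers cannot modify stored rows in place.
        return [dict(r) for r in self._rows if predicate(r)]

    def update(self, *args, **kwargs):
        raise PermissionError("warehouse data is not updated")

    def delete(self, *args, **kwargs):
        raise PermissionError("warehouse data is not deleted")

store = NonvolatileStore()
store.load([{"sku": "P1", "qty": 3}])
store.load([{"sku": "P1", "qty": 5}])  # a later load adds a row; it does not overwrite
print(store.read(lambda r: r["sku"] == "P1"))
```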
Data Granularity
Data granularity refers to the level of detail. Depending on the requirements, multiple levels of detail may be present. Many data warehouses have at least dual levels of granularity.
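A minimal sketch of dual granularity, assuming hypothetical daily sales rows as the lowest level of detail. The warehouse would keep those detailed rows and also a monthly summary derived from them, so each query can use whichever level it needs.

```python
from collections import defaultdict
from datetime import date

# Hypothetical lowest-grain rows: one row per product per day.
daily_sales = [
    {"day": date(2024, 1, 5), "product": "P1", "units": 4},
    {"day": date(2024, 1, 20), "product": "P1", "units": 6},
    {"day": date(2024, 2, 2), "product": "P1", "units": 3},
]

def summarize_monthly(rows):
    """Build a coarser grain (year, month, product) from the detailed rows."""
    totals = defaultdict(int)
    for r in rows:
        totals[(r["day"].year, r["day"].month, r["product"])] += r["units"]
    return dict(totals)

monthly_sales = summarize_monthly(daily_sales)
print(monthly_sales)
```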
BUILDING A DATA WAREHOUSE
• Questions to be asked:
– Top-down or bottom-up approach?
– Enterprise-wide or departmental?
– Which first: data warehouse or data mart?
– Build a pilot or go with a full-fledged implementation?
– Dependent or independent data marts?
DATA MARTS
A data mart is a subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, such as a marketing data mart.
Data Mart: A scaled-down version of the data warehouse
A data mart is a small warehouse designed for the department level.
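A sketch of carving a data mart out of warehouse data, assuming hypothetical column names and rows. This models a dependent data mart (fed from the enterprise warehouse); an independent data mart would instead be loaded directly from the source systems.

```python
# Hypothetical enterprise warehouse rows covering several departments.
warehouse_rows = [
    {"dept": "marketing", "campaign": "Spring", "region": "North", "spend": 1200.0, "salary_cost": 0.0},
    {"dept": "finance", "campaign": None, "region": "North", "spend": 0.0, "salary_cost": 5000.0},
    {"dept": "marketing", "campaign": "Summer", "region": "South", "spend": 800.0, "salary_cost": 0.0},
]

def build_data_mart(rows, dept, columns):
    """Departmental subset of the warehouse: only this group's rows and columns."""
    return [{c: r[c] for c in columns} for r in rows if r["dept"] == dept]

marketing_mart = build_data_mart(warehouse_rows, "marketing", ["campaign", "region", "spend"])
print(marketing_mart)
```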
DATA WAREHOUSE VS. DATA MARTS
Approaches
Figure Top Down versus Bottom Up Approaches
Top Down Approach
Advantages:
• A truly corporate effort, an enterprise view of data
• Inherently architected, not a union of disparate data marts
• Single, central storage of data about the content
• Centralized rules and control
• May see quick results if implemented with iterations
Disadvantages:
• Takes longer to build even with an iterative method
• High exposure/risk to failure
• Needs high level of cross-functional skills
• High outlay without proof of concept
Bottom Up Approach
Advantages:
• Faster and easier implementation of manageable pieces
• Favorable return on investment and proof of concept
• Less risk of failure
• Inherently incremental; can schedule important data marts first
• Allows project team to learn and grow
Disadvantages:
• Each data mart has its own narrow view of data
• Redundant data in every data mart
• Inconsistent and irreconcilable data
• Produces unmanageable interfaces
A Practical Approach
• Most people employ a hybrid approach with elements of top-down and bottom-up.
• The steps in the practical approach are as follows:
1. Plan and define requirements at the overall corporate level
2. Create a surrounding architecture for a complete warehouse
3. Conform and standardize the data content
4. Implement the data warehouse as a series of supermarts, one at a time
Data Warehouse Architectural Types
• Centralized
• Independent Data Marts
• Federated
• Hub-and-Spoke
• Data-Mart Bus
Data Warehouse Components
Figure 2-7 Data Warehouse: Building blocks or components
Data Warehouse Components
• Source Data Component
– Production Data
– Internal Data
– Archived Data
– External Data
• Data Staging Component
– Data Extraction
– Data Transformation
– Data Loading
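The three staging functions can be sketched as a tiny pipeline. Everything here is an assumption for illustration: the in-memory CSV stands in for production data, the list of dicts for archived data, and a Python list for warehouse storage; the function names simply mirror the three staging steps.

```python
import csv
import io

# Hypothetical extracts from two source systems with different layouts.
PRODUCTION_CSV = "cust,amt\nC1,10.50\nC2,4.25\n"
ARCHIVED_ROWS = [{"customer": "c1", "amount_cents": 300}]

def extract():
    """Data extraction: read raw records from each source."""
    production = list(csv.DictReader(io.StringIO(PRODUCTION_CSV)))
    return production, ARCHIVED_ROWS

def transform(production, archived):
    """Data transformation: one naming convention, one key format, one unit."""
    clean = [{"customer_id": r["cust"].upper(), "amount": float(r["amt"])} for r in production]
    clean += [{"customer_id": r["customer"].upper(), "amount": r["amount_cents"] / 100} for r in archived]
    return clean

def load(warehouse, rows):
    """Data loading: append the transformed rows to warehouse storage."""
    warehouse.extend(rows)
    return len(rows)

warehouse_table = []
loaded = load(warehouse_table, transform(*extract()))
print(loaded, warehouse_table)
```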
Data Loading
Figure 2-8 Data Movements to the Data Warehouse
Data Storage Components
• Many data warehouses also employ multidimensional database management systems.
• Data extracted from the data warehouse storage is aggregated in many ways, and the summary data is kept in multidimensional databases (MDDBs).
• Such multidimensional database systems are usually proprietary products.
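The idea of precomputing summaries "in many ways" can be sketched with a dictionary acting as a very small cube. This is only a toy under stated assumptions (three hypothetical dimensions, product, region, and month, plus a `*` marker meaning "all values"); real MDDBs are proprietary products with their own storage formats.

```python
from collections import defaultdict
from itertools import product

# Hypothetical detail rows extracted from relational warehouse storage:
# (product, region, month, units).
facts = [
    ("P1", "North", "2024-01", 10),
    ("P1", "South", "2024-01", 4),
    ("P2", "North", "2024-02", 7),
]
ALL = "*"  # marker meaning "aggregated over this dimension"

def build_cube(rows):
    """Precompute sums for every combination of (product, region, month),
    including rolled-up cells where a dimension is replaced by ALL."""
    cube = defaultdict(int)
    for prod, region, month, units in rows:
        for key in product((prod, ALL), (region, ALL), (month, ALL)):
            cube[key] += units
    return dict(cube)

cube = build_cube(facts)
print(cube[("P1", ALL, ALL)])     # all P1 sales
print(cube[(ALL, "North", ALL)])  # all North sales
print(cube[(ALL, ALL, ALL)])      # grand total
```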
Information Delivery Components
Figure 2-9 Information Delivery Component
Metadata Components
• Metadata in a data warehouse is similar to a data dictionary, but much more than a data dictionary.
• Types of Metadata:
– Operational Metadata
– Extraction and Transformation Metadata
– End-User Metadata
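A sketch of how one catalog entry might carry all three kinds of metadata, assuming a hypothetical `ColumnMetadata` record and invented example values. The lookup at the end shows end-user metadata letting a business user find data in their own terms.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMetadata:
    """One warehouse column described by all three kinds of metadata."""
    warehouse_column: str
    # Operational metadata: where the data came from.
    source_system: str
    source_field: str
    # Extraction and transformation metadata: how it got here.
    extraction_schedule: str
    transformation_rule: str
    # End-user metadata: what it means in business terms.
    business_name: str
    business_description: str
    synonyms: list = field(default_factory=list)

catalog = [
    ColumnMetadata(
        warehouse_column="cust_gender_cd",
        source_system="loans_app",
        source_field="GNDR",
        extraction_schedule="nightly",
        transformation_rule="map 1 -> 'M', 0 -> 'F'",
        business_name="Customer Gender",
        business_description="Gender of the customer",
        synonyms=["sex"],
    )
]

def find_by_business_term(term):
    """Locate warehouse columns using the end user's own vocabulary."""
    term = term.lower()
    return [m.warehouse_column for m in catalog
            if term in m.business_name.lower() or term in (s.lower() for s in m.synonyms)]

print(find_by_business_term("gender"))
```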
Why Metadata: Special Significance
• First, it acts as the glue that connects all parts of the data warehouse.
• Next, it provides information about the contents and structures to the developers.
• Finally, it opens the door to the end users and makes the contents recognizable in their own terms.
TRENDS IN DATA WAREHOUSE
• … warehousing
• Learn how data warehousing has become mainstream
• Discuss several major trends, one by one
• Grasp the need for standards and review the progress
• Understand the Web-enabled data warehouse
Continued Growth in Data Warehousing
Data Warehousing Is Becoming Mainstream:
Four significant factors drove many companies to move into data warehousing:
– Fierce competition
– Government deregulation
– Need to revamp internal processes
– Imperative for customized marketing
Data Warehouse Expansion
• Earlier data warehouses concentrated on keeping summary data for high-level analysis, but NOW data warehouses are being built by many different businesses.
• Now companies have the ability to capture, cleanse, maintain, and use the vast amounts of data generated by their business transactions.
Vendor Solutions and Products
What must you do to take advantage of the trends in your data warehouse?
• Real-Time Data Warehousing
• Multiple Data Types
– Adding Unstructured Data
– Searching Unstructured Data
– Spatial Data
• Data Visualization
• Parallel Processing
• Data Warehouse Appliances
• Query Tools
• Browser Tools
• Data Fusion
• Data Integration
• Analytics
• Agent Technology
• Syndicated Data
• Active Data Warehousing
Real-Time Data Warehousing
• Real-time data warehousing is dynamic, providing the most up-to-date view of the business in real time. A real-time data warehouse is refreshed continuously, with almost zero latency.
• Real-time information delivery increases productivity tremendously by sharing information with more people.
• However, extraction, transformation, and integration of data for a real-time data warehouse pose several challenges.
Multiple Data Types
• Adding Unstructured Data
– Some vendors are addressing the inclusion of unstructured data, especially text and images, by treating such multimedia data as just another data type. These are defined as part of the relational data and stored as binary large objects (BLOBs) up to 2 GB in size. User-defined functions (UDFs) are used to define these as user-defined types (UDTs).
– Not all BLOBs can be stored simply as another relational data type. For example, a video clip would require a server supporting delivery of multiple streams of video at a given rate and synchronization with the audio portion. For this purpose, specialized servers are being provided.
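Treating an image as "just another data type" can be sketched with SQLite's BLOB column type through Python's standard `sqlite3` module. SQLite is only a stand-in here (an assumption, not one of the vendor products the slide refers to), and the table name and image bytes are fabricated placeholders.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_image (product_id TEXT PRIMARY KEY, caption TEXT, image BLOB)"
)

# Hypothetical image bytes; in practice these would be read from a file.
fake_png = b"\x89PNG\r\n\x1a\n" + bytes(range(16))
conn.execute(
    "INSERT INTO product_image VALUES (?, ?, ?)",
    ("P1", "front view", sqlite3.Binary(fake_png)),
)

# The binary column is queried alongside ordinary relational columns.
caption, image = conn.execute(
    "SELECT caption, image FROM product_image WHERE product_id = ?", ("P1",)
).fetchone()
print(caption, len(image))
conn.close()
```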
Multiple Data Types
• Searching Unstructured Data
– Without the ability to search unstructured data, integration of such data is of little value.
– Vendors are now providing new search engines to find the information the user needs from unstructured data. Query by image content is an example of a search mechanism for images. The product allows users to pre-index images based on shapes, colors, and textures. When more than one image fits the search argument, the selected images are displayed one after the other.
Multiple Data Types
• Spatial Data
– Adding spatial data will greatly enhance the value of the data warehouse. Address, street block, city quadrant, county, state, and zone are examples of spatial data.
– Some database vendors are providing spatial extenders to their products, using SQL extensions to bring spatial and business data together.
Data Visualization
• Helps the user to interpret query results quickly and easily
• Major Visualization Trends
– More chart types
– Interactive visualization
– Visualization of complex and large result sets
• Visualization Types: current visualization software supports a large array of chart types (pie charts, bar charts, scatter plots, and constellation graphs)
• Advanced Visualization Techniques
– Chart manipulation
– Drill down
– Advanced interaction
• Dashboards and Scorecards
– Dashboards: monitor and measure processes. A dashboard provides real-time information and warns users with alerts or exception conditions.
– Scorecards: track progress compared to objectives. A scorecard displays periodic snapshots of performance viewed against an organization’s strategic objectives and targets.
Parallel Processing
• Parallel Processing Hardware Options
– Multiple CPUs, memory modules, one or more server nodes, and high-speed communication links between interconnected nodes
• Parallel Processing Software Implementation
– Parallel processing software must be capable of performing the following steps:
• Analyzing a large task to identify independent units that can be executed in parallel
• Identifying which of the smaller units must be executed one after the other
• Executing the independent units in parallel and the dependent units in the proper sequence
• Collecting, collating, and consolidating the results returned by the smaller units
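The four software steps can be sketched with Python's `concurrent.futures`, assuming hypothetical sales rows already split into partitions as if spread across nodes. The per-partition sums are the independent units; the final merge is the dependent unit that must wait for all of them.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales rows split into four partitions, as if on four nodes.
partitions = [
    [("North", 10), ("South", 5)],
    [("North", 7)],
    [("South", 3), ("East", 8)],
    [("East", 2), ("North", 1)],
]

def partial_totals(rows):
    """Independent unit: aggregate one partition without touching the others."""
    totals = {}
    for region, units in rows:
        totals[region] = totals.get(region, 0) + units
    return totals

def parallel_region_totals(parts):
    # The per-partition sums do not depend on each other, so run them in parallel.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(partial_totals, parts))
    # Dependent unit: consolidation needs every partial result, so it runs afterwards.
    merged = {}
    for part in partials:
        for region, units in part.items():
            merged[region] = merged.get(region, 0) + units
    return merged

print(parallel_region_totals(partitions))
```

Threads keep the example self-contained; for CPU-heavy work in CPython, `ProcessPoolExecutor` would usually be the better fit, since it sidesteps the global interpreter lock.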
Parallel Processing
Advantages of adopting parallel processing in a data warehouse:
• Performance improvement for query processing, data loading, and index creation
• Scalability, allowing the addition of CPUs and memory modules without any changes to the existing application
• Fault tolerance, so that the database remains available even when some of the parallel processors fail
• Single logical view of the database even though the data may reside on the disks of multiple nodes
Data Warehouse Appliance
• A data warehouse appliance is designed specifically to handle the business intelligence workload.
• It integrates hardware, software, storage, and the DBMS into one unified device.
• For administrators, a data warehouse appliance provides simplicity because of its integrated nature.
Query Tools
• Flexible presentation
• Aggregate awareness
• Crossing subject areas
• Integration
Browser Tools
Some recent trends in enhancements to browser tools:
• Tools are extensible to allow definition of any type of data or informational object
• Open APIs (application program interfaces) are included
• Several types of browsing functions, including navigation through hierarchical groupings
• Users are able to browse the catalog (data dictionary or metadata), find an informational object of interest, and proceed further to launch the appropriate query tool with the relevant parameters
• Web browsing and search techniques are applied to browse through the information catalogs
Data Fusion
• Data fusion is a technology dealing with the merging of data from disparate sources.
• It has a wider scope and includes real-time merging of data from instruments and monitoring systems.
Data Integration
Four different levels of integration in an information system:
• Data Integration
• Application Integration
• Business Process Integration
• User Interaction Integration
Analytics
Two areas of analysis:
• Multidimensional Analysis
– Be able to analyze business measurements in many different ways
– OLAP
• Predictive Analytics
– Assists organizations by improving their understanding of customer behavior, by optimizing their business processes, by enabling them to anticipate problems before they arise, and by helping them to recognize opportunities well ahead of time
Agent Technology
• A software agent is a program that is capable of performing a predefined, programmable task on behalf of the user.
• In a data warehouse, software agents are used to alert users to predefined business conditions.
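A minimal sketch of an alerting agent, assuming hypothetical rule names, thresholds, and recipients. Each rule is a predefined business condition; the agent checks them on the user's behalf and returns alert messages, whereas a real product would schedule the checks (for example after each load) and deliver the notifications.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AlertRule:
    """A predefined business condition that a user has asked to be told about."""
    name: str
    condition: Callable[[dict], bool]
    recipient: str

def run_agent(rules: List[AlertRule], latest_metrics: dict) -> List[str]:
    """One pass of a hypothetical monitoring agent: check every rule on the
    user's behalf and return the alert messages that should be sent."""
    alerts = []
    for rule in rules:
        if rule.condition(latest_metrics):
            alerts.append(f"To {rule.recipient}: '{rule.name}' condition met")
    return alerts

rules = [
    AlertRule("Inventory below reorder point", lambda m: m["stock_units"] < 100, "purchasing"),
    AlertRule("Daily sales above target", lambda m: m["daily_sales"] > 50_000, "sales"),
]
print(run_agent(rules, {"stock_units": 80, "daily_sales": 42_000}))
```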
Data Warehouse and ERP
Data Warehouse and Knowledge Management (KM)
Knowledge Management (KM): a systematic process for capturing, integrating, organizing, and communicating knowledge.
Other Significant Trends
• Data Warehouse and CRM
• Agile Development
• Active Data Warehousing
Emergence of Standards
In each of the multitude of technologies supporting the data warehouse, numerous vendors and products exist. The implication is that when building a data warehouse, many choices are available to create an effective solution with best-of-breed products.
• Metadata: Two separate bodies are working on the standards for metadata
– Meta Data Coalition: Microsoft and others
– The Object Management Group: Oracle, IBM, Hewlett-Packard, Sun, and Unisys
• OLAP: The OLAP Council was established in January 1995.
Web-Enabled Data Warehouse
• The Warehouse to the Web
• The Web to the Warehouse (Webhouse)
– The Webhouse can produce the following useful information:
• Site statistics
• Visitor conversions
• Ad metrics
• Referring partner links
• Site navigation resulting in orders
• Site navigation not resulting in orders
• Pages that are session killers
• Relationships between customer profiles and page activities
• Best customer and worst customer analysis
• The Web-Enabled Configuration
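Two items from the list, visitor conversions and session killers, can be sketched from clickstream data. The session data and the `order_confirm` marker page are hypothetical; here a "session killer" is taken to mean the page on which a session that did not result in an order ended, which is one reasonable reading of the term.

```python
from collections import Counter

# Hypothetical clickstream loaded into the Webhouse: pages visited per session.
sessions = {
    "s1": ["home", "product", "cart", "order_confirm"],
    "s2": ["home", "product", "shipping_info"],
    "s3": ["home", "search"],
    "s4": ["home", "product", "shipping_info"],
}
ORDER_PAGE = "order_confirm"  # assumed marker page for a completed order

def visitor_conversion_rate(sessions):
    """Share of sessions whose navigation resulted in an order."""
    converted = sum(1 for pages in sessions.values() if ORDER_PAGE in pages)
    return converted / len(sessions)

def session_killers(sessions, top=3):
    """Pages most often seen last in sessions that did NOT end in an order."""
    exits = Counter(pages[-1] for pages in sessions.values() if ORDER_PAGE not in pages)
    return exits.most_common(top)

print(visitor_conversion_rate(sessions))
print(session_killers(sessions))
```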
DW 2.0
The advantages of the DW 2.0 architecture include:
• Not costing huge amounts of money
• The ability to link structured data and unstructured data