Dwintro
August 1997
This white paper is adapted from the forthcoming book Data Warehousing with MS SQL Server.
The skyrocketing power of hardware and software, along with the availability of affordable and easy-to-use reporting and
analysis tools, has played the most important role in the evolution of data warehouses. Figure 1 highlights the
technological revolution that has greatly impacted data warehousing.
[Figure 1: memory, disk, CPU, desktop, and server power and ease of use; hardware prices; software prices.]
1.3.2 Global corporation
The fall of communism and the liberalization of Asian and South American economies have changed the business climate
worldwide forever. Competition from emerging economies has forced large corporations to become lean and efficient. The
emergence of this global economy has led to the migration of manufacturing to less expensive and less restrictive countries.
Former communist and South American countries present very exciting and challenging business opportunities. Along with
these opportunities, they present a very volatile business climate and economies that are nearly impossible to predict.
Businesses have not only focused on building products worldwide, but they have also changed their organization to sell
products around the globe. Trade agreements such as NAFTA and the EEC greatly impact decisions to enter markets or build
factories. This globalization of business has increased the need not only for more continuous analysis but also for managing
data in a centralized location. The process of rolling up manufacturing and sales data from far-flung business units now
affects a much larger number of corporations. Businesses now need to make “build or buy” decisions continuously.
Globalization has also made the consolidation of data in a central data warehouse more complicated. Factors such as
currency fluctuations and product customization for different markets add complexity to data warehousing and to the
analysis it supports. Imagine trying to assess the profitability of products built and sold in multiple countries with
volatile currencies, or attempting to hedge the risk of a downturn in economies that have been expanding rapidly for
extended periods.
•Emergence of global economy
•Economic downturns in United States, Europe, Japan
•Liberalization of Asian, South American, former Communist economies
•Compelling standard business applications
•Technology savvy business analyst and technology aware management
Section 2: Data warehousing attributes and concepts
In short, the separation of operational data from analysis data is the most fundamental data warehousing concept. Not only
is the data stored in a structured manner outside the operational system, but businesses today are also allocating
considerable resources to build data warehouses at the same time that the operational applications are deployed. Data is no
longer archived to tape as an afterthought of implementing an operational system; instead, data warehousing systems have
become the primary interface for operational systems. Figure 3 highlights the reasons for separation discussed in this section.
The data warehouse model needs to be extensible and structured so that data from different applications can be added as
a business case is made for that data. In most cases, a data warehouse project cannot include data from all possible
applications right from the start. Many successful data warehousing projects have taken an incremental approach to
adding data from the operational systems and aligning it with the existing data. They start with the objective of eventually
adding most, if not all, business data to the data warehouse. Keeping this long-term objective in mind, they may begin with
one or two operational applications that provide the most fertile data for business analysis. Figure 4 illustrates the
extensible architecture of the data warehouse.
•Purchased application: The application data structure may be dictated by an application that was purchased from a software
vendor and integrated into the business. The user of the application may have little or no control over the data model.
Some vendor applications have a very generic data model that is designed to accommodate a large number of businesses of
different types.
•Legacy application: The source application may be a very old, mostly homegrown application whose data model has
evolved over the years. The database engine in this application may have been changed more than once without anyone
taking the time to fully exploit the features of the new engine. Many legacy applications in existence today have a
data model that is neither well documented nor understood by anyone currently supporting the application.
[Figure: source applications feeding the data warehouse. Order processing: customer orders, product price, available
inventory. Product price/inventory: product price, product inventory, product price changes. Marketing: customer profile,
product price, marketing programs. Data warehouse: customers, products, orders, product inventory, product price.]
Figure 5 illustrates the alignment of data warehouse entities with the business structure. The data warehouse model breaks
away from the limitations of the source application data models and builds a flexible model that parallels the business
structure. This extensible data model is easy for business analysts and managers to understand.
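One way to picture this alignment is as a mapping from each source application's data to the warehouse entity it feeds.
The Python sketch below is illustrative only: the application and entity names follow the figure labels, while the
individual assignments (for example, that the marketing customer profile feeds the customers entity) are assumptions.

```python
from collections import defaultdict

# Hypothetical mapping from (source application, source data set) to the
# warehouse entity it feeds. Names follow the figure labels; the individual
# assignments are assumptions made for illustration.
SOURCE_TO_ENTITY = {
    ("order processing", "customer orders"): "orders",
    ("order processing", "product price"): "product price",
    ("order processing", "available inventory"): "product inventory",
    ("product price/inventory", "product price"): "product price",
    ("product price/inventory", "product inventory"): "product inventory",
    ("product price/inventory", "product price changes"): "product price",
    ("marketing", "customer profile"): "customers",
    ("marketing", "product price"): "product price",
}


def sources_by_entity(mapping):
    """Group source data sets under the warehouse entity they feed."""
    grouped = defaultdict(list)
    for (application, data_set), entity in mapping.items():
        grouped[entity].append(f"{application}: {data_set}")
    return dict(grouped)


entity_sources = sources_by_entity(SOURCE_TO_ENTITY)
for entity, sources in sorted(entity_sources.items()):
    print(f"{entity} <- {sources}")
```

Grouping by entity shows that several applications hold their own copy of product price, which the warehouse model
consolidates into a single business entity.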
The logical transformations of source application data described here require considerable effort, and they are a very
important early investment toward the development of a successful data warehouse. Figure 7 highlights the logical
transformation concepts discussed in this section.
[Figure: transformation of data from Operational System A and Operational System B into the Data Warehouse System
(summarized data and detailed data). Examples shown: cust, cust_id, borrower >> customer ID; “1” >> “M”; “2” >> “F”;
Missing >>> “……..”.]
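The logical transformations illustrated above (several source field names mapped to one customer ID, coded values decoded,
and missing values filled) can be sketched in Python as follows. This is a minimal sketch, not the paper's implementation:
the field name gender, the customer_id spelling, and the UNKNOWN placeholder are assumptions made for illustration.

```python
# Source field names that all mean "customer ID" (taken from the figure).
CUSTOMER_ID_ALIASES = ("cust", "cust_id", "borrower")

# Coded values and their warehouse equivalents (taken from the figure).
GENDER_CODES = {"1": "M", "2": "F"}

# Hypothetical placeholder for missing values; the source does not name one.
MISSING_PLACEHOLDER = "UNKNOWN"


def transform_record(record):
    """Return a copy of a source record using warehouse names and values."""
    out = {}
    for field, value in record.items():
        # Map every alias to the single warehouse field name.
        name = "customer_id" if field in CUSTOMER_ID_ALIASES else field
        out[name] = value
    # Decode coded values; values that are not codes pass through unchanged.
    # "gender" is an assumed field name for the coded attribute.
    gender = out.get("gender")
    out["gender"] = GENDER_CODES.get(gender, gender)
    # Replace missing (None or empty) values with an explicit placeholder.
    return {
        name: MISSING_PLACEHOLDER if value in (None, "") else value
        for name, value in out.items()
    }


system_a = {"cust": "C-100", "gender": "1"}
system_b = {"borrower": "C-200", "gender": None}
row_a = transform_record(system_a)
row_b = transform_record(system_b)
print(row_a)
print(row_b)
```

Both source records come out with the same field names and value conventions, so they can be analyzed together in the
warehouse.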
Figure 8 highlights the physical transformation concepts for data warehousing systems. Physical transformation of source
application data requires considerable effort and can be difficult at times, but a well-considered set of physical data
transformations can make a data warehouse more user-friendly. Further, accurate and complete transformations help maintain
the integrity of the data warehouse.
[Figure: detailed data; perform business analysis on detail data.]
Summarization and predefined analysis of data in a data warehouse system is an important task. It is essential to maintain
the integrity of the summary views because a very large part of the data warehouse activity runs against them. Figure 9
highlights the key concepts around summary views. The summary views must not only be designed and built; they must also
be maintained as new data comes into the data warehouse.
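The maintenance requirement can be illustrated with a small Python sketch. It assumes a hypothetical summary view that
totals an amount per product and month; the row layout, field names, and sample values are invented for illustration. New
detail rows are folded into the existing summary, and a full recomputation from the detail data checks its integrity.

```python
from collections import defaultdict


def build_summary(detail_rows):
    """Recompute the summary view (total amount per product and month)."""
    summary = defaultdict(float)
    for row in detail_rows:
        summary[(row["product"], row["month"])] += row["amount"]
    return dict(summary)


def apply_new_rows(summary, new_rows):
    """Incrementally fold newly loaded detail rows into an existing summary."""
    updated = dict(summary)
    for row in new_rows:
        key = (row["product"], row["month"])
        updated[key] = updated.get(key, 0.0) + row["amount"]
    return updated


# Hypothetical detail data already in the warehouse.
detail = [
    {"product": "widget", "month": "1997-07", "amount": 100.0},
    {"product": "widget", "month": "1997-07", "amount": 50.0},
]
summary = build_summary(detail)

# Hypothetical new load: update the summary and the detail data together.
new_rows = [
    {"product": "widget", "month": "1997-08", "amount": 25.0},
    {"product": "gadget", "month": "1997-08", "amount": 75.0},
]
summary = apply_new_rows(summary, new_rows)
detail.extend(new_rows)

# Integrity check: the maintained summary must match a full recomputation.
summary_is_consistent = summary == build_summary(detail)
print(summary_is_consistent)
```

The incremental update avoids rescanning all detail data on every load, while the periodic recomputation guards against the
summary drifting away from the detail it is supposed to represent.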
2.5 Definition
After considering the various attributes and concepts of data warehousing systems, a data warehouse can be broadly
defined as follows:
A data warehouse is a structured, extensible environment designed for the analysis of non-volatile data, logically and
physically transformed from multiple source applications to align with business structure, updated and maintained
for a long time period, expressed in simple business terms, and summarized for quick analysis.
[Figure: analysis processes against the Data Warehouse System. Summarized data: predefined reports and queries, queries
against summary data. Detailed data: data mining in detail data. Other applications.]
Figure 10 illustrates the analysis processes that run against a data warehouse. Although the majority of the activity against
today’s data warehouses is simple reporting and analysis, the sophistication of analysis at the high end continues to increase
rapidly. Of course, any analysis run against a data warehouse is simpler and cheaper than the same analysis run through the
old methods. This simplicity continues to be a main attraction of data warehousing systems.
Summary
This paper introduced the fundamental concepts of data warehousing. It is important to note that data warehousing is a
science that continues to evolve. Many of the design and development concepts introduced here greatly influence the quality
of the analysis that is possible with the data in the data warehouse. If invalid or corrupt data is allowed into the data
warehouse, the analysis done with this data is likely to be invalid.
After the rapid acceptance of data warehousing systems during the past three years, there will continue to be many more
enhancements and adjustments to the data warehousing system model. Further evolution of hardware and software
technology will also continue to greatly influence the capabilities that are built into data warehouses.
Data warehousing systems have become a key component of information technology architecture. A flexible enterprise data
warehouse strategy can yield significant benefits for a long period.