Chaudhary Devi Lal University, Sirsa: Submitted To: Submitted By: Poonam Bhatia M.Tech (PT) 2 Sem. Roll No. 14


Submitted To: Submitted By:

Poonam Bhatia
M.Tech(PT) 2nd Sem.
Roll No. 14

Chaudhary Devi Lal University,Sirsa


Contents
1. Introduction to Data Warehouse and Business Intelligence
2. Recent History
3.
Introduction to Business Intelligence and Data Warehouse

Business Intelligence refers to a set of methods and techniques that are used by organizations for
tactical and strategic decision making. It leverages technologies that focus on counts, statistics
and business objectives to improve business performance.

A Data Warehouse (DW) is simply a consolidation of data from a variety of sources that is
designed to support strategic and tactical decision making. Its main purpose is to provide a
coherent picture of the business at a point in time. Using various Data Warehousing toolsets,
users are able to run online queries and "mine" their data.

Many successful companies have been investing large sums of money in business intelligence
and data warehousing tools and technologies. They believe that up-to-date, accurate and
integrated information about their supply chain, products and customers is critical for their very
survival.

This website introduces some key Data Warehousing concepts and terminology. It explains Data
Warehousing from a historical context and discusses the underlying business and technology
drivers that are making Data Warehouses a hot commodity. The site also provides guidance
regarding how you can proceed with this emerging technology.
Recent History

The need for improved business intelligence and data warehousing accelerated in the 1990s.
During this period, huge technological changes occurred and competition increased as a result of
free trade agreements, globalization, computerization and networking.

In the early 1990s, the Internet took the world by storm. Companies rushed to develop eBusiness
and eCommerce applications with hopes of reducing their staffing needs and providing 24-hour
service to customers. The volume of application systems mushroomed during this period as a
parallel set of Internet applications was deployed. Back-end 'bridges' were built to try to
integrate the 'self service' application systems with the legacy 'full service' applications.
Unfortunately, integration was often messy and corporate data remained fragmented or
inconsistent.

As the demand for programmers increased and salaries climbed, businesses looked for
alternatives to custom built application systems. In hopes of reducing costs and remaining
competitive, companies purchased software packages from third parties. These packages were
designed for generic business requirements and often did not integrate well with the existing
legacy systems.

By the end of the millennium, businesses discovered that the number of application systems and
databases had multiplied, that their systems were poorly integrated and that their data was
inconsistent across the systems. More importantly, businesses discovered that they had lots of
fragmented data, but not the integrated information that was required for critical decision making
in a rapidly changing, competitive, global economy.

Companies began building Data Warehouses to consolidate data from disparate databases and to
better support their strategic and tactical decision making needs.
Use of Data Warehouse and Business Intelligence

Business Drivers:-

There are many business drivers in play today that are motivating companies to establish data
warehouses. Current, consistent and accurate business information, they believe, is critical for
strategic and tactical decision making. Some of the business drivers are summarized below.

Single Version of the Truth

Fragmented, inconsistent and outdated data in multiple databases does not permit good strategic
and tactical decision making. Companies require that business intelligence be consolidated and
presented in a suitable format for decision making. Inconsistent information from disparate
information systems is no longer acceptable. Data Warehouses help companies to achieve a
single version of the truth by consolidating the most accurate and current data from the most
reliable systems.

Current and Accurate Information

In a highly competitive market place, businesses need to quickly identify problems and
opportunities and respond to events expeditiously and appropriately. Up-to-date information on
sales, profits, inventories and customers can help identify problems early and leverage
opportunities that could otherwise be missed. Most application systems are too narrowly scoped
and operate on cycles that don't support real-time or near real-time information access. A data
warehouse, however, can be designed to deliver up-to-date accurate information to decision
makers.

Rapidly Changing Information Needs

It is very difficult for businesses to anticipate future information needs. Application systems
often seem rigid and unable to adapt to evolving management information needs. Businesses
need the flexibility to slice and dice data in many ways in order to identify and analyze changes
in the market place or in the business itself. Data Warehouses are designed for online, analytical
purposes and provide great flexibility.

Customer Service Excellence

It is often said that 10% of a business's customers account for 90% of the business's profits.
Identifying the good customers and providing them with excellent service helps retain good
customers. Data Warehousing can help identify a company's best customers using any number
of criteria.
New Service Delivery Channels

It is no longer sufficient to provide customers with just 9:00 AM to 5:00 PM in-store service.
Customers want to do business 7 days a week, 24 hours per day using alternate service delivery
channels such as via the Internet or telephone. By examining all customer transactions,
regardless of the channel used, businesses can better understand their customers and serve them
better. Data Warehousing is critical for profiling customers and their transactions, regardless of
the channel used.

Technical Drivers:-

There are many technical drivers in play that are motivating companies to establish data
warehouses for online queries and analytics. These are summarized below.

Multiple Internal Databases

Most medium and large businesses operate dozens, if not hundreds, of un-integrated application
systems. Individual departments in companies often focus on their own narrow system and
information needs and don’t see the corporate value of integrating data. When silos of
un-integrated data exist, data soon gets out of sync. Companies have a need for a database that
reflects a "single version of truth". Data Warehouses can help do that.

Purchased Packages

“Out of the Box” purchased applications sometimes use underlying concepts and definitions that
differ from those used by the business in existing custom built applications. For example, a
“customer” in one system could encompass all current and past customers plus potential future
customers. In another system, a customer might be defined more narrowly as someone who has
purchased a product or service during the past 12 months. Such inconsistencies create problems
from an analytical perspective. A count of customers done in the first database will differ from a
count done in the second. Companies have a need to align concepts and terminology. Data
Warehouses help do this alignment.
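
The counting problem described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the names, the dates and the two counting rules are assumptions made for the example, not definitions taken from any real package.

```python
from datetime import date, timedelta

# Hypothetical records: (name, date of last purchase, or None for a prospect).
people = [
    ("Asha", date(2024, 5, 1)),
    ("Ben", date(2021, 3, 15)),
    ("Chitra", None),
    ("Dev", date(2024, 1, 20)),
]
as_of = date(2024, 6, 30)


def count_broad(records):
    """System A (assumed rule): current, past and potential customers all count."""
    return len(records)


def count_narrow(records, as_of, days=365):
    """System B (assumed rule): only people who purchased in the last 12 months."""
    cutoff = as_of - timedelta(days=days)
    return sum(1 for _, last in records if last is not None and last >= cutoff)


print(count_broad(people))          # count under the broad definition
print(count_narrow(people, as_of))  # count under the narrow definition
```

The same four records give two different "customer" counts, which is why a Data Warehouse has to settle on one agreed definition before the data is loaded.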

Increasing Complexity of Systems

The underlying data structures of application systems are often very complicated. To create
what might intuitively appear to be a simple query often requires complex programming
logic that involves navigating multiple database tables and/or application systems. Writing
reports or queries can consequently take time and money. Companies have a need for a reporting
environment that allows reports and queries to be generated quickly, inexpensively and without
expensive IT skills. Data Warehouses can simplify the reporting environment.
Application System Evolution

Businesses are highly dynamic, and application systems constantly need to be enhanced
to support new business requirements. When systems are changed, reports and queries that
access any changed tables must also be updated. This maintenance work can be very costly.
Businesses have a need to trim their application support costs. Data Warehouses can help shelter
reports and queries from system changes that occur in "front end" operational systems.

Computer Networks and External Databases

The rapid growth of computer networks has allowed companies to exchange data with their
suppliers, consumers, government bodies and other groups. Businesses often have a need to
integrate data from internal and external databases. Data Warehouses can be designed to
integrate corporate data with external data for reporting purposes.
Data Warehousing Tools:-

This portion discusses front-end tools that are available to transform data in a Data Warehouse
into actionable business intelligence.

The use of appropriate Data Warehousing tools can help ensure that the right information gets to
the right person via the right channel at the right time.

(i) Automated Alerts:-

Custom built and purchased application systems can be implemented to examine data in a Data
Warehouse and initiate system generated alerts when predefined thresholds are reached, or when
expected results are not attained. Alerts can be sent via an email, phone message or an electronic
workflow item to the appropriate decision maker. The rules for triggering automated alerts can
easily be adjusted as business requirements change.

By leveraging data in the Data Warehouse to identify business issues quickly and to provide
immediate notification to the appropriate decision maker, serious business problems can be
avoided.
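
A minimal sketch of such a rule check is shown here, assuming the metrics have already been read from the Data Warehouse into a dictionary. The metric names, thresholds and the `check_alerts` function are hypothetical and exist only for this example.

```python
# Hypothetical alert rules: metric name -> (direction, threshold).
RULES = {
    "inventory_units": ("below", 100),
    "daily_returns": ("above", 50),
}


def check_alerts(metrics, rules=RULES):
    """Return one message for every metric that crosses its threshold."""
    alerts = []
    for name, (direction, threshold) in rules.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not present in this snapshot
        if direction == "below" and value < threshold:
            alerts.append(f"{name} is {value}, below threshold {threshold}")
        elif direction == "above" and value > threshold:
            alerts.append(f"{name} is {value}, above threshold {threshold}")
    return alerts


snapshot = {"inventory_units": 80, "daily_returns": 12}
for message in check_alerts(snapshot):
    print(message)  # a real system would send an email or workflow item instead
```

Because the rules live in a plain data structure rather than in the checking logic, they can be adjusted as business requirements change without rewriting the function.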

(ii) Data Mining Tools:-

Data Mining Tools are analytical engines that use data in a Data Warehouse to discover
underlying correlations. Data Mining Tools are used by analysts to gain business intelligence by
identifying and observing trends, problems and anomalies.

Because the business environment is so dynamic, it is often difficult for businesses to quickly
identify emerging patterns or trends. Data Mining Tools help businesses identify problems and
opportunities promptly and then make quick and appropriate decisions with the new business
intelligence.

For example, with the help of a Data Mining tool, one large US retailer discovered that people
who purchase diapers often purchase beer. Upon analyzing the data, it was discovered that
young husbands are frequently asked by their spouses to pick up diapers after work, and those
husbands were also picking up other household necessities at the same time.
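
One common way to quantify this kind of correlation is with the "support" and "confidence" measures used in association-rule mining. The sketch below is an illustration only: the baskets are invented, and nothing here is claimed to be the method the retailer actually used.

```python
# Hypothetical market baskets; each set is one customer transaction.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction also containing `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


print(support({"diapers", "beer"}, baskets))       # how often the pair occurs together
print(confidence({"diapers"}, {"beer"}, baskets))  # how often diaper buyers also buy beer
```

A high confidence for "diapers, then beer" relative to the overall share of baskets containing beer is the kind of pattern an analyst would then investigate further.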
(iii) Excel Spreadsheets:-

Excel Spreadsheets are frequently used in Data Warehousing applications to access and present
data from Data Marts. Spreadsheets are powerful, flexible and relatively inexpensive tools that
many decision makers are comfortable using.

Before Data Warehousing became popular, decision makers often had difficulty getting access to
corporate data. It was necessary to populate spreadsheets from multiple disparate data sources
and manually integrate the data. This process was both time consuming and error-prone.
Privacy, data redundancy and currency issues arose when decision makers retained their own
personal copies of sensitive corporate data on their personal computers and laptops.

In a Data Warehousing environment, a subset of the cleansed and integrated corporate data is
copied from the Data Warehouse to a Data Mart. The spreadsheet then accesses the Data Mart
directly. Where necessary, data from the Data Mart can be copied to a personal computer.

Excel and other spreadsheet applications provide Pivot Table capabilities that allow users to
separate "facts" (numeric data to be summed) from "dimensions" (used for filtering, sorting and
grouping).
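
The split between facts and dimensions can be illustrated with a small pivot written in plain Python. The sales rows and the `pivot` helper are assumptions made for this sketch; a spreadsheet Pivot Table does the equivalent interactively.

```python
from collections import defaultdict

# Hypothetical sales rows: "region" and "quarter" are dimensions, "amount" is the fact.
rows = [
    {"region": "North", "quarter": "Q1", "amount": 120.0},
    {"region": "North", "quarter": "Q2", "amount": 80.0},
    {"region": "South", "quarter": "Q1", "amount": 200.0},
    {"region": "North", "quarter": "Q1", "amount": 30.0},
]


def pivot(rows, row_dim, col_dim, fact):
    """Sum `fact` for every (row_dim, col_dim) combination."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_dim]][row[col_dim]] += row[fact]
    # Convert nested defaultdicts to plain dicts for easier display.
    return {key: dict(cols) for key, cols in table.items()}


print(pivot(rows, "region", "quarter", "amount"))
```

Swapping the `row_dim` and `col_dim` arguments regroups the same facts along different dimensions, which is the essence of "slicing and dicing".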

Excel also provides graphing capabilities that permit the end user to present information in chart
and graph formats. These diagrams can be easily incorporated into MS Word documents,
PowerPoint presentations, web pages, etc.

The use of Excel Spreadsheets to present and analyze data from Data Warehouses is an
inexpensive and flexible method for sharing business intelligence.

(iv) OLAP Tools:-

The acronym OLAP stands for On-Line Analytical Processing. OLAP Tools are used to analyze
multi-dimensional data. These powerful tools allow users to identify and observe trends and then to
"drill-down" to discover the details behind those trends.

As the name implies, OLAP tools are "online" and are used for "analytics". Many firms are
addressing their information needs by replacing their static, paper-based legacy reports with
online access to corporate information via OLAP Tools.
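
The idea of viewing a summary and then drilling down can be sketched as two aggregations over the same fact rows. The data and function names below are hypothetical; real OLAP tools perform this against pre-built cubes or dimensional databases.

```python
from collections import Counter

# Hypothetical fact rows: (year, month, sales amount).
sales = [
    (2023, 1, 100), (2023, 2, 150), (2023, 2, 50),
    (2024, 1, 90), (2024, 3, 60),
]


def roll_up_by_year(rows):
    """Summary level: total sales per year."""
    totals = Counter()
    for year, _, amount in rows:
        totals[year] += amount
    return dict(totals)


def drill_down(rows, year):
    """Detail level: total sales per month within one chosen year."""
    totals = Counter()
    for row_year, month, amount in rows:
        if row_year == year:
            totals[month] += amount
    return dict(totals)


print(roll_up_by_year(sales))   # yearly totals
print(drill_down(sales, 2023))  # monthly totals behind the 2023 figure
```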
(v) Performance Dashboards:-

Performance Dashboards are "front-ends" to Data Warehouses that summarize, in graphical
format, how a business is performing against its measurable goals and objectives.

Performance Dashboards are targeted at senior decision makers who need to know at a glance,
how the business is performing. Performance Dashboards typically show historical trends and
organizational goals or targets.

The performance measures shown on Dashboards are based on the firm's key performance
indicators (KPIs). KPIs can involve financial, marketing, production, customer, growth and
other important metrics.
Architectural Overview:-

Data Warehouses can be architected in many different ways, depending on the specific needs of
a business. The model shown below is the "hub-and-spokes" Data Warehousing architecture that
is popular in many organizations.

In short, data is moved from databases used in operational systems into a data warehouse staging
area, then into a data warehouse and finally into a set of conformed data marts. Data is copied
from one database to another using a technology called ETL (Extract, Transform, Load).

Typical Data Warehousing Environment

Operational Systems:-

The principal reason why businesses need to create Data Warehouses is that their corporate data
assets are fragmented across multiple, disparate applications systems, running on different
technical platforms in different physical locations. This situation does not enable good decision
making.

When data redundancy exists in multiple databases, data quality often deteriorates. Poor
business intelligence results in poor strategic and tactical decision making.

Individual business units within an enterprise are designated as "owners" of operational
applications and databases. These "organizational silos" sometimes don't understand the
strategic importance of having well integrated, non-redundant corporate data. Consequently,
they frequently purchase or build operational systems that do not integrate well with existing
systems in the business.

Data management problems have worsened in recent years as businesses deployed a parallel set
of ebusiness and ecommerce applications that don't integrate with existing "full service"
operational applications.

ETL Process:-

ETL Technology (shown below with arrows) is an important component of the Data
Warehousing Architecture. It is used to copy data from Operational Applications to the Data
Warehouse Staging Area, from the DW Staging Area into the Data Warehouse and finally from
the Data Warehouse into a set of conformed Data Marts that are accessible by decision makers.
The ETL software extracts data, transforms values of inconsistent data, cleanses "bad" data,
filters data and loads data into a target database. The scheduling of ETL jobs is critical. Should
there be a failure in one ETL job, the remaining ETL jobs must respond appropriately.
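
The extract, transform, cleanse and load steps can be sketched with Python's built-in `sqlite3` module standing in for the target database. The source rows, the country mapping, the `sales` table and the function names are all assumptions made for this example; commercial ETL tools offer far more than this.

```python
import sqlite3

# Hypothetical rows extracted from an operational system.
source_rows = [
    {"id": 1, "country": "usa", "amount": "20.00"},
    {"id": 2, "country": "U.S.A.", "amount": "5.00"},
    {"id": 3, "country": "India", "amount": "not available"},  # bad value
    {"id": 4, "country": "india", "amount": "42.50"},
]

# Assumed mapping used to make inconsistent country values uniform.
COUNTRY_CODES = {"usa": "US", "u.s.a.": "US", "india": "IN"}


def transform(row):
    """Standardize the country and parse the amount; return None for bad rows."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None  # cleanse: reject rows whose amount cannot be parsed
    code = COUNTRY_CODES.get(row["country"].strip().lower())
    if code is None:
        return None  # filter: unknown country value
    return (row["id"], code, amount)


def run_etl(rows, conn):
    """Transform the extracted rows and load the clean ones into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    clean = [t for t in (transform(r) for r in rows) if t is not None]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)
    conn.commit()
    return len(clean)


conn = sqlite3.connect(":memory:")
loaded = run_etl(source_rows, conn)
totals = conn.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
).fetchall()
print(loaded, totals)
```

Rejected rows are simply dropped in this sketch; a production job would normally log them so that the source data can be corrected.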

Data Staging Area:-

The Data Warehouse Staging Area is a temporary location where data from source systems is
copied. A staging area is mainly required in a Data Warehousing Architecture for timing
reasons. In short, all required data must be available before data can be integrated into the Data
Warehouse.

Due to varying business cycles, data processing cycles, hardware and network resource
limitations and geographical factors, it is not feasible to extract all the data from all Operational
databases at exactly the same time.

For example, it might be reasonable to extract sales data on a daily basis; however, daily extracts
might not be suitable for financial data that requires a month-end reconciliation process.
Similarly, it might be feasible to extract "customer" data from a database in Singapore at noon
Eastern Standard Time, but this would not be feasible for "customer" data in a Chicago database.

Data in the Data Warehouse Staging Area can be either persistent (i.e. remains around for a long period) or
transient (i.e. only remains around temporarily).

Not all businesses require a Data Warehouse Staging Area. For many businesses it is feasible to
use ETL to copy data directly from operational databases into the Data Warehouse.

Data Warehouse:-

The purpose of the Data Warehouse in the overall Data Warehousing Architecture is to integrate
corporate data. It contains the "single version of truth" for the organization that has been
carefully constructed from data stored in disparate internal and external operational databases.
The amount of data in the Data Warehouse is massive. Data is stored at a very granular level of
detail. For example, every "sale" that has ever occurred in the organization is recorded and
related to dimensions of interest. This allows data to be sliced and diced, summed and grouped
in unimaginable ways.

Contrary to popular opinion, the Data Warehouse does not contain all the data in the
organization. Its purpose is to provide key business metrics that are needed by the organization
for strategic and tactical decision making.

Decision makers don't access the Data Warehouse directly. This is done through various front-
end Data Warehouse Tools that read data from subject specific Data Marts.

The Data Warehouse can be either "relational" or "dimensional". This depends on how the
business intends to use the information.

Data Marts:-

ETL (Extract Transform Load) jobs extract data from the Data Warehouse and populate one or
more Data Marts for use by groups of decision makers in the organizations. The Data Marts can
be Dimensional (Star Schemas) or relational, depending on how the information is to be used and
what "front end" Data Warehousing Tools will be used to present the information.

Each Data Mart can contain different combinations of tables, columns and rows from the
Enterprise Data Warehouse. For example, a business unit or user group that doesn't require a
lot of historical data might only need transactions from the current calendar year in the database.
The Personnel Department might need to see all details about employees, whereas data such as
"salary" or "home address" might not be appropriate for a Data Mart that focuses on Sales. Some
Data Marts might need to be refreshed from the Data Warehouse daily, whereas other user groups
might want refreshes only monthly.
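
The idea that a Data Mart holds only selected rows and columns can be sketched with `sqlite3`. The `dw_sales` and `mart_sales` tables, their columns and the sample rows are hypothetical; the point is only that the sensitive column and the older rows never reach the mart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table; the column names are assumptions.
conn.execute(
    "CREATE TABLE dw_sales "
    "(sale_id INTEGER, sale_date TEXT, rep_name TEXT, rep_salary REAL, amount REAL)"
)
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2023-11-02", "Asha", 50000.0, 120.0),
        (2, "2024-02-14", "Ben", 48000.0, 75.0),
        (3, "2024-05-30", "Asha", 50000.0, 210.0),
    ],
)


def build_sales_mart(conn, year):
    """Rebuild the mart with one year's rows and without the salary column."""
    conn.execute("DROP TABLE IF EXISTS mart_sales")
    conn.execute(
        "CREATE TABLE mart_sales "
        "(sale_id INTEGER, sale_date TEXT, rep_name TEXT, amount REAL)"
    )
    conn.execute(
        "INSERT INTO mart_sales "
        "SELECT sale_id, sale_date, rep_name, amount FROM dw_sales "
        "WHERE substr(sale_date, 1, 4) = ?",
        (str(year),),
    )
    conn.commit()


build_sales_mart(conn, 2024)
mart_columns = [row[1] for row in conn.execute("PRAGMA table_info(mart_sales)")]
mart_rows = conn.execute("SELECT sale_id FROM mart_sales ORDER BY sale_id").fetchall()
print(mart_columns, mart_rows)
```

Running `build_sales_mart` on a schedule (daily or monthly, depending on the user group) corresponds to the refresh cycle described above.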

Web Analytics:-

This section provides an overview of Web Analytics. It describes how Internet communications occur
and how those communications are logged. It explains how data from a web server's log can be
harvested to generate useful and actionable business intelligence, particularly when the data is
combined with existing customer and sales data in a Data Warehouse.

Web Analytics focuses on the interactions of customers with a company's website. It leverages
data from web server logs and corporate databases to obtain business intelligence on existing and
potential customers. Web Analytics is extremely powerful when data from web server logs is
integrated with data from customer and sales databases in a Data Warehouse.

Web Analytics provides business intelligence used to:

1. identify technical and navigation website issues
2. better understand the customer's unique needs, interests and patterns
3. identify improvements for website design.

The business intelligence that is gained can be used to design customized web sessions for
groups of users that are based on demographics, common interests and similar behaviors. For
example, a web session can be customized based on the gender, language or geographic location
of the visitor.

Customizations to sessions can entail presenting visitors with web content and advertising that is
of direct relevance to them. By improving the user experience, businesses can increase sales and
visitors can be motivated to return to the site.

Internet Communication:-

Web Analytics uses communication data between a client and a server to generate business
intelligence of use to an organization. In brief, this is how it works:

When a web browser on a personal computer requests a web page from a web
server, identifying data is sent with the request. This information includes the network address
of the personal computer on the Internet (i.e. IP address), the originating and destination URLs,
and the date and time of the event.

This data is required by the web servers in order to return web page content back to the web
browser on the personal computer. This is illustrated graphically:

Communication between a PC (Client) and a Web Server


The web server retains all incoming communications in a log that is stored on the web server.
This data can be copied into a Data Warehouse and sliced and diced as needed to generate
business intelligence about website visits.

In addition to transmitting basic Internet communication data, many websites store "cookies" on
the visitor's personal computer. These cookies contain identifying information about the
visitor's previous visit to the website.

Web Server Logs:-

A web server log is the file (or set of files) on a web server that contains a history of web page requests that
have occurred. The following data is available from server logs:

Data          Description

IP Address    The address of a device attached to an IP network. Every
              client, server and network device must have a unique IP
              address in order to communicate. Every communication
              that occurs contains both a source IP address and a
              destination IP address. The format of an IP address looks
              like this: 255.219.12.2

Date & Time   Date and time that the communication request occurred

User Agent    Name and version of the web browser (e.g. Internet
              Explorer)

Web Page      URL of the web page being requested. The address can
              contain parameters with session-specific information (e.g.
              form fields)

Referrer      The URL of the previous web page from which a link was
              followed
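
As an illustration, the fields listed above can be pulled out of a single log line with a regular expression. The sketch assumes the server writes the widely used "combined" log format; the sample line, the pattern and the `parse_line` function are made up for this example, and real logs may use a different layout.

```python
import re
from datetime import datetime

# Pattern for one line in the assumed "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


sample = (
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /products?id=42 HTTP/1.1" 200 2326 '
    '"http://example.com/home" "Mozilla/5.0"'
)
record = parse_line(sample)
print(record["ip"], record["time"], record["url"], record["referrer"])
```

Each parsed record can then be loaded into a staging table and joined with customer and sales data in the Data Warehouse.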

By slicing and dicing this granular information, it is possible to compile the following type of
statistics:

1. Number of unique visits by day, week, month, year
2. Number of repeat visits by period
3. Number of page views by period
4. Most common entry pages
5. Most common exit pages
6. Most common navigation paths
7. Number of visits by top-level domain (e.g. country code)
8. Number of visits by originating IP address (PC, search engine, etc.)
9. Number of visits by search query used
10. Number of visits by browser type and version
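
A few of these statistics can be computed from parsed log records as sketched below. The records are invented, and two simplifying assumptions are made and should be read as such: an IP address is treated as one visitor, and all requests from one IP address on one day are treated as a single visit.

```python
from collections import Counter

# Hypothetical parsed log records, already ordered by time: (ip, date, url).
hits = [
    ("203.0.113.7", "2024-03-01", "/home"),
    ("203.0.113.7", "2024-03-01", "/products"),
    ("198.51.100.2", "2024-03-01", "/products"),
    ("198.51.100.2", "2024-03-01", "/checkout"),
    ("203.0.113.7", "2024-03-02", "/home"),
]


def page_views_by_day(hits):
    """Number of page requests per day."""
    return dict(Counter(day for _, day, _ in hits))


def unique_visitors_by_day(hits):
    """Number of distinct IP addresses seen per day."""
    seen = {}
    for ip, day, _ in hits:
        seen.setdefault(day, set()).add(ip)
    return {day: len(ips) for day, ips in seen.items()}


def entry_and_exit_pages(hits):
    """Count the first and last URL of each (ip, day) visit."""
    first, last = {}, {}
    for ip, day, url in hits:
        first.setdefault((ip, day), url)  # keep only the first URL of the visit
        last[(ip, day)] = url             # overwrite until the last URL remains
    return Counter(first.values()), Counter(last.values())


entries, exits = entry_and_exit_pages(hits)
print(page_views_by_day(hits))
print(unique_visitors_by_day(hits))
print(entries.most_common(1), exits.most_common())
```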

User’s Authentication:-

There are two types of Internet Applications: authenticated and unauthenticated. An
authenticated application is one that requires a login ID and password. An unauthenticated
application allows visitors to use the Internet application without signing in. Authenticated
applications can generate much business intelligence about customers, their preferences, interests
and purchases.

To obtain a user ID and password, web users are normally required to register. They are
normally required to provide a valid email address, their name and address and some personal
profile information such as hobbies and interests. This data is stored in a database that can be
integrated with web usage data from web server logs.

Sometimes users aren't required to submit a registration over the Internet. Using its internal
corporate data, the business sends a login ID and password to each of its customers. Each
time the login ID is used, data is collected about the Internet session.

Whether or not a visitor is authenticated, most Internet Applications retain statistics on
application usage such as how the visitor navigated through the application. This is required to
identify technical issues and usage patterns.

By leveraging data available from authenticated applications, subsequent visits can be
customized for users.

Web Evolution:-

The Business Intelligence gained from analyzing web server logs and data from Internet
application systems can be leveraged to formulate plans for evolving your corporate website.
Shorter-than-expected visits, unexpected navigation paths and transaction abandonment prior to
completion can suggest design or usability issues that need to be addressed.

It is very useful to analyze the "entry" and "exit" pages to a website. Visitors frequently enter
through a "back door" rather than through the home page. This isn't necessarily a problem. In
fact, website visits can often be increased by treating each web page as a "home page". This
involves implementing an appropriate navigation design and making full use of meta tags on all
pages.

Web pages are dynamic, and web analytics must account for this: the content on a given
web page can differ greatly from week to week.
Web analytics should also consider pages in a web as sets of visual "components" that have been
assembled into webpages. Some components may be used in multiple pages, while others are
not. Some components contain core content that all visitors see whereas other components might
only be visible to certain categories of users.

The following diagram illustrates how a web page can actually consist of many discrete
components:

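[Diagram: a web page divided into components: Header (top), Customer Specific (below the header), Navigation (left), Core Content (center), Advert (right) and Footer (bottom).]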

When formulating a web analytics strategy, it is important to understand the architecture of a
web site and recognize and manage the evolving nature of corporate websites.

Dimensional Model:-

The following diagram is an example of a dimensional model that can be used to track web
access using data consolidated from a web server log, an authenticated or un-authenticated
Internet Application and a database containing customer information. By consolidating data in a
Data Warehouse, business intelligence about "customers" and un-authenticated visitors can be
gleaned.

The dimensions depicted in white represent data that is available from most web server logs.
Normally, the "device" is a personal computer connected to the Internet; however, it can also be
an automated agent (e.g. robot). The "country" is the location where the "device" is physically
located. This is not always the location where the visitor is physically located. The
"organization" is normally the ISP that the visitor uses to connect to the Internet. The
"originating URL" is the web page that the visitor is hyper-linking from and the "destination
URL" is the web page that the visitor requests.
The dimensions shown in yellow represent additional information that is available when the
visitor initiates an Internet Application (e.g. e-business, e-commerce). That application system
must programmatically track the visitor through the application and record pages that are visited.

The "customer" dimension is only available when an application "authenticates" a visitor using a
login ID and password. Once identified, the customer can be uniquely identified regardless of
how or where he/she accesses the Internet.
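
A dimensional model of this kind can be sketched as a small star schema in `sqlite3`. The original diagram is not reproduced here, so the table names, columns and sample rows below are assumptions chosen to match the dimensions described in the text (device, URL and customer), not a copy of that model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_device   (device_id INTEGER PRIMARY KEY, ip_address TEXT, country TEXT);
CREATE TABLE dim_url      (url_id INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: one row per page request, linked to the dimensions.
CREATE TABLE fact_page_request (
    request_time       TEXT,
    device_id          INTEGER REFERENCES dim_device(device_id),
    origin_url_id      INTEGER REFERENCES dim_url(url_id),
    destination_url_id INTEGER REFERENCES dim_url(url_id),
    customer_id        INTEGER REFERENCES dim_customer(customer_id)  -- NULL if not authenticated
);

INSERT INTO dim_device VALUES (1, '203.0.113.7', 'IN'), (2, '198.51.100.2', 'US');
INSERT INTO dim_url VALUES (1, '/home'), (2, '/products');
INSERT INTO dim_customer VALUES (1, 'Asha');
INSERT INTO fact_page_request VALUES
    ('2024-03-01T10:00:00', 1, NULL, 1, 1),
    ('2024-03-01T10:01:00', 1, 1, 2, 1),
    ('2024-03-01T11:00:00', 2, NULL, 2, NULL);
""")

# Requests per country, with the number made by authenticated customers.
requests_by_country = conn.execute("""
    SELECT d.country,
           COUNT(*) AS requests,
           COUNT(f.customer_id) AS authenticated_requests
    FROM fact_page_request AS f
    JOIN dim_device AS d ON d.device_id = f.device_id
    GROUP BY d.country
    ORDER BY d.country
""").fetchall()
print(requests_by_country)
```

Leaving `customer_id` empty for un-authenticated visitors lets the same fact table hold both kinds of traffic, so the two groups can be compared in a single query.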
