
DWM CHP1 Notes


UNIT 1: INTRODUCTION TO DATA WAREHOUSING

WHAT IS DATA WAREHOUSE?


• Data warehousing is the process of constructing and using a data warehouse.
The data warehouse is a basis for information processing.
• A data warehouse is a collection of data that covers the entire organization.
• A data warehouse is a collection of technologies aimed at enabling the
knowledge worker (executive, manager, and analyst) to make better and
faster decisions.
• A Data Warehouse (DW) is a relational database that is designed for query
and analysis rather than transaction processing. It includes historical data
derived from transaction data from one or more sources.
• A Data Warehouse provides integrated, enterprise-wide, historical data and
focuses on supporting decision-makers in data modelling and analysis.
WHAT IS DATA MINING?
• Data Mining refers to the extraction of useful information from large
volumes of data or from data warehouses.
• Data Mining is the computational process of discovering patterns in large
data sets, involving methods at the intersection of artificial intelligence,
machine learning, statistics, and database systems.

DATAWAREHOUSE WITH MINING TECHNIQUES 1

Dhrupesh Sir 9699692059



CHARACTERISTICS OF DATA WAREHOUSE:


Subject Oriented:

A data warehouse is subject oriented because it provides information around a
subject rather than the organization's ongoing operations. These subjects
include products, customers, suppliers, sales, etc. Data warehouses are
designed to help you analyse data. For example, to learn more about your
company's sales data, you can build a warehouse that concentrates on sales.
Using this warehouse, you can answer questions like "Who was our best
customer for this item last year?" This ability to define a data warehouse by
subject matter, sales in this case, makes the data warehouse subject oriented.
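The "best customer for this item last year" question can be sketched as a single aggregate query over a sales subject area. This is only an illustrative sketch using Python's built-in sqlite3 module; the table name `sales`, its columns, and the sample rows are assumptions made for the example, not part of these notes.

```python
import sqlite3

def best_customer(rows, item, year):
    """Return (customer, total_sales) for the top buyer of `item` in `year`."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sales subject area: one row per sale.
    conn.execute(
        "CREATE TABLE sales (customer TEXT, item TEXT, year INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT customer, SUM(amount) AS total FROM sales "
        "WHERE item = ? AND year = ? "
        "GROUP BY customer ORDER BY total DESC LIMIT 1",
        (item, year),
    ).fetchone()
    conn.close()
    return result

# Made-up sample data for illustration only.
SALES = [
    ("Asha", "laptop", 2023, 1200.0),
    ("Ravi", "laptop", 2023, 900.0),
    ("Ravi", "laptop", 2023, 800.0),
    ("Asha", "phone", 2023, 500.0),
    ("Ravi", "laptop", 2022, 3000.0),
]

print(best_customer(SALES, "laptop", 2023))
```

Because the data is organized around the sales subject, the question maps onto one GROUP BY query instead of a walk through several operational applications.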
Integrated:


A data warehouse is built by integrating data from different sources such as
relational databases, flat files, etc. Integration improves the effective
analysis of data. Integration is closely related to subject orientation. Data
warehouses must put data from disparate sources into a consistent format. They
must resolve such problems as naming conflicts and inconsistencies among units
of measure. When they achieve this, they are said to be integrated.
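As a small sketch of what resolving naming conflicts and unit inconsistencies can look like, the following maps two hypothetical source layouts onto one common format. The field names (`cust_id`, `CustomerNo`, `weight_lb`, and so on) are assumptions invented for this example.

```python
# Conversion factor from pounds to kilograms.
LB_TO_KG = 0.45359237

def from_source_a(rec):
    # Hypothetical source A: key is "cust_id", weight already in kilograms.
    return {"customer_id": rec["cust_id"], "weight_kg": rec["weight_kg"]}

def from_source_b(rec):
    # Hypothetical source B: key is "CustomerNo", weight in pounds.
    return {
        "customer_id": rec["CustomerNo"],
        "weight_kg": round(rec["weight_lb"] * LB_TO_KG, 3),
    }

def integrate(source_a, source_b):
    """Bring both sources into one consistent naming and unit convention."""
    return [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]

integrated = integrate(
    [{"cust_id": 1, "weight_kg": 2.0}],
    [{"CustomerNo": 2, "weight_lb": 10}],
)
print(integrated)
```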
Time Variant:
The data collected in a data warehouse is identified with a particular time
period. A data warehouse provides data from a historical point of view.
In order to discover trends in business, analysts need large amounts of data.
This is very much in contrast to online transaction processing (OLTP) systems,
where performance requirements demand that historical data be moved to an
archive. A data warehouse's focus on change over time is what is meant by the
term time variant. Typically, data flows from one or more OLTP databases into
a data warehouse on a monthly, weekly, or daily basis. The data is normally
processed in a staging file before being added to the data warehouse. Data
warehouses commonly range in size from tens of gigabytes to a few terabytes.
Usually, the vast majority of the data is stored in a few very large fact
tables.
Non-Volatile:

Non-volatile means that when new data is added to previous data, the old data
is not deleted. A data warehouse is kept separate from the operational
database, and hence changes made in the operational database are not directly
reflected in the data warehouse.


Non-volatile means that, once entered into the warehouse, data should not
change. This is logical because the purpose of a warehouse is to enable you to
analyse what has occurred.
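One way to picture non-volatility is an append-only store: each load adds dated rows, and nothing already loaded is updated or deleted. The `Warehouse` class below is a hypothetical toy written for this illustration, not a real warehouse API.

```python
class Warehouse:
    """Toy append-only store: rows are added with a load date, never changed."""

    def __init__(self):
        self._rows = []

    def load(self, load_date, snapshot):
        # Append a dated copy of every value in the operational snapshot.
        for key, value in snapshot.items():
            self._rows.append((load_date, key, value))

    def history(self, key):
        # All values ever loaded for this key, in load order.
        return [(d, v) for d, k, v in self._rows if k == key]

operational = {"P1": 100}          # current price in the operational system
dw = Warehouse()
dw.load("2024-01-31", operational)

operational["P1"] = 120            # operational data is updated in place
dw.load("2024-02-29", operational)

print(dw.history("P1"))
```

The operational system only knows the current value, while the warehouse keeps both dated values, which is what makes analysis of change over time possible.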

Separate:
The DW is separate from the operational systems in the company. It gets its
data out of these legacy systems.
Available:
The task of a DW is to make data accessible for the user.
Aggregation performance:
Queries for the data requested by the user have to perform well at all levels
of aggregation.
Consistency:
Consistency of the structure and contents of the data is very important and
can only be guaranteed by the use of metadata; this is independent of the
source and collection date of the data.

DIFFERENCE BETWEEN OPERATIONAL DATABASE AND DATA WAREHOUSE

OPERATIONAL DATABASE | DATA WAREHOUSE
1. Operational database systems are designed to support high-volume transaction processing. | 1. Data warehousing systems are typically designed to support high-volume analytical processing (i.e., OLAP).
2. Operational database systems are usually concerned with current data. | 2. Data warehousing systems are usually concerned with historical data.
3. Data within operational systems is updated regularly according to need. | 3. Non-volatile: new data may be added regularly, but once added, data is rarely changed.
4. It is designed for real-time business dealings and processes. | 4. It is designed for analysis of business measures by subject area, categories, and attributes.
5. It is optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table. | 5. It is optimized for bulk loads and large, complex, unpredictable queries that access many rows per table.
6. It is optimized for validation of incoming information during transactions and uses validation data tables. | 6. It is loaded with consistent, valid information and requires no real-time validation.
7. It supports thousands of concurrent clients. | 7. It supports few concurrent clients relative to OLTP.
8. Operational database systems are largely functional or process oriented. | 8. Data warehousing systems are largely subject-oriented.
9. Operational systems are usually optimized to perform fast inserts and updates of relatively small volumes of data. | 9. Data warehousing systems are usually optimized to perform fast retrievals of relatively high volumes of data.
10. An operational database system focuses on data in. | 10. A data warehousing system focuses on data out.
11. A small amount of data is accessed. | 11. A large amount of data is accessed.
12. Relational databases are created for OLTP. | 12. Data warehouses are designed for OLAP.
13. Data integration in an operational database is application based. | 13. Data integration in a data warehouse is subject based.
14. It provides a detailed and flat relational view of data. | 14. It provides a summarized and multidimensional view of data.
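The contrast between the two access patterns (rows 5 and 9) can be sketched with Python's built-in sqlite3 module. The `orders` table and its contents are assumptions made for illustration; a real warehouse would hold the analytical data in a separate system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "North", 100.0), (2, "South", 250.0), (3, "North", 50.0)],
)

def oltp_lookup(order_id):
    # OLTP-style access: fetch a single row by its key.
    return conn.execute(
        "SELECT region, amount FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()

def olap_totals():
    # OLAP-style access: scan many rows and summarize them.
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()

print(oltp_lookup(2))
print(olap_totals())
```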

WHAT IS THE NEED OF DATA WAREHOUSE?


1) Advanced query processing:
In most businesses, even the best database systems are bound to either a
single server or a handful of servers in a cluster. A data warehouse is
typically a purpose-built solution designed for analytical workloads
rather than for day-to-day transactions. This means a data warehouse can
process analytical queries much faster and more effectively, leading to
greater efficiency and productivity.
2) Better consistency of data:
Developers work with data warehousing systems after data has been
received so that all the information contained in the data warehouse is
standardized. Only uniform data can be used efficiently for successful
comparisons. Other solutions simply cannot match a data warehouse's level
of consistency.
3) Improved user access:
A standard database can be read and manipulated by programs like SQL
Query Studio or the Oracle client, but there is considerable ramp-up time
for end users to use these apps effectively to get what they need. Business
intelligence and data warehouse end-user access tools are built specifically
for the purposes for which data warehouses are used: analysis,
benchmarking, prediction and more.
4) All-in-one:
A data warehouse has the ability to receive data from many different
sources, meaning any system in a business can contribute its data. Let's face
it: different business segments use different applications. Only a proper data
warehouse solution can receive data from all of them and give a business the
"big picture" view that is needed to analyse the business, make plans, track
competitors and more.
5) Future-proof:
A data warehouse doesn't care where it gets its data from. It can work with
any raw information and developers can "massage" any data it may have
trouble with. Considering this, you can see that a data warehouse will
outlast other changes in the business' technology. For example, a business
can overhaul its accounting system, choose a whole new CRM solution or
change the applications it uses to gather statistics on the market and it won't
matter at all to the data warehouse. Upgrading or overhauling apps
anywhere in the enterprise will not require subsequent expenditures to
change the data warehouse side.
6) Retention of data history:
End-user applications typically don't have the ability, not to mention the
space, to maintain much transaction history and keep track of multiple
changes to data. Data warehousing solutions can track all alterations to
data, providing a reliable history of all changes, additions and
deletions. This helps preserve the integrity of the data.
7) Disaster recovery implications:
A data warehouse system offers a great deal of security when it comes to
disaster recovery. Since data from disparate systems is all sent to a data
warehouse, that data warehouse essentially acts as another information
backup source. Considering the data warehouse will also be backed up,
that's now four places where the same information will be stored: the
original source, its backup, the data warehouse and its subsequent backup.
This is unparalleled information security.

COMPONENTS OF DATA WAREHOUSE:

The figure shows the essential elements of a typical warehouse. We see the
Source Data component shown on the left. The Data Staging element serves as
the next building block. In the middle, we see the Data Storage component that
handles the data warehouse's data. This element not only stores and manages
the data; it also keeps track of data using the metadata repository. The
Information Delivery component, shown on the right, consists of all the
different ways of making the information from the data warehouse available to
the users.
Source Data Component
Source data coming into the data warehouses may be grouped into four broad
categories:

Production Data:
This type of data comes from the different operational systems of the
enterprise. Based on the data requirements in the data warehouse, we choose
segments of the data from the various operational systems.
Internal Data:
In each organization, users keep their "private" spreadsheets, reports,
customer profiles, and sometimes even departmental databases. This is the
internal data, part of which could be useful in a data warehouse.
Archived Data:
Operational systems are mainly intended to run the current business. In every
operational system, we periodically take the old data and store it in archived
files.
External Data:
Most executives depend on information from external sources for a large
percentage of the information they use. They use statistics relating to their
industry produced by external agencies.

Data Staging Component


After we have extracted data from various operational systems and
external sources, we have to prepare the files for storing in the data
warehouse. The extracted data coming from several different sources needs to
be changed, converted, and made ready in a format that is suitable for
querying and analysis.
We will now discuss the three primary functions that take place in the staging
area.
1) Data Extraction:
This function has to deal with numerous data sources. We have to employ the
appropriate technique for each data source.
2) Data Transformation:


As we know, data for a data warehouse comes from many different sources.
If data extraction for a data warehouse poses big challenges, data
transformation presents even greater challenges. We perform several
individual tasks as part of data transformation. First, we clean the data
extracted from each source. Cleaning may be the correction of misspellings,
or may deal with providing default values for missing data elements, or
elimination of duplicates when we bring in the same data from various source
systems. Standardization of data components forms a large part of data
transformation. Data transformation involves many forms of combining
pieces of data from different sources. We combine data from a single source
record or related data parts from many source records. On the other hand,
data transformation also involves purging source data that is not useful and
separating out source records into new combinations. Sorting and merging of
data take place on a large scale in the data staging area. When the data
transformation function ends, we have a collection of integrated data that is
cleaned, standardized, and summarized.
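The cleaning tasks described above (correcting misspellings, supplying defaults for missing values, eliminating duplicates, and sorting) can be sketched in a few lines. The record layout, the spelling table, and the default value are all assumptions made for this example.

```python
# Hypothetical lookup of known misspellings and a default for missing cities.
SPELLING_FIXES = {"Mumbay": "Mumbai"}
DEFAULT_CITY = "UNKNOWN"

def clean(records):
    """Clean staged records: fix spellings, fill defaults, drop duplicates."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        if rec["id"] in seen_ids:
            continue  # same record arrived from more than one source system
        seen_ids.add(rec["id"])
        city = rec.get("city") or DEFAULT_CITY   # default for missing value
        city = SPELLING_FIXES.get(city, city)    # correct known misspellings
        name = rec["name"].strip().title()       # standardize name format
        cleaned.append({"id": rec["id"], "name": name, "city": city})
    # Sort on the key so merged output has a stable order.
    return sorted(cleaned, key=lambda r: r["id"])

staged = [
    {"id": 2, "name": " ravi ", "city": "Mumbay"},
    {"id": 1, "name": "asha", "city": None},
    {"id": 2, "name": "Ravi", "city": "Mumbai"},
]
print(clean(staged))
```

Here duplicates are detected purely by `id`, keeping the first copy seen; a real staging area would need a rule for choosing between conflicting copies.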

3) Data Loading:
Two distinct categories of tasks form the data loading function. When we
complete the structure and construction of the data warehouse and go live for
the first time, we do the initial loading of the information into the data
warehouse storage. The initial load moves high volumes of data and uses up a
substantial amount of time. Afterwards, incremental loads apply the ongoing
changes from the source systems to the warehouse.
Data Storage Components
Data storage for the data warehouse is a separate repository. The data
repositories for the operational systems generally include only the current
data. Also, these data repositories hold the data in highly normalized
structures for fast and efficient processing.
Information Delivery Component
The information delivery element is used to enable the process of subscribing
for data warehouse files and having it transferred to one or more destinations
according to some customer-specified scheduling algorithm.


Metadata Component
Metadata in a data warehouse is similar to the data dictionary or the data
catalogue in a database management system. In the data dictionary, we keep the
data about the logical data structures, the data about the records and
addresses, the information about the indexes, and so on.
Data Marts
A data mart includes a subset of corporate-wide data that is of value to a
specific group of users. The scope is confined to particular selected
subjects. Data in a data warehouse should be fairly current, but not
necessarily up to the minute, although developments in the data warehouse
industry have made regular and incremental data loads more achievable. Data
marts are smaller than data warehouses and usually contain data for one part
of the organization. The current trend in data warehousing is to develop a
data warehouse with several smaller related data marts for particular kinds of
queries and reports.
Management and Control Component
The management and control elements coordinate the services and functions
within the data warehouse. These components control the data transformation
and the data transfer into the data warehouse storage. On the other hand, they
moderate the data delivery to the clients. They work with the database
management systems and ensure that data is correctly saved in the
repositories. They monitor the movement of information into the staging area
and from there into the data warehouse storage itself.


ARCHITECTURE OF DATA WAREHOUSE:
There are generally three types of architecture in DW, which are:
1) Single-tier architecture
2) Two-tier architecture
3) Three-tier architecture or multi-tier architecture

Single-Tier architecture
Single-Tier architecture is not frequently used in practice. Its purpose is to
minimize the amount of data stored; to reach this goal, it removes data
redundancies. As the figure shows, the only layer physically available is the
source layer. In this method, data warehouses are virtual. This means that the
data warehouse is implemented as a multidimensional view of operational data
created by specific middleware, or an intermediate processing layer.


The weakness of this architecture lies in its failure to meet the requirement
for separation between analytical and transactional processing. Analysis
queries are submitted to operational data after the middleware interprets
them. In this way, queries affect transactional workloads.

Two-Tier architecture:
The requirement for separation plays an essential role in defining the
two-tier architecture for a data warehouse system, as shown in fig:

Although it is typically called two-layer architecture to highlight the
separation between physically available sources and data warehouses, it in
fact consists of four successive data flow stages:
1. Source layer:
A data warehouse system uses heterogeneous sources of data. That data
is stored initially in corporate relational databases or legacy databases,
or it may come from an information system outside the corporate walls.
2. Data Staging:
The data stored in the sources should be extracted, cleansed to remove
inconsistencies and fill gaps, and integrated to merge heterogeneous
sources into one standard schema. The so-called Extraction,
Transformation, and Loading (ETL) tools can combine heterogeneous
schemata, extract, transform, cleanse, validate, filter, and load source
data into a data warehouse.


3. Data Warehouse layer:


Information is saved to one logically centralized repository: a
data warehouse. The data warehouse can be directly accessed, but it
can also be used as a source for creating data marts, which partially
replicate data warehouse contents and are designed for specific
enterprise departments. Metadata repositories store information on
sources, access procedures, data staging, users, data mart schemas, and
so on.
4. Analysis:
In this layer, integrated data is efficiently and flexibly accessed to issue
reports, dynamically analyse information, and simulate hypothetical
business scenarios. It should feature aggregate information navigators,
complex query optimizers, and user-friendly GUIs.

Three-Tier Architecture:
The three-tier architecture consists of the source layer (containing multiple
source systems), the reconciled layer and the data warehouse layer (containing
both data warehouses and data marts). The reconciled layer sits between the
source data and the data warehouse. The main advantage of the reconciled layer
is that it creates a standard reference data model for a whole enterprise. At
the same time, it separates the problems of source data extraction and
integration from those of data warehouse population. In some cases, the
reconciled layer is also used directly to better accomplish some operational
tasks, such as producing daily reports that cannot be satisfactorily prepared
using the corporate applications, or generating data flows that periodically
feed external processes so that they benefit from cleaning and integration.
This architecture is especially useful for extensive, enterprise-wide systems.
A disadvantage of this structure is the extra storage space used by the
redundant reconciled layer. It also moves the analytical tools a little
further away from being real-time.


OR

Data Warehouses usually have a three-level (tier) architecture that includes:
1. Bottom Tier (Data Warehouse Server)
2. Middle Tier (OLAP Server)
3. Top Tier (Front-end Tools)
The bottom tier consists of the Data Warehouse server, which is almost
always an RDBMS. It may include several specialized data marts and a
metadata repository. Data from operational databases and external sources
(such as user profile data provided by external consultants) are extracted
using application program interfaces called gateways. A gateway is provided by
the underlying DBMS and allows client programs to generate SQL code to be
executed at a server. Examples of gateways include ODBC (Open Database
Connectivity) and OLE DB (Object Linking and Embedding, Database), by
Microsoft, and JDBC (Java Database Connectivity).
The middle tier consists of an OLAP server for fast querying of the data
warehouse.
The OLAP server is implemented using either
(1) A Relational OLAP (ROLAP) model, i.e., an extended relational
DBMS that maps operations on multidimensional data to standard
relational operations.
(2) A Multidimensional OLAP (MOLAP) model, i.e., a special-purpose
server that directly implements multidimensional data and
operations.
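To illustrate the ROLAP idea of mapping a multidimensional operation onto a standard relational one, the sketch below expresses a roll-up over chosen dimensions as a plain SQL GROUP BY. It uses Python's built-in sqlite3 module; the `sales_fact` table, its dimension columns, and the data are assumptions for the example and do not reflect any particular OLAP server.

```python
import sqlite3

# Dimension columns of the hypothetical fact table.
ALLOWED_DIMENSIONS = {"year", "region", "product"}

def roll_up(facts, dimensions):
    """Aggregate the amount measure over the given dimensions via GROUP BY."""
    if not dimensions or not set(dimensions) <= ALLOWED_DIMENSIONS:
        raise ValueError("unknown or empty dimension list")
    cols = ", ".join(dimensions)  # safe: names are checked against a whitelist
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_fact "
        "(year INTEGER, region TEXT, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", facts)
    rows = conn.execute(
        f"SELECT {cols}, SUM(amount) FROM sales_fact "
        f"GROUP BY {cols} ORDER BY {cols}"
    ).fetchall()
    conn.close()
    return rows

FACTS = [
    (2023, "North", "pen", 10.0),
    (2023, "North", "ink", 5.0),
    (2023, "South", "pen", 7.0),
    (2024, "North", "pen", 4.0),
]

print(roll_up(FACTS, ["year", "region"]))  # product dimension rolled up
print(roll_up(FACTS, ["year"]))            # region rolled up as well
```

Dropping a dimension from the GROUP BY list corresponds to rolling up along that dimension; a MOLAP server would instead keep such aggregates in its own multidimensional structures.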
A top-tier that contains front-end tools for displaying results provided by
OLAP, as well as additional tools for data mining of the OLAP-generated data.
The overall Data Warehouse Architecture is shown in fig:


DATA WAREHOUSE MODELS:


From the architecture point of view, there are three data warehouse models:
the enterprise warehouse, the data mart, and the virtual warehouse.
Enterprise Warehouse
An enterprise warehouse collects all of the records about subjects spanning
the entire organization. It supports corporate-wide data integration, usually
from one or more operational systems or external data providers, and it is
cross-functional in scope. It generally contains detailed information as well
as summarized information and can range in size from a few gigabytes to
hundreds of gigabytes, terabytes, or beyond. An enterprise data warehouse
may be implemented on traditional mainframes, UNIX super servers, or
parallel architecture platforms. It requires extensive business modelling and
may take years to develop and build.
Data Mart
A data mart includes a subset of corporate-wide data that is of value to a
specific collection of users. The scope is confined to particular selected
subjects. For example, a marketing data mart may restrict its subjects to the
customer, items, and sales. The data contained in the data marts tend to be
summarized.

Data marts are divided into two types:

1) Independent Data Mart:
An independent data mart is sourced from data captured from one or more
operational systems or external data providers, or from data generated
locally within a particular department or geographic area.
2) Dependent Data Mart:
Dependent data marts are sourced directly from enterprise data
warehouses.
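A dependent data mart can be pictured as a summarized subset derived from the warehouse. The following is a minimal sketch using Python's built-in sqlite3 module; the table names `warehouse_sales` and `marketing_mart`, the columns, and the data are assumptions made for this example.

```python
import sqlite3

def build_marketing_mart(conn):
    """Derive a summarized marketing mart from the (hypothetical) warehouse."""
    conn.execute("DROP TABLE IF EXISTS marketing_mart")
    # Keep only the subjects marketing cares about and summarize the measure.
    conn.execute(
        "CREATE TABLE marketing_mart AS "
        "SELECT customer, item, SUM(amount) AS total_sales "
        "FROM warehouse_sales GROUP BY customer, item"
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(customer TEXT, item TEXT, supplier TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("Asha", "pen", "S1", 10.0),
        ("Asha", "pen", "S2", 5.0),
        ("Ravi", "ink", "S1", 8.0),
    ],
)
build_marketing_mart(conn)
mart_rows = conn.execute(
    "SELECT customer, item, total_sales FROM marketing_mart "
    "ORDER BY customer, item"
).fetchall()
print(mart_rows)
```

An independent data mart would run a similar build step directly against operational or external sources instead of the warehouse table.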

Virtual Warehouses
A virtual data warehouse is a set of views over the operational databases.
For efficient query processing, only some of the possible summary views
may be materialized. A virtual warehouse is simple to build but requires
excess capacity on operational database servers.
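A rough sketch of the virtual warehouse idea: the "warehouse" is just a view defined over an operational table, so every analytical query is computed against the live operational data. This uses Python's built-in sqlite3 module; the `orders` table and the `sales_by_region` view are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
# The summary view stores no data of its own (it is not materialized).
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "North", 100.0), (2, "South", 250.0), (3, "North", 50.0)],
)

def totals():
    return conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()

before = totals()
conn.execute("INSERT INTO orders VALUES (4, 'South', 10.0)")  # new transaction
after = totals()
print(before)
print(after)
```

Since the view is recomputed from the operational table on each query, the analytical work lands on the operational server, which is why extra capacity is needed there.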

DIFFERENCE BETWEEN DATA WAREHOUSE AND DATA MARTS:

SR. NO. | DATA WAREHOUSE | DATA MART
1) | A data warehouse is a centralised system. | A data mart is a decentralised system.
2) | In a data warehouse, light denormalization takes place. | In a data mart, heavy denormalization takes place.
3) | A data warehouse is a top-down model. | A data mart is a bottom-up model.
4) | Building a warehouse is difficult. | Building a mart is easy.
5) | In a data warehouse, the fact constellation schema is commonly used. | In a data mart, star schema and snowflake schema are used.
6) | A data warehouse is flexible. | A data mart is not flexible.
7) | A data warehouse is data-oriented in nature. | A data mart is project-oriented in nature.
8) | A data warehouse has a long life. | A data mart has a shorter life than a warehouse.
9) | In a data warehouse, data is kept in detailed form. | In a data mart, data is kept in summarized form.
10) | A data warehouse is vast in size. | A data mart is smaller than a warehouse.
11) | It collects data from various data sources. | It generally stores data from a data warehouse.
12) | Processing takes a long time because of the large volume of data. | Processing takes less time because only a small amount of data is handled.
13) | The design process for creating schemas and views is complicated. | The design process for creating schemas and views is easy.

WHAT IS ETL (EXTRACTION, TRANSFORMATION & LOADING):

ETL stands for Extract, Transform, Load. It is a process used in data
warehousing to extract data from various sources, transform it into a format
suitable for loading into a data warehouse, and then load it into the
warehouse. The ETL process can also use the pipelining concept: as soon as
some data is extracted, it can be transformed, and during that period new
data can be extracted. And while the transformed data is being loaded into
the data warehouse, the already extracted data can be transformed.
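The pipelining idea can be approximated with Python generators, where each record flows through extract, transform, and load before the next record is extracted. This is only a sketch: the stages here are interleaved in a single thread rather than truly running at the same time, and the function names and the `log` list are assumptions introduced for illustration.

```python
def extract(source, log):
    # Yield records one at a time instead of extracting everything up front.
    for raw in source:
        log.append(f"extract {raw}")
        yield raw

def transform(records, log):
    # Transform each record as soon as it has been extracted.
    for raw in records:
        log.append(f"transform {raw}")
        yield raw.upper()

def load(records, warehouse, log):
    # Load each transformed record while later records are still upstream.
    for rec in records:
        log.append(f"load {rec}")
        warehouse.append(rec)

log, warehouse = [], []
load(transform(extract(["a", "b"], log), log), warehouse, log)
print(log)
print(warehouse)
```

The log shows record "a" reaching the warehouse before record "b" is even extracted. A production pipeline would typically run the stages concurrently (for example with queues between them) to get real overlap.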
The process of ETL can be broken down into the following three stages:

1. Extraction:


The first step of the ETL process is extraction. In this step, data is
extracted from various source systems, which can be in various formats such
as relational databases, NoSQL, XML, and flat files, into the staging area.
It is important to extract the data from the source systems into the staging
area first and not directly into the data warehouse, because the extracted
data is in various formats and may also be corrupted. Loading it directly
into the data warehouse may therefore damage the warehouse, and rollback will
be much more difficult. This makes extraction one of the most important steps
of the ETL process.
2. Transformation:
The second step of the ETL process is transformation. In this step, a set of
rules or functions is applied to the extracted data to convert it into a
single standard format. It may involve the following processes/tasks:
• Filtering – loading only certain attributes into the data warehouse.
• Cleaning – filling up the NULL values with some default values, mapping
U.S.A, United States, and America into USA, etc.
• Joining – joining multiple attributes into one.
• Splitting – splitting a single attribute into multiple attributes.
• Sorting – sorting tuples on the basis of some attribute (generally the key
attribute).
3. Loading:
The third and final step of the ETL process is loading. In this step, the
transformed data is finally loaded into the data warehouse. Sometimes the
data is loaded into the data warehouse very frequently, and sometimes it is
loaded at longer but regular intervals. The rate and period of loading depend
solely on the requirements and vary from system to system.
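The transformation tasks listed above, followed by a load, can be sketched as follows. The staged record layout, the `customer_dim` table, and the `UNKNOWN` default are assumptions chosen for this example; only the country mapping comes from the notes. The load step uses Python's built-in sqlite3 module as a stand-in for the warehouse.

```python
import sqlite3

# Cleaning rule from the notes: map country spellings onto one value.
COUNTRY_MAP = {"U.S.A": "USA", "United States": "USA", "America": "USA"}

def transform(staged_rows):
    out = []
    for row in staged_rows:
        # Filtering: only id, name, city, and country are kept; "notes" is dropped.
        first, last = row["full_name"].split(" ", 1)   # Splitting one attribute
        country = row.get("country") or "UNKNOWN"      # Cleaning: default for NULL
        country = COUNTRY_MAP.get(country, country)    # Cleaning: standardize
        address = f'{row["city"]}, {country}'          # Joining two attributes
        out.append((row["id"], first, last, address))
    return sorted(out)                                 # Sorting on the key (id)

def load(conn, rows):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_dim "
        "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, address TEXT)"
    )
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?, ?)", rows)

staged = [
    {"id": 2, "full_name": "Ravi Kumar", "city": "Austin",
     "country": "United States", "notes": "call back"},
    {"id": 1, "full_name": "Asha Patel", "city": "Pune",
     "country": None, "notes": ""},
]

conn = sqlite3.connect(":memory:")
load(conn, transform(staged))
loaded = conn.execute("SELECT * FROM customer_dim ORDER BY id").fetchall()
print(loaded)
```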


META DATA REPOSITORY:


The metadata repository stores information that defines DW objects. It
includes the following parameters and information for the middle-tier and
top-tier applications:
1. A description of the DW structure, including the warehouse schema,
dimensions, hierarchies, data mart locations and contents, etc.
2. Operational metadata, which usually describes the currency level of the
stored data, i.e., active, archived or purged, and warehouse monitoring
information, i.e., usage statistics, error reports, audit trails, etc.
3. System performance data, which includes indices used to improve data
access and retrieval performance.
4. Information about the mapping from operational databases, which covers
source RDBMSs and their contents, cleaning and transformation rules, etc.
5. Summarization algorithms, predefined queries, and reports, along with
business data, which includes business terms and definitions, ownership
information, etc.

BENEFITS/ADVANTAGES OF DATA WAREHOUSE:


Benefits from a successful implementation of a data warehouse include:
1. Enhanced Business Intelligence
Insights will be gained through improved information access. Managers and
executives will be freed from making their decisions based on limited data.
Decisions that affect the strategy and operations of organizations will be
based upon credible facts and will be backed up with evidence and actual
organizational data.
2. Increased Query and System Performance
The data warehouse is built for analysis and retrieval of data rather than
efficient upkeep of individual records (i.e., transactions). Further, the
data warehouse allows a large system burden to be taken off the operational
environment and effectively distributes system load across an entire
organization's technology infrastructure.


3. Business Intelligence from Multiple Sources


For many organizations, enterprise information systems are comprised of
multiple subsystems, physically separated and built on different platforms.
Moreover, merging data from multiple disparate data sources is a common
need when conducting business intelligence. To solve this problem, the data
warehouse performs integration of existing disparate data sources and makes
them accessible in one place.
4. Timely Access to Data
The data warehouse enables business users and decision makers to access
data from many different sources whenever they need it. Additionally,
business users will spend little time in the data retrieval process.
Scheduled data integration routines, known as ETL, are leveraged within a
data warehouse environment.
5. Enhanced Data Quality and Consistency
A data warehouse implementation typically includes the conversion of data
from numerous source systems and data files and transformation of the
disparate data into a common format. Data from the various business units
and departments is standardized and the inconsistent nature of data from the
unique source systems is removed. Moreover, individual business units and
departments including sales, marketing, finance, and operations, will start to
utilize the same data repository as the source system for their individual
queries and reports. Thus, each of these individual business units and
departments will produce results that are consistent with the other business
units within the organization.
6. Historical Intelligence
Data warehouses generally contain many years’ worth of data that can
neither be stored within nor reported from a transactional system. Typically,
transactional systems satisfy most operational reporting requirements for a
given time period but do not include historical data. In contrast, the data
warehouse stores large amounts of historical data and can enable advanced
business intelligence, including time-period analysis, trend analysis, and
trend prediction across multiple time periods.
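As a small illustration of time-period analysis on historical data, the
Python sketch below totals revenue per year and computes year-over-year
growth. The figures and the function names are made up for the example; they
do not come from any real system.

```python
from collections import defaultdict

# Hypothetical historical sales facts: (year, quarter, revenue).
HISTORY = [
    (2021, 1, 100.0), (2021, 2, 120.0), (2021, 3, 110.0), (2021, 4, 150.0),
    (2022, 1, 130.0), (2022, 2, 140.0), (2022, 3, 150.0), (2022, 4, 180.0),
    (2023, 1, 150.0), (2023, 2, 170.0), (2023, 3, 160.0), (2023, 4, 240.0),
]


def yearly_totals(history):
    """Sum revenue per year (a simple time-period analysis)."""
    totals = defaultdict(float)
    for year, _quarter, revenue in history:
        totals[year] += revenue
    return dict(sorted(totals.items()))


def year_over_year_growth(totals):
    """Fractional growth of each year relative to the previous year."""
    years = sorted(totals)
    return {
        year: (totals[year] - totals[prev]) / totals[prev]
        for prev, year in zip(years, years[1:])
    }
```

A transactional system that keeps only the current period could not answer
this kind of question, because the earlier years would no longer be
available.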

7. High Return on Investment


Return on investment (ROI) refers to the increase in revenue or decrease in
expenses that a business is able to realize from a project or an investment
of capital. Implementations of data warehouses and complementary business
intelligence systems have enabled businesses to generate more revenue and
achieve substantial cost savings.
8. A data warehouse lets business users quickly access critical data from
several sources, all in one place.
9. A data warehouse provides consistent information on various
cross-functional activities.
10. It helps integrate many sources of data, which reduces the time needed for
analysis and reporting.
11. A data warehouse reduces the total turnaround time for analysis and
reporting.
12. Restructuring and integrating the data makes it easier to use for reporting
and analysis.
13. It saves users the time of retrieving data from multiple sources by letting
them access critical data from a number of sources in a single place.
14. A data warehouse accumulates a large amount of historical data, which helps
users analyze different time periods and trends in order to make
predictions about the future.
15. It helps users understand business trends and make better forecasting
decisions.
16. Data warehouses are designed to perform well with enormous amounts of data.
17. The structure of a data warehouse is easier for end users to navigate,
understand, and query.
18. Queries that would be complex in many normalized databases can be easier
to build and maintain in a data warehouse.
19. Data warehousing is an efficient way to manage demand for lots of
information from lots of users.
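
The point about queries being easier to build in a data warehouse can be
sketched with Python's built-in sqlite3 module. The layout below (one fact
table and two dimension tables, often called a star schema) and all table and
column names are assumptions made for this illustration only.

```python
import sqlite3


def build_demo_warehouse():
    """Create a tiny in-memory warehouse with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);
        INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
        INSERT INTO dim_date VALUES (10, 2022), (11, 2023);
        INSERT INTO fact_sales VALUES
            (1, 10, 50.0), (1, 11, 70.0), (1, 11, 10.0),
            (2, 10, 20.0), (2, 11, 30.0);
    """)
    return conn


def sales_by_category_and_year(conn):
    """One fact table joined to its dimensions answers the whole question."""
    query = """
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """
    return conn.execute(query).fetchall()
```

Every analytical question in such a layout follows the same pattern: join the
fact table to the dimensions of interest, then group and aggregate. This
regularity is one reason such queries tend to be simple to write and maintain.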

DISADVANTAGES OF DATA WAREHOUSE:


1. It is not an ideal option for unstructured data.
2. Creating and implementing a data warehouse is a time-consuming process.
3. A data warehouse can become out of date relatively quickly.
4. It is difficult to make changes to data types and ranges, data source
schemas, indexes, and queries.
5. The data warehouse may appear easy to use, but it can be too difficult for
ordinary users.
6. Despite best efforts at project management, the scope of a data warehousing
project tends to keep increasing.
7. Over time, users of the data warehouse may develop differing business rules.
8. Organizations need to spend a lot of their resources on training and
implementation in order to accumulate the large amount of data.

APPLICATION OF DATA WAREHOUSE:


1) Social Media Websites or E-Commerce Sites:
E-commerce platforms need to gather key marketing metrics (such as clicks,
impressions, and website visitors) from marketing tools and use them to
approach their customers more effectively. This is where data warehouses help:
replicating data, storing it safely, and tracking and visualizing KPIs such as
conversion rate, churn rate, and return on ad spend help companies perform
better. Amazon Redshift is one of the popular warehouses used for marketing
analytics, often chosen for its ease of use and flexibility.
2) Government and Education:
The federal government uses data warehouses for compliance research, while
state governments use them for human resources services such as recruitment
and accounting services such as payroll management.
Governments use data warehouses to store and analyse tax records, health
policy records, and provider records. The entire criminal law database is also
connected to the state's data warehouse. Illegal activity is predicted from the
patterns, trends, and results found by analysing historical data associated
with past criminals.
Universities employ data warehouses to collect information for grant proposals,
student demographic analysis, and human resource management. Most colleges'
financial departments, including the Financial Aid department, rely on data
warehouses.
3) Banking:
Bankers can better manage all of their available resources with the right Data
Warehousing solution. They can better analyse consumer data, government
regulations, and market trends to facilitate better decision-making.
4) Finance:
Applications in finance are similar to those in banking. They mainly revolve
around evaluating customer expenses and their trends, which helps financial
firms maximize the profits earned by their clients.
5) Transportation:
Client data is recorded in data warehouses in the transportation industry,
allowing traders to experiment with target marketing, where marketing
campaigns are created with the needs of the customer in mind.
They are used in the industry's internal environment to analyse client feedback
and performance, manage crews on board, and analyse customer financial
reports for pricing strategies.
6) Consumer Goods Industry:
Data warehouses are used to predict consumer trends, manage inventory, and
carry out market and advertising research. An in-depth analysis of sales and
production is also carried out. Apart from these, information is exchanged
between business partners and clientele.
7) Retail Services:
Retailers act as go-betweens for producers and customers. In order to ensure
their continuous presence on the market, they must keep records of both parties.
They employ warehouses to keep track of merchandise, advertising promotions,
and consumer purchasing patterns. They also analyse sales to determine fast-
selling and slow-selling product lines and determine their shelf space through
elimination.
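
The fast-selling versus slow-selling analysis mentioned for retailers can be
sketched in Python as follows. The sales figures, the threshold, and the rule
of sharing shelf space in proportion to units sold are all assumptions made
for this example; the notes do not prescribe a particular method.

```python
# Hypothetical units sold per product line over the same period.
UNITS_SOLD = {"snacks": 950, "beverages": 720, "stationery": 130, "toys": 60}


def classify_product_lines(units_sold, threshold):
    """Split product lines into fast sellers and slow sellers by a threshold."""
    fast = sorted(name for name, units in units_sold.items() if units >= threshold)
    slow = sorted(name for name, units in units_sold.items() if units < threshold)
    return fast, slow


def shelf_share(units_sold, keep):
    """Share shelf space among the kept lines in proportion to units sold."""
    total = sum(units_sold[name] for name in keep)
    return {name: units_sold[name] / total for name in keep}
```

For example, with a threshold of 200 units, classify_product_lines separates
the two high-volume lines from the two low-volume ones, and shelf_share then
divides the available space among the lines that are kept.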

QUESTIONS:
1) Define Data Warehouse (2 definitions)
2) Write the Characteristics of Data Warehousing
3) Difference Between operational database and data warehouse (6 points)
4) Write the needs for data warehouse
5) Explain ETL with diagram
6) Write limitations, benefits and application of data warehouse
7) Draw and explain single tier, 2-tier, and 3-tier architecture of data
warehouse
8) Difference between data warehouse and data mart
9) Difference between ETL and ELT.
10) Explain Metadata repository
11) Explain Data Warehouse Models
