Data Warehousing & Data Mining-A View
and analysis. Data and information are extracted from non-homogeneous sources as
they are generated and processed using process managers (load/warehouse/query). This
makes it much easier and more efficient to run queries over data that originally came
from different sources. It also enables people to make informed decisions.
Data mining draws from the data warehouse, revealing patterns of information
in historical data, in terms of customer data or any other data in ways that we never
thought possible. It combines techniques like statistical analysis, data visualization,
induction and neural networks. Data mining systems improve an organization’s
effectiveness, efficiency and value by increasing the usefulness of the knowledge the
organization possesses.
Extract and load the data: Data extraction involves extracting data from source
systems and making it available to the data warehouse, whereas data load takes the
extracted data and loads it into the data warehouse.
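The extract-and-load step can be sketched as follows; this is a minimal illustration only, and the two non-homogeneous sources (a CSV-style sales feed and a list of dicts from an operational system) are invented for the example:

```python
# Minimal sketch of extract-and-load from two non-homogeneous sources.
import csv
import io

def extract_csv(text):
    """Extract rows from a CSV-style source into plain dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def load(warehouse, rows, source):
    """Load extracted rows into the warehouse, tagging each with its origin."""
    for row in rows:
        warehouse.append({**row, "_source": source})

warehouse = []
load(warehouse, extract_csv("id,amount\n1,10\n2,25"), "sales_feed")  # CSV feed
load(warehouse, [{"id": "3", "amount": "40"}], "ops_db")             # operational DB
print(len(warehouse))  # rows from both sources now live side by side
```

Tagging each row with its source makes it possible to trace any warehouse record back to the system it came from.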
Clean and transform data: This step performs consistency checks on the loaded data
and then structures it for query performance and to minimize operational costs.
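A hedged sketch of the clean-and-transform step: run consistency checks on loaded rows and restructure the survivors into typed fields (the record layout and checks are assumptions for illustration):

```python
# Clean-and-transform sketch: reject inconsistent records, type the rest.
def clean_and_transform(rows):
    clean = []
    for row in rows:
        # consistency checks: 'id' must be present, 'amount' must be numeric
        if not row.get("id") or not str(row.get("amount", "")).replace(".", "", 1).isdigit():
            continue  # reject the inconsistent record
        # transform: structure the record with proper types for querying
        clean.append({"id": int(row["id"]), "amount": float(row["amount"])})
    return clean

rows = [{"id": "1", "amount": "10"},   # valid
        {"id": "", "amount": "5"},     # incomplete
        {"id": "2", "amount": "bad"}]  # inconsistent
print(clean_and_transform(rows))
```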
Data warehousing-“Errors”: The possible errors encountered in a data warehouse
are:
• Incomplete errors: missing records, etc.
• Incorrect errors: wrong (though sometimes plausible-looking) codes, wrong calculations, etc.
• Incomprehensibility errors: unknown codes, spreadsheets and word-processing files, etc.
• Inconsistency errors: inconsistent use of different codes, overlapping codes, etc.
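Three of the four error categories can be illustrated with a small per-record audit (inconsistency errors need cross-record checks, so they are omitted here); the schema with `code`, `price`, `qty` and `total` fields is invented for the example:

```python
# Illustrative audit that flags error categories from the list above.
VALID_CODES = {"A", "B", "C"}  # assumed reference set of known codes

def audit(record):
    errors = []
    if record.get("code") is None:
        errors.append("incomplete")          # missing field -> incomplete error
    elif record["code"] not in VALID_CODES:
        errors.append("incomprehensible")    # unknown code -> incomprehensibility error
    if record.get("total") != record.get("price", 0) * record.get("qty", 0):
        errors.append("incorrect")           # wrong calculation -> incorrect error
    return errors

print(audit({"code": "Z", "price": 2, "qty": 3, "total": 7}))
```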
Back up and archive data: Data is backed up regularly, and older data is removed
from the system and archived in a format that allows it to be quickly restored if
required.
Query management: This manages queries and speeds them up by directing each
query to the most effective data source, while also monitoring actual query profiles.
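Query management can be sketched as a simple router; the routing rule (aggregate queries go to a pre-summarized table, everything else to detail data) and the table names are assumptions for illustration:

```python
# Sketch of query management: route each query to the most effective
# source and record a profile of what ran.
query_profiles = []

def route(query):
    # assumed rule: aggregate queries are served from a summary table
    source = "summary_table" if query.strip().upper().startswith("SELECT SUM") else "detail_table"
    query_profiles.append({"query": query, "source": source})  # monitor query profiles
    return source

print(route("SELECT SUM(amount) FROM sales"))
print(route("SELECT * FROM sales WHERE id = 1"))
```

In a real warehouse the recorded profiles would feed back into decisions about which summaries to maintain.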
Data warehouse architecture:
Data warehouse architecture (DWA) is a way of representing the overall
structure of data, communication, processing and presentation that exists for end-user
computing within the enterprise. The architecture is made up of a number of
interconnected parts:
• Operational database/External database layer: Operational systems process
data to support critical operational needs. To do that, operational databases have
been historically created to provide an efficient processing structure for a relatively
small number of well-defined business transactions.
• Information Access Layer: This is the layer that the end-user deals with directly.
In particular, it represents the tools that the end-user normally uses day to day.
e.g.: Excel, Lotus 1-2-3, etc.
• Data Access Layer: The Data Access Layer of the Data Warehouse Architecture
allows the Information Access Layer to talk to the Operational Layer.
• Data Directory (Meta-data) Layer: Meta-data is data about data within the
enterprise. A record description in a COBOL program is an example of meta-data.
• Data Staging Layer: Data staging is also called copy management or replication
management; in fact, it includes all of the processes necessary to select, edit,
summarize, combine and load data warehouse and information access data from
operational and/or external databases.
Data Mining:
Scope: Given databases of sufficient size and quality, data mining technology can
generate new business opportunities by providing these capabilities:
Automated prediction of trends and behaviors: A typical example of a predictive
problem is targeted marketing.
Automated discovery of previously unknown patterns: Data mining tools sweep
through databases and identify previously hidden patterns in one step. An example is
the analysis of retail sales.
Data Mining-Algorithms: Some of the most common data mining algorithms in use
today fall into two groups, based on when the technique was developed and when it
became ready to be used.
1. Classical Techniques: Statistics, neighborhoods and clustering that have been used
for decades.
Statistics: These are data driven and are used to discover patterns and build predictive
models.
(a) Histograms: One of the best ways to summarize data is to provide a histogram of
the data.
Ex 1: Counting the numbers of occurrences of different colors of eyes in our database.
Ex 2: Representing the majority of customers that are over the age of 50.
Figure – depicts a simple predictor (eye color).
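The eye-color example above amounts to a frequency count; a minimal sketch (the observed values are invented for illustration):

```python
# A histogram is a frequency count over an attribute's values.
from collections import Counter

eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]
histogram = Counter(eye_colours)
# the most frequent value acts as a simple predictor for an unseen record
print(histogram.most_common(1))
```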
Nearest Neighbor: Objects that are “near” to each other will have similar prediction
values as well. Thus if you know the prediction value of one of the objects you can
predict it for its nearest neighbors. One improvement that is usually made to
the basic nearest neighbor algorithm is to take a vote from the “k” nearest neighbors.
Ex: The nearest neighbors are shown graphically for three unclassified
records: A, B, and C.
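The k-nearest-neighbor vote described above can be sketched on one-dimensional data (the points and labels are invented for illustration):

```python
# k-nearest-neighbor sketch: predict by a vote among the k closest records.
from collections import Counter

def knn_predict(train, x, k=3):
    # train is a list of (value, label) pairs; "near" = small absolute distance
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    vote = Counter(label for _, label in nearest)
    return vote.most_common(1)[0][0]  # majority label among the k neighbors

train = [(1, "yes"), (2, "yes"), (3, "no"), (10, "no"), (11, "no")]
print(knn_predict(train, 2.5, k=3))
```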
Clustering: It is the method by which like records are grouped together. Usually this
is done to give the end user a high level view of what is going on in the database.
There are two main types.
Hierarchical and Non-Hierarchical Clustering: In hierarchical clustering, the
hierarchy of clusters is usually viewed as a tree where the smallest clusters merge
together to create the next highest level of clusters, and so on.
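The bottom-up merging that builds the cluster tree can be sketched on one-dimensional values (the data and the merge criterion, distance between cluster means, are assumptions for illustration):

```python
# Hierarchical (agglomerative) clustering sketch: start with singleton
# clusters and repeatedly merge the two closest, building the tree bottom-up.
def agglomerate(points, target_clusters):
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        # find the pair of clusters whose means are closest
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = abs(sum(clusters[i]) / len(clusters[i])
                        - sum(clusters[j]) / len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return clusters

print(agglomerate([1, 2, 9, 10, 11], 2))
```

Stopping at different cluster counts corresponds to cutting the tree at different levels.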
Figure – depicts a decision tree.
• Rule Induction: The extraction of useful if-then rules from data based on
statistical significance.
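Extracting an if-then rule together with its confidence can be sketched as follows; the records and attribute names are invented for illustration, and real rule-induction systems would also test statistical significance:

```python
# Rule-induction sketch: measure the confidence of an if-then rule.
def induce_rule(records, antecedent, consequent):
    matches = [r for r in records if r.get(antecedent[0]) == antecedent[1]]
    hits = [r for r in matches if r.get(consequent[0]) == consequent[1]]
    confidence = len(hits) / len(matches) if matches else 0.0
    return f"IF {antecedent[0]}={antecedent[1]} THEN {consequent[0]}={consequent[1]}", confidence

records = [
    {"age": "young", "buys": "yes"},
    {"age": "young", "buys": "yes"},
    {"age": "young", "buys": "no"},
    {"age": "old", "buys": "no"},
]
rule, conf = induce_rule(records, ("age", "young"), ("buys", "yes"))
print(rule, round(conf, 2))
```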
These capabilities are now evolving to integrate directly with industry-
standard data warehouse and OLAP platforms.
Once the mining is complete, the results can be tested against the data held in
the vault to confirm the model’s validity. If the model works, its observations should
hold for the vaulted data.
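Validating a mined model against vaulted (held-out) data can be sketched as follows; the model here, which simply predicts the majority label seen in training, is an assumption chosen to keep the example small:

```python
# Validation sketch: a model mined from training data should also
# hold on the vaulted data that was withheld from mining.
from collections import Counter

def train_majority(labels):
    """'Mine' a trivial model: the most common label in the training data."""
    return Counter(labels).most_common(1)[0][0]

def accuracy(model_label, vault_labels):
    """Test the model's observation against the vaulted data."""
    return sum(1 for y in vault_labels if y == model_label) / len(vault_labels)

model = train_majority(["yes", "yes", "no", "yes"])
print(accuracy(model, ["yes", "no", "yes", "yes"]))
```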