1 What Is Data Mining

The document discusses the evolution of data warehousing from the 1960s to present. It outlines key events like dimensional modeling in the 1960s and the introduction of the term 'data warehouse' in 1990. The document then covers the goals, needs, and features of data warehousing as well as related topics like data mining.

Uploaded by

Sonu Saini

Some key events in the evolution of the data warehouse:

• 1960s: In a joint research project, Dartmouth and General Mills develop the terms "dimensions" and "facts".
• 1970s: ACNielsen and IRI introduce dimensional data marts for retail sales.
• 1983: Teradata Corporation introduces a database management system designed specifically for decision support.
• Late 1980s: The idea of data warehousing emerges when IBM researchers Barry Devlin and Paul Murphy develop the "Business Data Warehouse".
• The concept was then developed in full by Bill Inmon, who is widely regarded as the father of the data warehouse. He has written on a variety of topics covering the building, usage, and maintenance of the warehouse and the Corporate Information Factory.

The term "Data Warehouse" was first coined by Bill Inmon in 1990. According to Inmon, a data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data. This data helps analysts make informed decisions in an organization.

In short, the data warehousing idea was intended to provide an architectural model for the flow of information from operational systems to decision support environments. The concept attempts to address the various problems associated with this flow, mainly the high costs associated with it.

In the absence of a data warehousing architecture, a vast amount of space was required to support multiple decision support environments. In large corporations, it was common for various decision support environments to operate independently.

Goals of Data Warehousing
o Support reporting as well as analysis
o Maintain the organization's historical information
o Serve as the foundation for decision making

A data warehouse is needed for the following reasons:

1. Business users: Business users need a data warehouse to view summarized data from the past. Since these users are often non-technical, the data can be presented to them in a simple form.
2. Storing historical data: A data warehouse is required to store time-variant data from the past, which can then be used for various purposes.
3. Making strategic decisions: Some strategies depend on the data in the data warehouse, so the warehouse contributes to strategic decision making.
4. Data consistency and quality: By bringing data from different sources into a common place, the organization can enforce uniformity and consistency in the data.
5. Fast response time: A data warehouse has to be ready for somewhat unexpected loads and types of queries, which demands a significant degree of flexibility and quick response time.

Data warehousing is a method of organizing and compiling data into one database, whereas data mining deals with fetching important data from databases. Data mining attempts to find meaningful patterns in the data, and it depends on the data that has been compiled in the data warehouse.

DATA WAREHOUSE:
Data warehousing is a technique for collecting and managing data from various sources. A data warehouse, usually with a large storage capacity, is where data is collected for mining purposes. Data from an organization's various systems is held in the data warehouse, from where it can be fetched as needed.

Source → Extract → Transform → Load → Target
(Data warehouse process)
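
The flow above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the CSV content, the column names, the `sales` table, and the cleaning rules (trimming names, dropping rows with no amount) are all assumptions made for the example, not part of the original notes.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system (assumed for illustration).
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,120.50,2023-01-05
2,BOB,80.00,2023-01-06
3,alice,,2023-01-07
"""

def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise names, convert types, skip rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record; a real pipeline might log or quarantine it
        cleaned.append((
            int(row["order_id"]),
            row["customer"].strip().lower(),
            float(amount),
            row["order_date"].strip(),
        ))
    return cleaned

def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

# An in-memory SQLite database stands in for the warehouse target.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three stages as separate functions mirrors the diagram: each stage can be replaced (a different source, different cleaning rules, a different target) without touching the others.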

Data warehouses consolidate data from several sources and ensure data accuracy, quality, and consistency. In a data warehouse, data is organized into a structured format by type and as needed, and is then examined with query tools.

Data warehouses store historical data and handle analytical requests faster, which supports online analytical processing (OLAP). An operational database, by contrast, stores the current transactions of a business process, which is called online transaction processing (OLTP).

A data warehouse provides integrated, enterprise-wide, historical data and focuses on providing support for decision-makers for data modelling and analysis.

FEATURES OF DATA WAREHOUSES:

• Time-variant:
The data collected in a data warehouse is identified with a specific period. This means the data warehouse has to contain historical data, not just current values. It therefore:
1.) Allows analysis of the past
2.) Relates information to the present
3.) Enables forecasting of the future

• Integrated:
A data warehouse is built by putting together different sources, such as flat files or relational databases. Its data comes from several operational systems. Before the data is integrated, the following steps are carried out:
1.) Removal of inconsistencies
2.) Transformation
3.) Integration of source data

• Non-volatile:
Earlier data is not deleted when new data is added to the data warehouse. The operational database and the data warehouse are kept separate, so continuous changes in the operational database are not reflected in the data warehouse.

• Subject-oriented:
It provides important data about a specific subject such as suppliers, products, promotions, or customers. Data warehousing usually handles the analysis and modelling of data that helps an organization make data-driven decisions.
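
The time-variant and non-volatile properties can be illustrated with an append-only table in which every row carries the date of the snapshot it belongs to. The sketch below is an assumption-laden example rather than a prescribed design: the table name `product_price_history`, its columns, and the helper `load_snapshot` are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_price_history "
    "(product TEXT, price REAL, snapshot_date TEXT)"
)

def load_snapshot(conn, snapshot_date, prices):
    """Append one dated snapshot; existing rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO product_price_history VALUES (?, ?, ?)",
        [(product, price, snapshot_date) for product, price in prices.items()],
    )
    conn.commit()

# Two loads for the same product: the second does not overwrite the first.
load_snapshot(conn, "2023-01-01", {"widget": 9.99})
load_snapshot(conn, "2023-02-01", {"widget": 10.49})

history = conn.execute(
    "SELECT snapshot_date, price FROM product_price_history "
    "WHERE product = ? ORDER BY snapshot_date",
    ("widget",),
).fetchall()
print(history)
```

Because each load only inserts rows, the earlier price remains available for analysing the past, while the latest snapshot describes the present.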

Applications of Data Warehouses:

• Banking Services
• Consumer Goods
• Manufacturing
• Financial Services
• Retail Sectors

Benefits of Data Warehousing

The most compelling advantages of data warehousing are:

• Improved performance and productivity
• Cost-effectiveness
• Consistent and accurate data access

What is Data Mining?

Data mining is the process of extracting and analysing data to obtain useful information. Hidden patterns are searched for in the dataset in order to predict future behaviour, and relationships within the data are discovered and described.

Data mining uses statistics, artificial intelligence, machine learning, and database systems to find hidden patterns in the data. It helps answer business questions that would otherwise be time-consuming to resolve.

Features of Data Mining

Some of the distinguishing features of data mining are:

• It is capable of predicting future results
• It can efficiently handle large datasets and databases
• It discovers patterns automatically
• It has the potential to create actionable insights

Applications of Data Mining

• Research
• Education Sector
• Transportation
• Market Basket Analysis
• Business Transactions
• Intrusion Detection
• Scientific Analysis
• Finance and Banking Sector
• Insurance and Healthcare
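
As one concrete case from the list above, market basket analysis looks for items that are frequently bought together. The sketch below computes two standard measures, support and confidence, on a tiny made-up set of baskets; the transactions and the function names `pair_support` and `confidence` are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (assumed data for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def pair_support(transactions):
    """Support of each item pair: the fraction of baskets containing both items."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)

support = pair_support(transactions)
print(support[("bread", "milk")])               # share of baskets with both items
print(confidence(transactions, "bread", "milk"))  # share of bread baskets that also hold milk
```

Production tools usually rely on algorithms such as Apriori or FP-Growth to avoid enumerating every combination; the brute-force counting here is only practical for small examples.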

Benefits of Data Mining

• Analysing trends within the existing marketplace.
• Detecting fraud in phone calls, insurance claims, debit or credit card purchases, etc.
• Predicting market behaviour before business decisions are made.

Difference Between Data Mining and Data Warehousing

Data Mining | Data Warehousing
Involves analysing data to find patterns | Designed to store data for analysis
Data is analysed regularly | Data is stored periodically
Uses pattern recognition logic to identify patterns | Extracts and stores data to enable easy reporting
Carried out by business users with the help of engineers | Carried out by engineers
Helps in extracting data from large data sets | Pools all the relevant data together
Uses statistics, AI, machine learning, and database technologies | Holds data that is integrated, subject-oriented, non-volatile, and time-variant
Pattern recognition logic is used for determining patterns | Extracts and stores data in an orderly way to make reporting efficient
Employs pattern recognition tools to help identify access patterns | Extracts and stores data in an orderly format, making reporting faster and easier
Helps in creating suggestive patterns of key parameters | Adds value to operational business systems such as CRM when connected with them

Common Tools and Software Used in Data Warehousing and Data Mining

Some of the popular data warehouse tools are:

• Amazon Redshift
• Microsoft Azure Synapse Analytics
• Google BigQuery
• Snowflake
• Micro Focus Vertica
• Amazon DynamoDB (a NoSQL database rather than a warehouse in the strict sense)

Some of the popular data mining tools are:

• RapidMiner
• MonkeyLearn
• IBM SPSS Modeler
• Oracle Data Mining
• KNIME
• Weka
• Orange
• H2O
• Apache Mahout
• SAS Enterprise Miner

Common Data Mining and Data Warehousing Techniques

The most common techniques of data mining are:

• Association
• Clustering
• Data Visualisation
• Data Cleaning
• Machine Learning
• Classification
• Neural Networks
• Prediction
• Data Warehousing
• Outlier Detection
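
To make one of these techniques concrete, the following sketch flags outliers with a simple z-score rule: a value is treated as an outlier when it lies more than a chosen number of standard deviations from the mean. The sales figures, the function name `zscore_outliers`, and the threshold of 2.0 are assumptions for illustration; other outlier detection methods exist and may suit real data better.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds threshold standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical daily sales with one unusually large value.
daily_sales = [100, 102, 98, 101, 99, 103, 97, 250]
outliers = zscore_outliers(daily_sales)
print(outliers)
```

Note that a single extreme value inflates both the mean and the standard deviation, so on small samples a z-score rule can miss outliers that a median-based rule would catch.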

Common Data Warehousing Techniques

Some of the most common data warehousing techniques are:

• Database Compression
• Columnar Data Storage
• In-Memory Processing
• Massively Parallel Processing (MPP)
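
The idea behind columnar data storage can be shown with plain Python structures. In the sketch below, the same records are held once row by row and once column by column; an aggregate over one attribute then only needs to touch that attribute's list. The sample records and the helper `to_columns` are hypothetical, and real columnar engines add compression and on-disk layouts that this example does not attempt to model.

```python
# Row-oriented layout: each record keeps all of its attributes together.
rows = [
    {"region": "north", "product": "widget", "amount": 120.0},
    {"region": "south", "product": "gadget", "amount": 80.0},
    {"region": "north", "product": "gadget", "amount": 50.0},
]

def to_columns(rows):
    """Pivot a list of records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# Row store: every whole record is visited to sum a single attribute.
total_row_store = sum(row["amount"] for row in rows)

# Column store: only the "amount" column is read.
total_column_store = sum(columns["amount"])

print(total_row_store, total_column_store)
```

Both layouts give the same answer; the difference is how much data has to be read to get it, which matters for analytical queries that scan a few columns across many rows.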

Scope of Data Mining & Data Warehouse

Data mining and data warehousing differ in scope. Data mining involves sorting through enormous data sets to identify relationships and patterns that help solve business problems through data analysis. Its scope and techniques enable enterprises to predict future trends and make informed business decisions.
The scope of data warehousing, on the other hand, covers any domain that has something to do with analytics. The challenges faced in each area differ as well.

Challenges of Data Mining & Data Warehousing

Some of the most common challenges experienced in data mining are:

• Incomplete and noisy data
• Social and security challenges
• Complex data
• Distributed data
• Efficiency and scalability of algorithms
• Performance
• Incorporating background knowledge
• Improving mining algorithms, etc.

Some of the most common challenges experienced in data warehousing are:

• Manual data processing
• Data quality
• Data accuracy
• Testing
• Performance, etc.
