Data Warehousing and Data Mining
DATA WAREHOUSE
A data warehouse is a logical collection of information gathered from many different databases. It can
be thought of as a large database containing historical transactions and other data.
For example, consider a department store that buys and sells grocery items. Its data warehouse
would hold granular data, that is, information in its rawest form, so each individual
transaction may be recorded.
The purpose of a data warehouse is the permanent storage of detailed information. Data entered into
a data warehouse must first be processed to ensure that it is clean, complete and in a proper format.
A data warehouse is often subdivided into smaller repositories called 'data marts'. A data
mart is a subset of a data warehouse that keeps only the portion of the warehouse information
needed for a particular purpose.
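The idea of a data mart as a subset of the warehouse can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `warehouse` rows, the column names and the `build_data_mart` helper are hypothetical and do not come from any particular product.

```python
# Hypothetical warehouse rows: one record per recorded transaction.
warehouse = [
    {"txn_id": 1, "department": "grocery", "item": "rice", "amount": 40.0},
    {"txn_id": 2, "department": "bakery", "item": "bread", "amount": 3.5},
    {"txn_id": 3, "department": "grocery", "item": "lentils", "amount": 12.0},
    {"txn_id": 4, "department": "dairy", "item": "milk", "amount": 2.0},
]


def build_data_mart(rows, department, columns):
    """Keep only the rows for one department and only the listed columns."""
    return [
        {col: row[col] for col in columns}
        for row in rows
        if row["department"] == department
    ]


# The grocery data mart holds just the portion of the warehouse it needs.
grocery_mart = build_data_mart(warehouse, "grocery", ["txn_id", "item", "amount"])
print(grocery_mart)
```

The warehouse itself is left unchanged; the data mart is a smaller copy built from it.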
A data warehouse has four main characteristics:
1. SUBJECT-ORIENTED
It focuses on the modeling and analysis of data relating to a specific area. The data warehouse is
organized around subjects such as product, customer and sales.
2. INTEGRATED
It integrates data from various applications, such as ERP and CRM systems.
3. HISTORICAL PERSPECTIVE
A data warehouse is time-variant: it keeps data over a long time horizon, for example the past 5 to
10 years, so that trends can be analyzed.
4. NON-VOLATILE
Data is stored permanently, i.e. once data is stored it is not updated or deleted in normal operation.
Data warehouses can store vast quantities of data, but implementing data warehousing applications
is challenging. For successful implementation, organizations need to pay close attention to
data quality. Missing and miscoded data has to be cleaned up, and variables come in a
variety of types, such as nominal data with no numeric content, dates, counts and averages.
To make a data warehouse useful, organizations must also use BI (business intelligence) tools
to process the data into meaningful information.
Data warehouses are used for data mining and online analytical processing (OLAP).
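The clean-up of missing and miscoded data mentioned above can be sketched as follows. This is a minimal illustration, not a complete data quality process: the record layout, the `GENDER_CODES` mapping and the rule of rejecting records that cannot be repaired are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical raw records showing typical quality problems.
raw_records = [
    {"customer": "C01", "gender": "F", "sale_date": "2023-01-15", "quantity": "3"},
    {"customer": "C02", "gender": "female", "sale_date": "2023-01-16", "quantity": "5"},
    {"customer": "C03", "gender": "M", "sale_date": "2023-01-16", "quantity": ""},
    {"customer": "C04", "gender": "m", "sale_date": "16/01/2023", "quantity": "2"},
]

# Assumed mapping from the codes seen in source systems to one standard code.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def clean_record(record):
    """Return a cleaned copy of the record, or None if it cannot be repaired."""
    # Missing data: reject records without a customer or a quantity.
    if not record["customer"] or not record["quantity"]:
        return None
    # Miscoded nominal data: map every known variant to one standard code.
    gender = GENDER_CODES.get(record["gender"].strip().lower())
    if gender is None:
        return None
    # Dates: accept only the expected YYYY-MM-DD format.
    try:
        sale_date = datetime.strptime(record["sale_date"], "%Y-%m-%d").date()
    except ValueError:
        return None
    return {
        "customer": record["customer"],
        "gender": gender,
        "sale_date": sale_date,
        "quantity": int(record["quantity"]),  # counts are stored as integers
    }


cleaned = [rec for rec in map(clean_record, raw_records) if rec is not None]
rejected = len(raw_records) - len(cleaned)
print(len(cleaned), "records kept,", rejected, "rejected")
```

In practice a rejected record would usually be logged and corrected at the source rather than silently dropped.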
Organizations that develop business intelligence (BI) tools create interfaces that help managers
grasp business situations quickly. Such interfaces are simple to understand and easy for
managers to interpret. One example is the dashboard, so called because it resembles
a car dashboard: it uses speedometer-like indicators for periodic revenues, profits and
other financial information, together with bar charts, line graphs and other graphical
representations.
DATA MINING
Definition
Data mining is the process of extracting usable information from a larger set of raw data.
Data mining queries are more advanced and sophisticated than traditional queries.
For example, a typical traditional query may be: "What is the relationship between the amount of
product A and the amount of product B that the organization sold over the past week?"
In data mining, by contrast, the manager wants to know which products will be in
demand on the coming weekend, so the query may be: "Find the
products most likely to have the maximum demand on the coming weekend."
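The difference between the two kinds of query can be illustrated with a small sketch. The `sales` rows and both helper functions are hypothetical. Ranking products by their past weekend totals is only a naive stand-in for a real predictive model, used here to show the change in the question being asked.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales history: (sale date, product, quantity sold).
sales = [
    (date(2024, 3, 2), "A", 10),   # Saturday
    (date(2024, 3, 2), "B", 4),    # Saturday
    (date(2024, 3, 3), "A", 12),   # Sunday
    (date(2024, 3, 6), "B", 20),   # Wednesday
    (date(2024, 3, 6), "A", 3),    # Wednesday
    (date(2024, 3, 9), "A", 14),   # Saturday
    (date(2024, 3, 9), "B", 6),    # Saturday
    (date(2024, 3, 10), "B", 5),   # Sunday
]


def totals_between(rows, start, end):
    """Traditional query: total quantity sold per product in a date range."""
    totals = defaultdict(int)
    for day, product, quantity in rows:
        if start <= day <= end:
            totals[product] += quantity
    return dict(totals)


def weekend_ranking(rows):
    """Mining-style question: rank products by historical weekend sales."""
    totals = defaultdict(int)
    for day, product, quantity in rows:
        if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            totals[product] += quantity
    return sorted(totals, key=totals.get, reverse=True)


# What was sold over the past week?
last_week = totals_between(sales, date(2024, 3, 4), date(2024, 3, 10))
# Which products are likely to be in demand on the coming weekend?
likely_weekend_demand = weekend_ranking(sales)
print(last_week, likely_weekend_demand)
```

In this sample data product B sold more over the past week, yet product A ranks first for weekends, which is the kind of pattern a plain summary query would not surface.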
The combination of data-warehousing techniques and data mining software makes it easier to predict
future outcomes based on patterns discovered within historical data.
1. SEQUENCE / PATH ANALYSIS - Finding patterns where one event leads to another.
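A minimal sketch of sequence analysis is to count how often one event is immediately followed by another. The purchase sequences and the `count_transitions` helper below are hypothetical, and counting adjacent pairs is only one simple way to look for such patterns.

```python
from collections import Counter

# Hypothetical ordered purchase events, one sequence per customer.
sessions = {
    "C01": ["bread", "butter", "jam"],
    "C02": ["bread", "butter"],
    "C03": ["milk", "bread", "butter"],
    "C04": ["bread", "jam"],
}


def count_transitions(event_sequences):
    """Count each (event, next event) pair across all sequences."""
    transitions = Counter()
    for events in event_sequences:
        # zip pairs every event with the one that directly follows it.
        for current, following in zip(events, events[1:]):
            transitions[(current, following)] += 1
    return transitions


transitions = count_transitions(sessions.values())
most_common_pair, occurrences = transitions.most_common(1)[0]
print(most_common_pair, occurrences)
```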
3. DATA SELECTION – data relevant to the analysis task are retrieved from the database.
4. DATA TRANSFORMATION – data are transformed into forms appropriate for mining by performing
summary or aggregation operations.
5. DATA MINING – process where intelligent methods are applied in order to extract data patterns.
6. PATTERN EVALUATION – the truly interesting patterns representing knowledge are identified and
selected on the basis of some interestingness measure.
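Steps 3 to 6 above can be sketched end to end on a toy data set. Everything here is an assumption made for illustration: the `rows` layout, the choice of item pairs as the pattern type, and the use of support (the fraction of baskets containing a pair) with a threshold of 0.5 as the interestingness measure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction rows: one row per item in a basket.
rows = [
    {"basket": 1, "store": "north", "item": "bread"},
    {"basket": 1, "store": "north", "item": "butter"},
    {"basket": 2, "store": "north", "item": "bread"},
    {"basket": 2, "store": "north", "item": "butter"},
    {"basket": 2, "store": "north", "item": "milk"},
    {"basket": 3, "store": "north", "item": "milk"},
    {"basket": 4, "store": "south", "item": "bread"},
    {"basket": 4, "store": "south", "item": "jam"},
]

# Data selection: keep only the rows relevant to the analysis task.
selected = [row for row in rows if row["store"] == "north"]

# Data transformation: aggregate item rows into one item set per basket.
baskets = {}
for row in selected:
    baskets.setdefault(row["basket"], set()).add(row["item"])

# Data mining: count how often each pair of items shares a basket.
pair_counts = Counter()
for items in baskets.values():
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

# Pattern evaluation: keep pairs whose support meets the assumed threshold.
MIN_SUPPORT = 0.5
interesting = {
    pair: count / len(baskets)
    for pair, count in pair_counts.items()
    if count / len(baskets) >= MIN_SUPPORT
}
print(interesting)
```

Only the bread and butter pair passes the threshold in this sample; the other pairs appear in a single basket and are discarded as uninteresting.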
1. Retail or marketing
2. Banking