DW Assignment
1. Central Database:
The central database is the core of the data warehouse, where all the cleaned and integrated data is stored for analysis. It typically uses a relational database management system (RDBMS) or another storage system optimized for querying large volumes of data. The database is structured to support fast querying and reporting and is designed to hold historical data.
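As an illustration of this idea, here is a minimal sketch using Python's built-in sqlite3 module (the table and column names are hypothetical, chosen only for the example): a fact table holds historical measures keyed to a dimension table, a star-schema layout that supports fast aggregate queries.

    import sqlite3

    conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the sketch
    cur = conn.cursor()
    # A simple star schema: one fact table referencing one dimension table.
    cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("""CREATE TABLE fact_sales (
                       sale_date TEXT,
                       product_id INTEGER REFERENCES dim_product(product_id),
                       amount REAL)""")
    cur.execute("INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Mouse')")
    cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                    [("2024-01-05", 1, 999.0), ("2024-01-06", 2, 25.0),
                     ("2024-02-01", 1, 949.0)])
    # Historical reporting query: total sales per product.
    for row in cur.execute("""SELECT p.name, SUM(f.amount)
                              FROM fact_sales f JOIN dim_product p USING (product_id)
                              GROUP BY p.name"""):
        print(row)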
2. ETL (Extract, Transform, Load) Tools:
ETL tools extract data from various source systems, transform it into a usable format, and then load it into the data warehouse.
• Extract: The extraction phase pulls data from diverse sources such as transactional databases, flat files, spreadsheets, web logs, and external systems.
• Transform: The transformation step cleanses and reshapes the data, such
as removing duplicates, correcting inconsistencies, mapping data to the
correct schema, and applying business rules. This ensures the data is
accurate and aligned with the data warehouse's schema.
• Load: The transformed data is then loaded into the data warehouse's central database. Loading can run in batches (periodically) or in near real-time, depending on the architecture of the data warehouse.
ETL tools can also handle complex transformations like data aggregation, data
enrichment, and data validation. Popular ETL tools include Informatica,
Talend, Apache NiFi, and Microsoft SQL Server Integration Services (SSIS).
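The following is a minimal sketch of the three phases in plain Python (the source data, field names, and cleaning rules are all hypothetical): extract rows from a CSV-like source, transform them by deduplicating and normalising a date format, and load the clean batch into a warehouse table.

    import csv, io, sqlite3

    # Extract: read rows from a source (an in-memory CSV standing in for a flat file).
    source = io.StringIO(
        "order_id,order_date,amount\n"
        "1,05/01/2024,100\n"
        "1,05/01/2024,100\n"
        "2,06/01/2024,250\n")
    rows = list(csv.DictReader(source))

    # Transform: remove duplicates and convert dates DD/MM/YYYY -> YYYY-MM-DD.
    seen, clean = set(), []
    for r in rows:
        if r["order_id"] in seen:
            continue                      # duplicate record, drop it
        seen.add(r["order_id"])
        d, m, y = r["order_date"].split("/")
        clean.append((int(r["order_id"]), f"{y}-{m}-{d}", float(r["amount"])))

    # Load: insert the cleaned batch into the warehouse table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean)
    print(conn.execute("SELECT * FROM orders").fetchall())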
3. Metadata:
Metadata is data that describes other data. In the context of a data warehouse,
metadata provides critical information about the data stored within the system
and is essential for effective data management and usability.
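Metadata is often kept as data-dictionary records. The sketch below (the record layout and field names are illustrative, not any standard) shows the kind of information such a record carries about one warehouse column: its meaning, type, source, and lineage.

    # Hypothetical metadata record for one warehouse column.
    column_metadata = {
        "table": "fact_sales",
        "column": "amount",
        "type": "REAL",
        "description": "Sale value in EUR, net of tax",
        "source_system": "orders_db.transactions.total",
        "etl_job": "nightly_sales_load",          # lineage: which job produced it
        "last_loaded": "2024-02-01T02:00:00Z",
    }
    # Access tools and users consult records like this to interpret the data.
    print(column_metadata["description"])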
4. Access Tools:
Access tools are interfaces that allow end-users to interact with the data
warehouse and retrieve the information they need for analysis. These tools
include:
• Query Tools: Tools that allow users to directly query the data warehouse
to retrieve insights. These can include SQL-based query tools or more
visual interfaces. For instance, tools like SQL Server Management Studio
or Oracle SQL Developer allow users to write complex SQL queries to
analyse the data.
• OLAP (Online Analytical Processing) Tools: OLAP tools provide a multidimensional view of the data, allowing users to perform complex analyses such as drill-down, roll-up, slicing, and dicing (a small roll-up-and-slice sketch follows this list). Examples include Microsoft SQL Server Analysis Services (SSAS), IBM Cognos, and SAP BW.
• Business Intelligence (BI) Tools: BI tools offer a graphical interface that
allows business users to create reports, dashboards, and visualizations
without needing deep technical expertise. They connect to the data
warehouse to provide decision-makers with actionable insights. Examples
include Tableau, Power BI, QlikView, and Looker.
• Data Mining Tools: These tools help to discover patterns and trends in
large datasets using statistical algorithms, machine learning models, or AI
techniques. Data mining can identify correlations, trends, anomalies, and
forecasts within the data warehouse.
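Here is the roll-up-and-slice sketch referred to in the OLAP bullet above, using pandas (assumed available; the data is invented). A pivot table gives a small multidimensional view: sales are rolled up along the region and quarter dimensions, then one region is sliced out.

    import pandas as pd

    sales = pd.DataFrame({
        "region":  ["North", "North", "South", "South"],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "amount":  [100.0, 120.0, 80.0, 95.0],
    })
    # Roll-up: aggregate amounts along the region and quarter dimensions.
    cube = sales.pivot_table(values="amount", index="region",
                             columns="quarter", aggfunc="sum")
    print(cube)
    # Slice: fix one dimension value (region == "North").
    print(cube.loc["North"])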
Key Interactions:
• ETL tools populate the central database with clean and structured data
from various sources.
• The metadata layer ensures that users understand the meaning and
lineage of the data within the warehouse, facilitating trust and effective
analysis.
• Access tools enable users (both technical and non-technical) to query the
data warehouse and extract actionable insights, helping businesses make
informed decisions.
Conclusion:
Together, the central database, ETL tools, metadata, and access tools form an integrated architecture: ETL pipelines feed clean data into the central store, metadata keeps that data understandable and trustworthy, and access tools turn it into reports and insights for decision-makers.
The KDD Process and the Role of Data Mining:
The Knowledge Discovery in Databases (KDD) process turns raw data into usable knowledge through a sequence of steps, with data mining at its core.
1. Data Selection:
• Objective: Data selection identifies and retrieves the subset of data relevant to the analysis task from the available databases or sources, so that the later steps work only on data that matters to the business question.
• Example: Selecting only the last two years of sales transactions, and only the columns describing customers and purchases, for a sales-trend analysis.
2. Data Pre-processing:
• Objective: Data pre-processing cleans and prepares the data for analysis.
Raw data is often noisy, incomplete, or inconsistent. This step involves
handling missing values, eliminating outliers, and correcting errors.
• Techniques: Data cleaning (e.g., handling missing values), data
transformation (e.g., normalization, standardization), and noise reduction.
• Example: Removing duplicate records, replacing missing data with
averages, or transforming data into a consistent format (e.g., converting
date formats).
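A minimal sketch of these cleaning steps with pandas (the column names and data are hypothetical): dropping duplicate records and replacing missing values with the column mean, as in the example above.

    import pandas as pd

    df = pd.DataFrame({
        "customer": ["A", "A", "B", "C"],
        "age":      [34, 34, None, 51],
    })
    df = df.drop_duplicates()                       # remove duplicate records
    df["age"] = df["age"].fillna(df["age"].mean())  # impute missing values with the mean
    print(df)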
3. Data Transformation:
• Objective: In this step, the data is transformed into a format suitable for
mining. It might involve aggregation, normalization, or dimensionality
reduction to simplify the analysis.
• Techniques: Feature extraction, dimensionality reduction (e.g., Principal
Component Analysis), and data encoding.
• Example: Converting a date column into multiple features like day, month,
and year or creating a new feature that aggregates customer purchases
into a total spend.
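For instance, a short sketch of two such transformations with pandas (the data is invented): splitting a date column into day/month/year features and min-max normalising a numeric column.

    import pandas as pd

    df = pd.DataFrame({"order_date": ["2024-01-05", "2024-02-11"],
                       "spend": [100.0, 400.0]})
    df["order_date"] = pd.to_datetime(df["order_date"])
    # Feature extraction: one date column becomes three features.
    df["day"] = df["order_date"].dt.day
    df["month"] = df["order_date"].dt.month
    df["year"] = df["order_date"].dt.year
    # Normalisation: rescale spend to the [0, 1] range.
    df["spend_norm"] = (df["spend"] - df["spend"].min()) / (df["spend"].max() - df["spend"].min())
    print(df)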
4. Data Mining (Core Step):
• Objective: Data mining is the heart of the KDD process. It is the step
where patterns, trends, correlations, and structures in the data are
identified using sophisticated algorithms and techniques. The goal is to
extract meaningful knowledge from large datasets.
• Techniques:
o Classification: Grouping data into predefined categories (e.g.,
classifying emails as spam or not spam).
o Clustering: Grouping similar data points together (e.g., customer
segmentation based on purchasing behaviour).
o Association Rule Mining: Identifying relationships between
variables in the data (e.g., "Customers who buy milk are also likely
to buy bread").
o Regression Analysis: Predicting a continuous value (e.g., predicting
future sales based on past data).
o Anomaly Detection: Identifying unusual patterns (e.g., fraud
detection).
• Example: A retail store uses association rule mining to find that
customers who buy laptops are likely to buy laptop accessories, helping
them optimize product placement.
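As a concrete illustration of the clustering technique listed above, here is a minimal sketch using scikit-learn (assumed installed; the customer features are invented): k-means groups customers into segments by similarity of spend and visit frequency.

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical customer features: [annual_spend, visits_per_month].
    X = np.array([[200, 1], [220, 2], [2500, 8], [2700, 9], [90, 1]])
    # Clustering: group similar customers into 2 segments.
    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(model.labels_)   # segment assigned to each customer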
5. Evaluation:
• Objective: After mining the data, the next step is to evaluate the patterns
and models discovered during the data mining phase. This ensures the
results are valid, useful, and meet the business objectives. Evaluation
checks the quality of the results to ensure they make sense and align with
the goals.
• Example: If a classification model is created, its performance can be
evaluated by checking its accuracy, precision, recall, or F1-score against a
test dataset to verify its prediction capabilities.
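These metrics can be computed directly with scikit-learn (assumed installed; the labels below are made up, with 1 standing for "spam"):

    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, f1_score)

    # Hypothetical test-set labels vs. a classifier's predictions.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print("accuracy: ", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("recall:   ", recall_score(y_true, y_pred))
    print("F1-score: ", f1_score(y_true, y_pred))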
6. Knowledge Presentation:
• Objective: The final step in the KDD process is presenting the knowledge
in a user-friendly format, such as visualizations, reports, or dashboards.
This makes it easier for stakeholders to understand the insights and make
informed decisions.
• Example: Visualizing customer segments in a graph or displaying
association rules in a table for marketing teams to act on.
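A minimal presentation sketch with matplotlib (assumed available; the segment names and counts are invented), turning mined customer segments into a simple chart for stakeholders:

    import matplotlib.pyplot as plt

    segments = ["Bargain hunters", "Regulars", "Big spenders"]
    counts = [420, 310, 95]           # hypothetical segment sizes from clustering
    plt.bar(segments, counts)
    plt.title("Customer segments discovered by data mining")
    plt.ylabel("Number of customers")
    plt.show()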
Conclusion:
Data mining is the core step in the Knowledge Discovery in Databases (KDD)
process, where raw, pre-processed, and transformed data is analysed to extract
meaningful patterns and relationships. This knowledge can then be evaluated,
presented, and used to support decision-making. Without data mining, the KDD
process wouldn't be able to uncover valuable insights that drive business
intelligence and strategic actions.