Data Warehousing and Mining Unit 1
DATA WAREHOUSING AND ITS KEY FEATURES:
Data warehousing is a centralized repository for storing and managing large amounts of data from
various sources for analysis and reporting. It involves transforming and integrating data into a unified,
organized, and consistent format. The key features of data warehousing include:
• Subject-Oriented: Data warehouses focus on specific themes like sales, marketing, or distribution,
providing information about a particular subject rather than overall operations.
• Integrated: Data warehouses integrate data from different sources into a reliable format, ensuring consistency in naming conventions, format, and codes.
• Time-Variant: Data in a warehouse is maintained over different time intervals, allowing for historical analysis and comparisons over time.
• Non-Volatile: Data in a warehouse is permanent and does not change, preserving historical information for analysis and decision-making.
COMPONENTS OF DATA WAREHOUSE
1. Operational Source: Data source for the warehouse, such as operational databases or external data.
2. Load Manager: Responsible for extracting and loading data into the warehouse, including data transformation.
3. Warehouse Manager: Manages warehouse processes such as data analysis, aggregation, backup, and denormalization.
4. Query Manager: Manages user queries within the data warehouse system.
5. ETL Tools: Extract data from various sources, transform it to fit the warehouse schema, and load it into the warehouse (a minimal sketch follows this list).
6. Central Database: Stores all business data in the warehouse, making it easier for reporting and analysis.
7. Access Tools: Enable users to access and interact with the data stored in the warehouse for analysis and reporting.
8. Metadata: Data about the data stored in the warehouse, used for extraction, loading processes, warehouse management, and query management.
9. Data Staging Area: Prepares extracted data for storage in the warehouse by cleaning, transforming, and standardizing it.
10. Information Delivery Component: Enables users to access and subscribe to data from the warehouse for analysis and reporting.
11. Data Marts: Subsets of corporate-wide data tailored for specific user groups or subjects, providing focused data for analysis.
12. Management and Control Component: Coordinates services within the data warehouse, controlling data transformation, transfer, and delivery to clients.
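The extract-transform-load (ETL) flow handled by the load manager and ETL tools can be pictured with a minimal Python sketch. The source rows, column names, and the target "sales" table below are hypothetical assumptions for illustration; a real warehouse would extract from operational databases and load into the central database.

    import sqlite3

    # Hypothetical operational source: rows as they might arrive from an OLTP system.
    source_rows = [
        {"cust": "Asha", "item": "Laptop", "qty": 2, "unit_price": 55000.0},
        {"cust": "Ravi", "item": "Mouse",  "qty": 5, "unit_price": 450.0},
    ]

    def extract():
        """Extract: pull raw records from the operational source."""
        return list(source_rows)

    def transform(rows):
        """Transform: standardize names and derive a total_amount measure."""
        return [
            {
                "customer_name": r["cust"].upper(),
                "product_name": r["item"].upper(),
                "quantity": r["qty"],
                "total_amount": r["qty"] * r["unit_price"],
            }
            for r in rows
        ]

    def load(rows, conn):
        """Load: write the cleaned rows into the warehouse's central database."""
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sales (customer_name TEXT, product_name TEXT, quantity INTEGER, total_amount REAL)"
        )
        conn.executemany(
            "INSERT INTO sales VALUES (:customer_name, :product_name, :quantity, :total_amount)",
            rows,
        )

    conn = sqlite3.connect(":memory:")
    load(transform(extract()), conn)
    print(conn.execute("SELECT * FROM sales").fetchall())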
CONCEPT OF BUILDING A DATA WAREHOUSE
1. Business Needs: Identify the organization's needs and goals.
2. Data Sources: Identify data sources and their structure.
3. Storage: Choose physical or cloud-based servers.
4. Software: Data warehousing software processes and manages the data.
5. Labor: Backend developers, architects, analysts, and managers.
6. Implementation: Use the data warehouse for data analytics.
7. Benefits:
• Save time.
• Boost confidence in data.
• Increase insights.
• Improve security.
STEPS INVOLVED IN MAPPING THE DATA WAREHOUSE TO A MULTIPROCESSOR ARCHITECTURE
1. Relational database technology for the data warehouse: This involves understanding the basics of relational databases and their role in data warehousing.
2. Types of parallelism: There are two types of parallelism - inter-query parallelism and intra-query parallelism. Inter-query parallelism handles multiple requests at the same time, while intra-query parallelism decomposes a serial SQL query into lower-level operations and executes them concurrently.
3. Data partitioning: This is a key component for effective parallel execution of database operations. Partitioning can be done randomly or intelligently, with options such as random data striping, round-robin partitioning, hash partitioning, key-range partitioning, schema partitioning, and user-defined partitioning (a short sketch follows this list).
4. Database architectures for parallel processing: There are three DBMS software architecture styles for parallel processing - shared-memory (shared-everything) architecture, shared-disk architecture, and shared-nothing architecture.
5. Parallel DBMS features: These include optimizer implementation, application transparency, parallel environment, and DBMS management tools.
6. Alternative technologies: These include advanced database indexing products, multidimensional databases, and specialized RDBMS.
7. Parallel DBMS vendors: These include Oracle, Informix, IBM, and SYBASE, each with their own architecture, data partitioning, and parallel operations.
8. Specialized database products: These include Red Brick Systems, White Cross System, and other RDBMS products.
• The mapping process involves choosing the appropriate architecture style, partitioning the data effectively, and implementing parallel processing features to improve performance and scalability.
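As referenced in step 3 above, here is a minimal sketch of round-robin and hash partitioning of rows across parallel partitions. The sample rows, the partitioning key, and the partition count are hypothetical, chosen only to show how each strategy assigns rows.

    # Minimal sketch of two data partitioning strategies used for parallel execution.
    rows = [
        {"order_id": 101, "region": "North", "amount": 250.0},
        {"order_id": 102, "region": "South", "amount": 90.0},
        {"order_id": 103, "region": "North", "amount": 430.0},
        {"order_id": 104, "region": "East",  "amount": 120.0},
    ]

    def round_robin_partition(rows, n_partitions):
        """Assign row i to partition i mod n, spreading rows evenly."""
        partitions = [[] for _ in range(n_partitions)]
        for i, row in enumerate(rows):
            partitions[i % n_partitions].append(row)
        return partitions

    def hash_partition(rows, n_partitions, key):
        """Assign each row to a partition based on a hash of its key column."""
        partitions = [[] for _ in range(n_partitions)]
        for row in rows:
            partitions[hash(row[key]) % n_partitions].append(row)
        return partitions

    print(round_robin_partition(rows, 2))
    print(hash_partition(rows, 2, key="region"))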
STRATEGIES TO CONSIDER WHILE DESIGNING A WAREHOUSE
1. Analyze current layout and processes
2. Identify opportunities for improvement
3. Select appropriate design type
4. Test and refine the layout
5. Consider workforce
6. Optimize storage space
7. Ensure smooth traffic flow
8. Implement safety protocols
9. Utilize warehouse management software
10. Incorporate automation
11. Maintain workflow and access
DIFFERENCE BETWEEN DATA WAREHOUSE AND DATABASE
Database:
• Stores real-time information for specific applications.
• Handles daily transactions efficiently.
• Uses Online Transactional Processing (OLTP) for quick CRUD operations.
• Structured with normalized data for maximum efficiency.
• Contains current, up-to-date information.
Data warehouse:
• Gathers historical data from various sources for analysis.
• Supports complex queries for strategic decision-making.
• Uses Online Analytical Processing (OLAP) for in-depth analysis.
• Denormalized structure for faster retrieval of data.
• Stores historical data for business insights and reporting.
DIFFERENCE BETWEEN OLAP AND OLTP
OLTP (Online Transaction Processing):
• Designed for managing day-to-day transactions.
• Optimized for inserting, updating, and deleting small amounts of data quickly and efficiently.
• Normalized structure for efficient data processing.
• Typically stores current operational data.
OLAP (Online Analytical Processing):
• Designed for complex data analysis.
• Optimized for handling complex queries that involve large data sets.
• Denormalized structure for faster retrieval of data.
• Stores historical data from various databases.
• Enables in-depth data analysis across multiple dimensions.
• Supports decision-making and problem-solving.
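The contrast between the two workloads can be shown on a small, hypothetical sales table (the table name, columns, and values below are assumptions for the example): OLTP issues many small inserts and updates on individual rows, while OLAP runs one aggregate query over the accumulated history.

    import sqlite3

    # Hypothetical sales table used to contrast the two workloads.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")

    # OLTP-style work: many small, fast writes and updates of individual records.
    conn.execute("INSERT INTO sales VALUES ('2024-01-15', 'North', 250.0)")
    conn.execute("INSERT INTO sales VALUES ('2024-01-16', 'South', 90.0)")
    conn.execute("UPDATE sales SET amount = 275.0 WHERE sale_date = '2024-01-15' AND region = 'North'")

    # OLAP-style work: one analytical query aggregating across the stored history.
    for row in conn.execute(
        "SELECT region, strftime('%Y-%m', sale_date) AS month, SUM(amount) "
        "FROM sales GROUP BY region, month"
    ):
        print(row)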
8
WHAT IS METADATA
Metadata is data about data, providing information about the content, context, and structure of data. It is important for data
management, organization, and analysis, and can include descriptive, structural, and administrative metadata. Metadata
can pose privacy and security risks if left unchecked, and should be managed carefully to ensure data safety and privacy.
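One minimal way to picture the descriptive, structural, and administrative kinds of metadata mentioned above is a record attached to a single warehouse table; the table name and field values below are hypothetical.

    # Hypothetical metadata record for one warehouse table, grouped by metadata type.
    table_metadata = {
        "descriptive": {            # what the data is about
            "title": "Monthly sales facts",
            "keywords": ["sales", "revenue", "monthly"],
        },
        "structural": {             # how the data is organized
            "columns": {"month": "TEXT", "region": "TEXT", "amount": "REAL"},
            "grain": "one row per region per month",
        },
        "administrative": {         # how the data is managed
            "source_system": "orders_db",
            "last_loaded": "2024-06-01",
            "owner": "BI team",
        },
    }

    print(table_metadata["structural"]["columns"])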
IMPORTANCE OF METADATA
• Provides context and makes data unique.
• Encourages reuse and ensures interoperability.
• Supports effective data governance and improves data quality.
• Enables data discoverability and better decision-making.
• Ensures compliance and interoperability between datasets.
• Facilitates collaboration and time/efficiency savings.
WHAT IS MULTIDIMENSIONAL DATA
• Definition: Data organized into dimensions for complex analysis.
• Usage: Common in data warehousing and business intelligence.
• Features: Enables analysis across multiple dimensions.
• Supports complex queries and insights.
• Includes descriptive and structural metadata.
• Facilitates drilling down into data.
• Enhances user-friendly data analysis interfaces.
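A multidimensional view can be sketched as a measure keyed by dimension values; the dimensions (product, region, month) and the numbers below are assumptions made for the illustration.

    from collections import defaultdict

    # Hypothetical fact rows: a sales measure described by three dimensions.
    facts = [
        ("Laptop", "North", "2024-01", 110000.0),
        ("Laptop", "South", "2024-01", 55000.0),
        ("Mouse",  "North", "2024-02", 2250.0),
    ]

    # Build a simple cube: total sales for every (product, region, month) cell.
    cube = defaultdict(float)
    for product, region, month, amount in facts:
        cube[(product, region, month)] += amount

    # Analyze across a chosen dimension, e.g. roll sales up by region only.
    by_region = defaultdict(float)
    for (product, region, month), amount in cube.items():
        by_region[region] += amount

    print(dict(by_region))   # {'North': 112250.0, 'South': 55000.0}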
CONCEPT HIERARCHY
• Definition: Organizing concepts into a hierarchical structure.
• Purpose: Helps categorize and classify concepts from broader to narrower levels.
• Example: In history, organizing historical periods from ancient to modern.
• Importance: Facilitates understanding, analysis, and research by placing concepts in a structured
context.
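In a data warehouse a concept hierarchy usually appears as a dimension hierarchy; the location levels and values below are a hypothetical sketch of rolling a detailed value up to broader levels.

    # Hypothetical concept hierarchy for a location dimension: city -> state -> country.
    city_to_state = {"Lucknow": "Uttar Pradesh", "Mumbai": "Maharashtra"}
    state_to_country = {"Uttar Pradesh": "India", "Maharashtra": "India"}

    def roll_up(city):
        """Map a low-level concept (city) to its broader concepts (state, country)."""
        state = city_to_state[city]
        return {"city": city, "state": state, "country": state_to_country[state]}

    print(roll_up("Lucknow"))  # {'city': 'Lucknow', 'state': 'Uttar Pradesh', 'country': 'India'}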
WHAT IS GRANULARITY
Definition: Level of detail or resolution at which data is stored and analyzed.
Importance: Affects volume of data, efficiency of data shipping, and types of analysis.
Example: Time-series data can have granularity based on years, months, weeks, days, or hours.
Optimal Level: Usually somewhere in the middle, detailed enough for meaningful segmentation without being so fine-grained that the data volume becomes unmanageable.
Purpose: Enables detailed segmentation and targeting in marketing and software.
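The effect of granularity can be shown by summarizing the same hypothetical time-series at two levels of detail; the dates and values below are assumptions for the example.

    from collections import defaultdict

    # Hypothetical daily measurements (fine granularity).
    daily = {
        "2024-01-01": 12.0, "2024-01-02": 15.0, "2024-01-31": 9.0,
        "2024-02-01": 20.0, "2024-02-15": 5.0,
    }

    # Roll the same data up to monthly granularity (coarser, fewer rows).
    monthly = defaultdict(float)
    for date, value in daily.items():
        monthly[date[:7]] += value   # '2024-01-01' -> '2024-01'

    print(len(daily), "daily rows ->", len(monthly), "monthly rows")
    print(dict(monthly))             # {'2024-01': 36.0, '2024-02': 25.0}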
WHAT IS PARTITIONING
Definition: Dividing a large database or table into smaller, more manageable pieces called partitions.
Purpose: Improving performance, manageability, availability, or load balancing.
Types: Horizontal and vertical partitioning.
Methods: Range, list, composite, and hash partitioning.
Key Feature: Scalability, manageability, and performance in data warehousing.
Transparency: Partitioning can be kept transparent to applications, so SQL statements do not need to reference individual partitions.
Benefits: Improves availability and security of distributed database management systems, allowing local transactions to be performed on individual partitions.
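To complement the hash and round-robin sketches given earlier, here is a minimal sketch of horizontal range partitioning; the sample rows, the date key, and the partition boundaries are hypothetical.

    import bisect

    # Hypothetical rows of a large sales table, to be split horizontally by date.
    rows = [
        {"sale_date": "2023-05-10", "amount": 120.0},
        {"sale_date": "2024-02-01", "amount": 300.0},
        {"sale_date": "2024-11-20", "amount": 75.0},
    ]

    # Range partitioning: each partition holds rows whose key falls in one range.
    boundaries = ["2024-01-01", "2025-01-01"]   # partition 0: before 2024, 1: year 2024, 2: later
    partitions = [[] for _ in range(len(boundaries) + 1)]

    for row in rows:
        idx = bisect.bisect_right(boundaries, row["sale_date"])
        partitions[idx].append(row)

    for i, part in enumerate(partitions):
        print("partition", i, part)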
WHAT IS STAR SCHEMA
• Definition: A multi-dimensional data model used to organize data in a database for easy understanding
and analysis.
• Components: Fact table (central) and dimension tables (connected to fact table).
• Purpose: Optimized for querying large data sets, improving analytical query performance, and
simplifying queries.
• Benefits: Improved query performance, fast aggregations, simplified business reporting logic, and
feeding cubes.
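A minimal star schema can be sketched with one central fact table and two dimension tables; the table and column names below are hypothetical examples, not a fixed standard.

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Dimension tables: descriptive attributes, denormalized, keyed by surrogate keys.
    conn.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER)")
    conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT)")

    # Central fact table: numeric measures plus foreign keys to each dimension.
    conn.execute("""
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity INTEGER,
            amount REAL
        )
    """)

    conn.execute("INSERT INTO dim_date VALUES (1, '2024-01-15', 'January', 2024)")
    conn.execute("INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics')")
    conn.execute("INSERT INTO fact_sales VALUES (1, 1, 2, 110000.0)")

    # A typical analytical query: join the fact table to its dimensions and aggregate.
    print(conn.execute("""
        SELECT d.year, p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d ON f.date_key = d.date_key
        JOIN dim_product p ON f.product_key = p.product_key
        GROUP BY d.year, p.category
    """).fetchall())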
SNOWFLAKE SCHEMA
• Definition: A logical arrangement of tables in a multidimensional database where dimensions are
normalized into multiple related tables.
• Structure: Centralized fact tables connected to multiple dimensions, with dimensions normalized into
multiple related tables.
• Purpose: Normalization of dimension tables by removing low cardinality attributes and forming a
hierarchical structure.
• Comparison to Star Schema: Dimensions are normalized into multiple related tables, creating a snowflake structure, unlike the denormalized dimensions in a star schema.
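Continuing the hypothetical star schema sketch above, a snowflake version normalizes one dimension into related tables; the table and column names are again assumptions for illustration.

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # In a snowflake schema the product dimension is normalized: the low-cardinality
    # category attribute moves into its own table referenced by dim_product.
    conn.execute("CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT)")
    conn.execute("""
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            product_name TEXT,
            category_key INTEGER REFERENCES dim_category(category_key)
        )
    """)
    conn.execute("CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product(product_key), amount REAL)")

    conn.execute("INSERT INTO dim_category VALUES (1, 'Electronics')")
    conn.execute("INSERT INTO dim_product VALUES (1, 'Laptop', 1)")
    conn.execute("INSERT INTO fact_sales VALUES (1, 110000.0)")

    # Queries now need one extra join to reach the normalized category attribute.
    print(conn.execute("""
        SELECT c.category_name, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON f.product_key = p.product_key
        JOIN dim_category c ON p.category_key = c.category_key
        GROUP BY c.category_name
    """).fetchall())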
WHAT IS FACT CONSTELLATION
• Definition: Data warehouse schema with multiple fact tables sharing dimensions.
• Also Known As: Galaxy schema.
• Structure: Multiple fact tables sharing one or more dimensions.
• Purpose: Allows for complex relationships between data.
• Challenges: More difficult to implement and maintain due to complexity.
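A fact constellation can be sketched as two fact tables sharing the same dimension table; the table names below are hypothetical and extend the earlier star schema sketch.

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # One shared dimension...
    conn.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER)")

    # ...referenced by two separate fact tables, forming a galaxy/constellation.
    conn.execute("CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date(date_key), amount REAL)")
    conn.execute("CREATE TABLE fact_shipping (date_key INTEGER REFERENCES dim_date(date_key), shipping_cost REAL)")

    conn.execute("INSERT INTO dim_date VALUES (1, '2024-01-15', 2024)")
    conn.execute("INSERT INTO fact_sales VALUES (1, 110000.0)")
    conn.execute("INSERT INTO fact_shipping VALUES (1, 750.0)")

    # Both business processes can be analyzed over the same shared dimension.
    print(conn.execute(
        "SELECT d.year, SUM(s.amount) FROM fact_sales s "
        "JOIN dim_date d ON s.date_key = d.date_key GROUP BY d.year"
    ).fetchall())
    print(conn.execute(
        "SELECT d.year, SUM(sh.shipping_cost) FROM fact_shipping sh "
        "JOIN dim_date d ON sh.date_key = d.date_key GROUP BY d.year"
    ).fetchall())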