VV_Data Warehousing and Data Mining

A data warehouse is a centralized repository for storing and managing large amounts of data from various sources, enhancing analysis and reporting for improved business intelligence. It provides benefits such as data security, scalability, and access to historical insights, facilitating data-driven decision-making. The document also discusses metadata, data modeling, and the importance of aggregate tables in optimizing query performance.

Uploaded by

p bb

Data Warehousing and Data Mining

Q1) What is a data warehouse, and why do we need it? What are its benefits?


A data warehouse is a secure, centralized repository for storing and managing large amounts of data from various sources.
A data warehouse is usually used for linking and analyzing heterogeneous sources of business data.

The data warehouse is the center of the data collection and reporting framework developed for the BI system.

NEED
Faster turnaround time for analysis and reporting
Improved Business Intelligence
Benefit of historical data
Standardization of data
High ROI (Return on Investment)

BENEFITS
Improved Data Security
Scalability
Access to Historical Insights
Works On-Premises and on Cloud
Q2) 1.3 Data Warehouse and its Need

1. What is it?: A centralized repository for storing data from multiple sources for reporting and analysis.
2. Need in Modern Business: Enables data-driven decision-making and provides a 360-degree view of the business.

Q3) DESIGN APPROACH


Bottom-Up Approach
Integrated

Time Variant
Q) OLAP vs OLTP
GRANULARITY
Granularity refers to the level of detail of the data.

Lower granularity: more detail, less summarization (fine granularity).

Higher granularity: less detail, more summarization (gross, or coarse, granularity).
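
The difference between fine and gross granularity can be illustrated with a minimal sketch. The `sales` rows and the `summarize_by_month` helper below are hypothetical, chosen only to show how rolling detailed rows up to a monthly grain trades detail for summary.

```python
from collections import defaultdict

# Hypothetical fine-grained data: one row per individual sale
# (date, product, amount). These values are illustrative only.
sales = [
    ("2024-01-05", "pen", 10),
    ("2024-01-05", "book", 40),
    ("2024-01-20", "pen", 15),
    ("2024-02-03", "book", 35),
]


def summarize_by_month(rows):
    """Roll individual sale rows up to one total per month (coarser grain)."""
    totals = defaultdict(int)
    for date, _product, amount in rows:
        month = date[:7]  # keep only "YYYY-MM"
        totals[month] += amount
    return dict(totals)


monthly = summarize_by_month(sales)
print(len(sales), "detail rows ->", len(monthly), "summary rows")
print(monthly)
```

The monthly summary is smaller and quicker to scan, but it can no longer answer questions about individual products or days; that information exists only at the finer grain.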

META DATA

In a general sense, metadata is "data about data." It describes the structure, format, and characteristics of the data, enabling effective
management and usage.

In the Context of a Data Warehouse

In a data warehousing environment, metadata takes on a more specialized role. It serves as the roadmap or directory that helps users and
applications interact with the data in the warehouse. It can contain information about:

1. Data Source: Describes where the data comes from, including database names, tables, and columns.
2. Data Transformations: Records any changes made to the data during the ETL (Extract, Transform, Load) process, such as data cleansing,
aggregation, or enrichment.
3. Data Structure: Describes the schema, tables, and fields in the warehouse. This can include field definitions, data types, and
relationships between tables.
4. Business Metadata: Includes definitions, business rules, and lineage to make the data understandable and usable by business users.
5. Operational Metadata: Information about batch loads, query performance statistics, and data usage metrics.
6. Data Lineage: Information about how data flows through the system, useful for troubleshooting and impact analysis.

Importance
1. Data Understanding: Helps users understand what data is available and how to use it.
2. Data Governance: Assists in maintaining data quality, lineage, and security.
3. Query Optimization: Utilized by the system to optimize query performance.
4. Compliance: Important for meeting regulatory requirements related to data management and usage.

In summary, metadata in a data warehouse provides a crucial layer of information that facilitates both the effective use of data by end-users
and the efficient operation of the data warehouse itself.
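
As a rough sketch of how these kinds of metadata might be recorded for a single column, the example below uses a hypothetical `ColumnMetadata` record and an in-memory `catalog`. The class, field names, and sample values are assumptions made for illustration and do not correspond to any particular warehouse product.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """Hypothetical metadata record for one warehouse column."""
    name: str
    data_type: str                 # data structure
    source: str                    # data source (database.table.column)
    transformations: list = field(default_factory=list)  # ETL steps applied
    business_definition: str = ""  # business metadata
    last_load: str = ""            # operational metadata


# A very small stand-in for a metadata repository.
catalog = {}


def register(meta):
    """Store a column's metadata under its name."""
    catalog[meta.name] = meta


def describe(column_name):
    """Return a one-line, human-readable description of a column."""
    m = catalog[column_name]
    steps = ", ".join(m.transformations) or "none"
    return f"{m.name} ({m.data_type}) from {m.source}; transformations: {steps}"


register(ColumnMetadata(
    name="order_total",
    data_type="DECIMAL(10,2)",
    source="sales_db.orders.total",
    transformations=["currency converted to USD"],
    business_definition="Total value of an order, in USD",
    last_load="2024-01-31",
))
print(describe("order_total"))
```

A user or tool can look a column up in such a catalog to learn where it came from and how it was transformed, without inspecting the ETL code itself.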

DATA DICTIONARY VS META DATA

Both data dictionaries and metadata serve the purpose of providing additional information about data, but they are used in different contexts
and for different scopes.

Aspect | Data Dictionary | Metadata
------ | --------------- | --------
Definition | A centralized repository of information about data such as meaning, relationships, origin, and usage. | Data about data, describing the structure, type, and characteristics of the data.
Scope | Primarily focuses on database objects like tables, columns, keys, and indexes within a specific database. | Broader in scope, can pertain to any type of data including files, images, and configurations.
Context | Mostly used in the context of relational databases. | Used in various contexts including databases, data warehouses, file systems, and more.
Users | Database administrators, developers, and sometimes end-users. | Database administrators, developers, data analysts, and sometimes automated systems.
Content | Contains names, definitions, and attributes of database objects. | Can include data lineage, transformations, source systems, and operational metadata.
Purpose | Aids in database design, maintenance, and documentation. | Facilitates data management, governance, and usage across different systems and applications.
Accessibility | Usually accessible through specific database management system tools. | Could be embedded within the data or accessible through separate metadata repositories.
Update Frequency | Generally static, updated when database schema changes. | Can be dynamic, updated as data is transformed, moved, or used.

Example:

• Data Dictionary: In a customer database, a data dictionary will specify that the CustomerID column is an integer, serves as a primary
key, and is auto-incremented. It may also specify constraints and relationships with other tables.
• Metadata: In the context of a data warehouse, metadata might indicate that the CustomerID field is sourced from the "Sales" database
and transformed by removing leading zeros during the ETL process.
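
The CustomerID example can be sketched in code. The dictionary layouts, the `strip_leading_zeros` helper, and the sample IDs below are hypothetical; they only illustrate that the data dictionary records structure while the warehouse metadata records origin and transformation.

```python
def strip_leading_zeros(raw_id: str) -> int:
    """Turn a zero-padded source ID such as "000123" into the integer 123."""
    return int(raw_id.lstrip("0") or "0")


# Data dictionary entry: structural facts about the column (assumed layout).
data_dictionary = {
    "CustomerID": {"type": "INTEGER", "primary_key": True, "auto_increment": True},
}

# Warehouse metadata: where the value came from and what ETL did to it.
warehouse_metadata = {
    "CustomerID": {"source": "Sales", "transformation": "leading zeros removed"},
}

# Illustrative zero-padded IDs as they might arrive from the source system.
source_rows = ["000123", "004500", "000007"]
loaded_ids = [strip_leading_zeros(r) for r in source_rows]
print(loaded_ids)
```

Only leading zeros are removed, so an ID such as "004500" keeps its trailing zeros and loads as 4500.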

In summary, while both serve to describe data, a data dictionary is more specific to the structure of a database, whereas metadata is a broader
term that covers additional aspects of data including its lineage, transformations, and usage across different systems and applications.
CHAPTER 2

1)

2) No Data Preprocessing before loading

1) There is data preprocessing


BUT
Has both.
A data mart is a subject-oriented relational database that stores transactional data in rows and columns, which makes it easy to access,
organize, and understand.
UNIT 3 DIMENSIONAL MODELLING

Q) WHAT IS A DATA MODEL?

A data model is a representation of how data is stored in a database. It is usually drawn as a diagram of the tables and the relationships that
exist between them.

Q) WHAT IS DIMENSIONAL MODELLING?

Dimensional modeling is a data model design adopted when building a data warehouse. Put simply, dimensional modeling organizes data so that
queries return results faster than they would against a normalized relational (transactional) design.
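
A dimensional model typically separates measurements (facts) from the descriptive context used to group them (dimensions). The following minimal sketch uses Python's built-in `sqlite3` module; the table names `fact_sales` and `dim_product` and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: descriptive attributes of each product.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT,
    category   TEXT
);
-- Fact table: one row per sale, pointing at the dimension by key.
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     INTEGER
);
INSERT INTO dim_product VALUES (1, 'pen', 'stationery'), (2, 'book', 'media');
INSERT INTO fact_sales VALUES (1, 1, 10), (2, 2, 40), (3, 1, 15);
""")

# A typical analytical query: one join from the fact table to a dimension,
# then an aggregation over a dimension attribute.
sales_by_category = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(sales_by_category)
```

Because every descriptive attribute sits one join away from the fact table, analytical queries of this shape stay short and predictable.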
STAR SCHEMA
SNOWFLAKE SCHEMA

The snowflake schema is an extension of the star schema that adds further dimension tables to give more meaning to the logical view of the
database. These additional tables are more normalized than the dimension tables of a star schema.

The snowflake model is the result of decomposing one or more of the dimensions. A snowflake schema in a data warehouse is a logical
arrangement of tables in a multidimensional database such that the ER diagram resembles a snowflake shape.

3.6.1 Features of Snowflake Schema


Following are the important features of snowflake schema:
1. It has normalized tables.
2. It occupies less disk space.
3. It requires more lookup time, because queries must join the many interconnected tables that extend the dimensions.
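
These features can be seen in a small sketch using Python's built-in `sqlite3` module. The tables `dim_category`, `dim_product`, and `fact_sales` and their rows are hypothetical: the category name is stored once in its own table (normalized, less space), and reaching it from the fact table takes two joins rather than one (more lookup work).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The product dimension is decomposed: category details live in their own table.
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     INTEGER
);
INSERT INTO dim_category VALUES (1, 'stationery');
INSERT INTO dim_product VALUES (1, 'pen', 1), (2, 'pencil', 1);
INSERT INTO fact_sales VALUES (1, 1, 10), (2, 2, 5), (3, 1, 15);
""")

# Two joins are needed to reach the category name from the fact table.
category_totals = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
""").fetchall()

# The category name is stored once, however many products share it.
category_rows = conn.execute("SELECT COUNT(*) FROM dim_category").fetchone()[0]
print(category_totals, category_rows)
```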
AGGREGATE TABLES
Aggregate fact tables roll up the base fact tables of the schema to speed up query processing. BI tools can then select the appropriate level of
aggregation to improve query performance. Like base fact tables, aggregate fact tables contain foreign keys referring to dimension tables.

Points to note about Aggregate tables:


1) They are also called summary tables.
2) They contain pre-computed results of queries against the data warehouse schema.
3) They reduce the dimensionality of the base fact tables.
4) They can be used to answer queries on the dimensions that are retained in the aggregate.

NEED FOR BUILDING AGGREGATE FACT TABLES

1) Reduction in query processing time


2) Ready-made results for composite queries on the DW schema, so responses are returned faster.
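
A minimal sketch of an aggregate fact table, using Python's built-in `sqlite3` module, is given below. The table names `fact_sales` and `agg_sales_monthly` and the sample rows are assumptions made for illustration. The daily grain of the base fact table is rolled up to month, so a monthly question can be answered from fewer, pre-computed rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base fact table at daily grain.
CREATE TABLE fact_sales (sale_date TEXT, product_id INTEGER, amount INTEGER);
INSERT INTO fact_sales VALUES
    ('2024-01-05', 1, 10), ('2024-01-20', 1, 15),
    ('2024-01-05', 2, 40), ('2024-02-03', 2, 35);

-- Aggregate (summary) table: dates are rolled up to month,
-- while product_id is kept as a key to the product dimension.
CREATE TABLE agg_sales_monthly AS
SELECT substr(sale_date, 1, 7) AS month,
       product_id,
       SUM(amount) AS total_amount
FROM fact_sales
GROUP BY substr(sale_date, 1, 7), product_id;
""")

base_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM agg_sales_monthly").fetchone()[0]

# A monthly question is answered from the pre-computed rows.
january_total = conn.execute(
    "SELECT SUM(total_amount) FROM agg_sales_monthly WHERE month = ?",
    ("2024-01",),
).fetchone()[0]
print(base_rows, agg_rows, january_total)
```

The aggregate can answer questions by month and product, but not by individual day, because the day level was dropped when the table was built.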
