VV_Data Warehousing and Data Mining
The data warehouse is the center of the data collection and reporting framework developed for the BI system.
NEED
Faster turnaround time for analysis and reporting
Improved Business Intelligence
Benefit of historical data
Standardization of data
Immense ROI (Return On Investment)
BENEFITS
Improved Data Security
Scalability
Access to Historical Insights
Works On-Premises and on Cloud
Q2) 1.3 Data Warehouse and its Need
1. What is it? A centralized repository for storing data from multiple sources for reporting and analysis.
2. Need in Modern Business: Enables data-driven decision-making and provides a 360-degree view of the business.
Time Variant
Q) OLAP vs OLTP
GRANULARITY
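Granularity is the level of detail at which data is stored. Finer granularity means more detailed records (for example, individual transactions rather than monthly totals).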
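As a small illustration of granularity, the sketch below rolls day-level records up to a coarser month-level grain. The sample rows and the function name roll_up_to_month are made up for this example and are not part of the notes.

```python
from collections import defaultdict

# Hypothetical fine-grained facts: one row per individual sale (ISO date, amount).
daily_sales = [
    ("2024-01-05", 120.0),
    ("2024-01-17", 80.0),
    ("2024-02-02", 50.0),
]

def roll_up_to_month(rows):
    """Aggregate day-level rows to a coarser, month-level grain."""
    totals = defaultdict(float)
    for date, amount in rows:
        month = date[:7]  # "YYYY-MM" prefix of an ISO date string
        totals[month] += amount
    return dict(totals)

monthly_sales = roll_up_to_month(daily_sales)
print(monthly_sales)
```

The roll-up is one-way: monthly totals can be derived from daily rows, but daily rows cannot be recovered from monthly totals, which is why the grain chosen at load time matters.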
META DATA
In a general sense, metadata is "data about data." It describes the structure, format, and characteristics of the data, enabling effective
management and usage.
In a data warehousing environment, metadata takes on a more specialized role. It serves as the roadmap or directory that helps users and
applications interact with the data in the warehouse. It can contain information about:
1. Data Source: Describes where the data comes from, including database names, tables, and columns.
2. Data Transformations: Records any changes made to the data during the ETL (Extract, Transform, Load) process, such as data cleansing,
aggregation, or enrichment.
3. Data Structure: Describes the schema, tables, and fields in the warehouse. This can include field definitions, data types, and
relationships between tables.
4. Business Metadata: Includes definitions, business rules, and lineage to make the data understandable and usable by business users.
5. Operational Metadata: Information about batch loads, query performance statistics, and data usage metrics.
6. Data Lineage: Information about how data flows through the system, useful for troubleshooting and impact analysis.
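One possible way to picture these categories is as fields on a single metadata record per warehouse column. The sketch below is only illustrative: the class ColumnMetadata, its field names, and the sample source path are assumptions, not a standard metadata format.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMetadata:
    """Hypothetical metadata record for one warehouse column."""
    name: str
    data_type: str                 # data structure
    source: str                    # data source (database.table.column)
    transformations: list = field(default_factory=list)  # ETL steps applied
    business_definition: str = ""  # business metadata
    last_load_rows: int = 0        # operational metadata

    def lineage(self):
        """Return the path from source to warehouse column as a readable string."""
        steps = [self.source] + self.transformations + [f"warehouse.{self.name}"]
        return " -> ".join(steps)

# Sample values are invented for illustration.
total_amount = ColumnMetadata(
    name="total_amount",
    data_type="REAL",
    source="orders_db.orders.amount",
    transformations=["cleanse nulls", "aggregate per day"],
    business_definition="Total order value per day",
    last_load_rows=365,
)
print(total_amount.lineage())
```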
Importance
1. Data Understanding: Helps users understand what data is available and how to use it.
2. Data Governance: Assists in maintaining data quality, lineage, and security.
3. Query Optimization: Utilized by the system to optimize query performance.
4. Compliance: Important for meeting regulatory requirements related to data management and usage.
In summary, metadata in a data warehouse provides a crucial layer of information that facilitates both the effective use of data by end-users
and the efficient operation of the data warehouse itself.
Both data dictionaries and metadata serve the purpose of providing additional information about data, but they are used in different contexts
and for different scopes.
Aspect | Data Dictionary | Metadata
Definition | A centralized repository of information about data, such as meaning, relationships, origin, and usage. | Data about data, describing the structure, type, and characteristics of the data.
Example:
Data Dictionary: In a customer database, a data dictionary will specify that the CustomerID column is an integer, serves as a primary
key, and is auto-incremented. It may also specify constraints and relationships with other tables.
Metadata: In the context of a data warehouse, metadata might indicate that the CustomerID field is sourced from the "Sales" database
and transformed by removing leading zeros during the ETL process.
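The contrast in this example can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the table name customer, the helper strip_leading_zeros, and the layout of the warehouse_metadata dictionary are hypothetical. SQLite's PRAGMA table_info plays the role of the data dictionary here, while the warehouse metadata is kept as a plain dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL)"
)

# Data dictionary view: structural facts the database itself records.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
data_dictionary = {
    row[1]: {"type": row[2], "not_null": bool(row[3]), "primary_key": bool(row[5])}
    for row in conn.execute("PRAGMA table_info(customer)")
}

def strip_leading_zeros(raw_id):
    """ETL transformation: turn a zero-padded ID string such as '000123' into 123."""
    return int(raw_id.lstrip("0") or "0")

# Warehouse metadata view: where the field came from and what was done to it.
warehouse_metadata = {
    "CustomerID": {
        "source": "Sales",
        "transformation": "remove leading zeros",
    }
}

print(data_dictionary["CustomerID"])
print(strip_leading_zeros("000123"))
print(warehouse_metadata["CustomerID"])
```

The data dictionary entry can be read straight from the database catalog, whereas the source and transformation details have to be recorded by the ETL process itself.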
In summary, while both serve to describe data, a data dictionary is more specific to the structure of a database, whereas metadata is a broader
term that covers additional aspects of data including its lineage, transformations, and usage across different systems and applications.
CHAPTER 2
1)
A data model represents how data is stored in a database. It is usually drawn as a diagram of the tables and the relationships
between them.
Dimensional modeling is the data model design adopted when building a data warehouse. Put simply, dimensional modeling aims to
reduce query response time compared with a normalized relational design.
STAR SCHEMA
SNOWFLAKE SCHEMA
The snowflake schema is an extension of the star schema in which dimension tables are split into additional related tables, giving more
meaning to the logical view of the database. These additional tables are more normalized than the dimension tables of a star schema.
The snowflake model is the result of decomposing one or more of the dimensions. A snowflake schema in a data warehouse is a logical
arrangement of tables in a multidimensional database such that the ER diagram resembles a snowflake shape.
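To make the decomposition concrete, here is a minimal sketch using Python's built-in sqlite3 module. All table names, column names, and rows (fact_sales, dim_product, dim_category) are hypothetical. In a star schema the category name would sit directly in dim_product. Here it is decomposed into its own dim_category table, so a query by category needs one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-dimension produced by decomposing the product dimension.
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT
);
-- Product dimension now references the category instead of storing its name.
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES dim_category(category_id)
);
-- Fact table: one row per sale, pointing at the product dimension.
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL
);
INSERT INTO dim_category VALUES (1, 'Books'), (2, 'Games');
INSERT INTO dim_product VALUES (10, 'Novel', 1), (11, 'Atlas', 1), (12, 'Chess', 2);
INSERT INTO fact_sales VALUES (1, 10, 15.0), (2, 11, 25.0), (3, 12, 40.0);
""")

# Total sales per category: fact -> product -> category (two joins).
rows = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(rows)
```

The trade-off sketched here is the usual one: the normalized layout stores each category name once, at the cost of an additional join at query time.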