Elaborated DWH DataMining Assignment Answers
1. A Data Warehouse (DWH) is a system used for reporting and data analysis, serving as a central
repository of integrated data from one or more disparate sources. Data warehouses store current
and historical data and are used for creating analytical reports. Key features include:
2. Applications of DWH span various business sectors. In retail, it helps analyze buying trends; in
healthcare, it tracks patient records and treatment outcomes. In banking, it supports fraud detection
- Enterprise Data Warehouse (EDW): A centralized warehouse providing a holistic view across the
enterprise.
- Operational Data Store (ODS): A database that integrates current data from multiple sources for
operational reporting and often feeds the data warehouse.
- Data Mart: A smaller, more focused version of a data warehouse, tailored for specific departments
- Data Extraction: Collecting data from various sources such as transactional systems, flat files, and APIs.
- Data Access: End-users query data using tools like SQL, dashboards, and BI software.
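The extraction and access steps can be illustrated with a small sketch. It assumes an in-memory SQLite database standing in for the warehouse and a made-up CSV string standing in for a flat-file source; the table name, column names, and rows are hypothetical, and a real pipeline would add proper transformation and error handling.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; the rows are made up for illustration.
SOURCE_CSV = """order_id,region,amount
1,North,120.50
2,South,80.00
3,North,
4,East,45.25
"""

def extract(text):
    """Data extraction: read raw rows from a flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))

def load(conn, rows):
    """Load rows into a warehouse table, skipping rows with no amount."""
    conn.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL)")
    clean = [(int(r["order_id"]), r["region"], float(r["amount"]))
             for r in rows if r["amount"]]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)
    conn.commit()

def sales_by_region(conn):
    """Data access: an analytical query an end-user might run."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
load(conn, extract(SOURCE_CSV))
result = sales_by_region(conn)
print(result)
```

The row with a missing amount is dropped during loading, so the query only aggregates complete records.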
5. Evolution of DWH:
- Modern warehouses utilize cloud platforms like Snowflake and Amazon Redshift, and support real-time
6. Needs of DWH:
7. Benefits of DWH:
informed decisions. DWH plays a critical role by acting as the foundation for BI tools. It stores
cleansed, consolidated, and historical data which BI tools use to generate insights via reports,
9. DBMS vs DWH:
- DBMS handles day-to-day operations with real-time data; DWH is for long-term storage and
analysis.
- DBMS supports CRUD operations (Create, Read, Update, Delete), while DWH supports complex
- DM may be dependent (draws from DWH) or independent (draws directly from source systems).
- Data Source Layer: Pulls data from ERP, CRM, flat files.
- Features: Acts as a temporary buffer for data before loading; supports data quality and consistency checks.
17. Metadata:
- Information about the structure, operations, and usage of data.
- Types:
- 3-Tier: Most common; includes source, DWH, and client interface layers.
- Purposes include:
Stages:
1. Data Selection
2. Preprocessing
3. Data Transformation
4. Data Mining
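The four stages listed above can be sketched as a chain of small functions. This is only a toy illustration: the records, field names, and the threshold rule used in the mining step are assumptions, and the threshold rule is a stand-in for a real mining algorithm.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"customer": "a", "age": 25, "spend": 200.0, "note": "x"},
    {"customer": "b", "age": None, "spend": 150.0, "note": "y"},
    {"customer": "c", "age": 40, "spend": 800.0, "note": "z"},
    {"customer": "d", "age": 55, "spend": 500.0, "note": "w"},
]

def select(records, fields):
    """1. Data selection: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def preprocess(records):
    """2. Preprocessing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records, field):
    """3. Data transformation: min-max scale one numeric field to [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values match
    return [{**r, field: (r[field] - lo) / span} for r in records]

def mine(records, field, threshold=0.5):
    """4. Data mining (toy stand-in): list customers above a threshold."""
    return [r["customer"] for r in records if r[field] > threshold]

selected = select(raw, ["customer", "age", "spend"])
cleaned = preprocess(selected)
scaled = transform(cleaned, "spend")
high_spenders = mine(scaled, "spend")
print(high_spenders)
```

Each stage takes the output of the previous one, which mirrors the order of the stages in the list.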
- Components:
- Diagram includes an OLAP cube with slice, dice, drill-down, and roll-up operations.
- Shows a cube structure with dimensions (e.g., region, product, time) and measures (sales).
- Used to visualize how data is aggregated and queried across multiple dimensions.
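The cube operations named above can be sketched in plain Python. The fact rows below are made up, and the function names are hypothetical; a real OLAP server would precompute and index these aggregates rather than scan a list.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales).
FACTS = [
    ("North", "Laptop", "Q1", 100),
    ("North", "Laptop", "Q2", 150),
    ("North", "Phone", "Q1", 80),
    ("South", "Laptop", "Q1", 60),
    ("South", "Phone", "Q2", 90),
]
DIMS = ("region", "product", "quarter")

def roll_up(facts, keep):
    """Roll-up: sum sales over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)

def slice_cube(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]

def dice(facts, criteria):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [row for row in facts
            if all(row[DIMS.index(d)] in allowed
                   for d, allowed in criteria.items())]

by_region = roll_up(FACTS, ["region"])
# Drill-down is the reverse of roll-up: group by more dimensions.
by_region_quarter = roll_up(FACTS, ["region", "quarter"])
q1_only = slice_cube(FACTS, "quarter", "Q1")
north_laptops = dice(FACTS, {"region": {"North"}, "product": {"Laptop"}})
print(by_region, by_region_quarter)
```

Here region, product, and quarter play the role of dimensions and sales is the measure.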
- SVM (Support Vector Machine): Finds the hyperplane that separates the classes with the maximum margin.
- Techniques:
- Clustering: K-Means
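As an illustration of K-Means, here is a minimal sketch of the standard iterative procedure (Lloyd's algorithm) for 2-D points. The sample points are made up, and the starting centroids are supplied by the caller to keep the run deterministic; practical implementations usually choose starting centroids randomly or with a seeding heuristic.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm for 2-D points with caller-supplied starting centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(c)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two visually separate groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
```

The loop alternates between assigning points to the nearest centroid and recomputing centroids, and stops once the centroids no longer move.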