
Elaborated DWH DataMining Assignment Answers

A Data Warehouse (DWH) is a centralized system for reporting and data analysis, integrating data from various sources and storing both current and historical data. It supports business intelligence by providing reliable data for analysis, forecasting, and decision-making across various sectors such as retail, healthcare, and finance. The document also discusses the evolution, architecture, and applications of DWH and data mining processes, highlighting their importance in extracting valuable insights from large datasets.

Uploaded by

ayushram361
Copyright: © All Rights Reserved

Elaborated Answers: Data Warehouse and Data Mining Assignment

1. A Data Warehouse (DWH) is a system used for reporting and data analysis, serving as a central repository of integrated data from one or more disparate sources. Data warehouses store current and historical data and are used for creating analytical reports. Key features include:

- Subject-Oriented: Organized around major subjects such as customer, product, sales.

- Integrated: Data from multiple sources is standardized.

- Time-Variant: Data includes historical information to track changes over time.

- Non-Volatile: Once data is loaded, it is not normally changed or deleted; new data is appended rather than overwriting existing records.
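
The time-variant and non-volatile properties can be illustrated with a minimal sketch, assuming an in-memory SQLite database. The table `customer_history`, its columns, and the helper `city_as_of` are hypothetical names chosen for this example: a change is stored as a new row with a start date, and earlier rows are left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_history (customer_id INTEGER, city TEXT, valid_from TEXT)")

# Non-volatile: a change is recorded as a new row; earlier rows are never updated or deleted.
conn.executemany(
    "INSERT INTO customer_history VALUES (?, ?, ?)",
    [(1, "Pune", "2022-01-01"), (1, "Mumbai", "2023-06-01")],
)


def city_as_of(customer_id, date):
    """Time-variant: return the city that was valid for the customer on `date`."""
    row = conn.execute(
        """SELECT city FROM customer_history
           WHERE customer_id = ? AND valid_from <= ?
           ORDER BY valid_from DESC LIMIT 1""",
        (customer_id, date),
    ).fetchone()
    return row[0] if row else None


print(city_as_of(1, "2022-12-31"))  # state before the move
print(city_as_of(1, "2024-01-01"))  # state after the move
```

Because history is kept, the same question can be answered for any past date, which an operational table that overwrites the city could not do.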

2. Applications of DWH span various business sectors. In retail, it helps analyze buying trends; in healthcare, it tracks patient records and treatment outcomes. In banking, it supports fraud detection and risk management. Other applications include:

- Market research and competitive analysis

- Financial forecasting and budgeting

- Customer profiling and churn prediction

- Supply chain and inventory management

- Strategic business reporting and dashboards

3. Types of Data Warehouses:

- Enterprise Data Warehouse (EDW): A centralized warehouse providing a holistic view across the enterprise.

- Operational Data Store (ODS): A database that integrates current data from multiple sources for operational reporting; it often serves as an interim source that feeds the data warehouse.

- Data Mart: A smaller, more focused version of a data warehouse, tailored for specific departments such as sales or marketing.


4. The DWH process involves several phases:

- Data Extraction: Collecting data from various sources like transactional systems, flat files, APIs.

- Data Transformation: Standardizing data format, cleaning inconsistencies, aggregating values.

- Data Loading: Populating the data warehouse with transformed data.

- Data Storage: Using schemas such as star or snowflake models.

- Data Access: End-users query data using tools like SQL, dashboards, and BI software.
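
The phases above can be sketched end to end in a few lines. This is a minimal illustration, not a production pipeline: it assumes an in-memory SQLite database as the warehouse, and the sample rows, table names (`dim_customer`, `dim_product`, `fact_sales`), and function names are all hypothetical.

```python
import sqlite3

# Hypothetical source rows, standing in for an export from a transactional system.
RAW_ORDERS = [
    {"order_id": 1, "customer": " alice ", "product": "Laptop", "amount": "1200.50", "date": "2024-01-15"},
    {"order_id": 2, "customer": "BOB", "product": "laptop", "amount": "999.99", "date": "2024-01-16"},
    {"order_id": 3, "customer": "Alice", "product": "Mouse", "amount": "n/a", "date": "2024-02-01"},
]


def extract():
    """Extraction: collect rows from the (simulated) source."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transformation: standardize text, convert types, drop invalid rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleaning step: skip rows whose amount is not numeric
        clean.append({
            "order_id": row["order_id"],
            "customer": row["customer"].strip().title(),
            "product": row["product"].strip().title(),
            "amount": amount,
            "month": row["date"][:7],
        })
    return clean


def load(conn, rows):
    """Loading and storage: populate a small star schema (one fact table, two dimensions)."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_sales (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            month TEXT,
            amount REAL
        );
    """)
    for r in rows:
        conn.execute("INSERT OR IGNORE INTO dim_customer (name) VALUES (?)", (r["customer"],))
        conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (r["product"],))
        conn.execute(
            """INSERT INTO fact_sales
               SELECT ?, c.customer_id, p.product_id, ?, ?
               FROM dim_customer c, dim_product p
               WHERE c.name = ? AND p.name = ?""",
            (r["order_id"], r["month"], r["amount"], r["customer"], r["product"]),
        )
    conn.commit()


def sales_by_product(conn):
    """Access: an analytical query joining the fact table to a dimension."""
    return conn.execute(
        """SELECT p.name, ROUND(SUM(f.amount), 2)
           FROM fact_sales f JOIN dim_product p USING (product_id)
           GROUP BY p.name ORDER BY p.name"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(sales_by_product(conn))
```

The third source row is dropped during transformation because its amount is not numeric, so only the two laptop orders reach the fact table.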

5. Evolution of DWH:

- Initially, businesses used file systems and spreadsheets.

- Then came databases for transactional data.

- DWH emerged to solve data integration and reporting challenges.

- Modern warehouses run on cloud platforms such as Snowflake and Amazon Redshift, and support real-time analytics and machine learning.

6. Needs of DWH:

- Centralized data for consistency.

- Historical data for trend analysis.

- Faster query performance for decision-making.

- Reduced data redundancy.

- Better data governance and compliance tracking.

7. Benefits of DWH:

- Improved Business Intelligence (BI) due to reliable data.

- Enhanced productivity for analysts and decision-makers.

- Scalable architecture to support growing data.

- Better forecasting and planning.

- Data consistency across departments.


8. Business Intelligence (BI) involves collecting, processing, and analyzing business data to make informed decisions. DWH plays a critical role by acting as the foundation for BI tools. It stores cleansed, consolidated, and historical data which BI tools use to generate insights via reports, dashboards, and visualizations.

9. DBMS vs DWH:

- DBMS handles day-to-day operations with real-time data; DWH is for long-term storage and analysis.

- DBMS supports CRUD operations (Create, Read, Update, Delete), while DWH supports complex queries and analytics.

- DBMS uses ER modeling; DWH uses dimensional modeling.

10. DWH vs DM:

- Data Warehouse is enterprise-wide; Data Mart is subject-specific.

- A DWH is more expensive and complex to build; a DM is faster and easier to implement.

- DM may be dependent (draws from DWH) or independent (draws directly from source systems).

11. Characteristics of DWH:

- Designed for query and analysis rather than transaction processing.

- Data is read-only for end users; it is refreshed by periodic loads rather than updated by individual transactions.

- Contains both current and historical data.

- Organized to facilitate reporting and analysis.

- Optimized for speed and accuracy in querying.

12. Data Marts:

- Focused subset of a data warehouse.

- Created for specific users or departments like HR, marketing, or finance.


- Features: fast query response, lower cost, customized data models.

- Benefits: Improved performance, simplicity in data access, quicker implementation time.

13. Structure of Data Mart:

- Data Source Layer: Pulls data from ERP, CRM, flat files.

- ETL Layer: Cleans, transforms, and loads data.

- Staging Area: Temporary storage during ETL.

- Data Mart Storage: Star or snowflake schema.

- Presentation Layer: Dashboards, reports, visualization tools.

14. Types of Data Marts:

- Dependent Data Mart: Derived from central DWH.

- Independent Data Mart: Created without central warehouse.

- Hybrid Data Mart: Combines elements of both dependent and independent.
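
A dependent data mart can be pictured as a subject-specific subset derived from the central warehouse. The sketch below assumes an in-memory SQLite database and models the mart as a SQL view; the table `warehouse_sales`, the view `mart_marketing`, and the sample rows are hypothetical. Real data marts are often materialized as separate tables or databases rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical enterprise-wide warehouse table covering every department.
    CREATE TABLE warehouse_sales (region TEXT, department TEXT, product TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES
        ('North', 'Marketing', 'Campaign A', 500.0),
        ('South', 'Marketing', 'Campaign B', 300.0),
        ('North', 'Finance',   'Audit',      900.0);

    -- Dependent data mart: a subject-specific subset derived from the warehouse.
    CREATE VIEW mart_marketing AS
        SELECT region, product, amount
        FROM warehouse_sales
        WHERE department = 'Marketing';
""")

rows = conn.execute(
    "SELECT region, SUM(amount) FROM mart_marketing GROUP BY region ORDER BY region"
).fetchall()
print(rows)
```

An independent data mart would instead be loaded directly from source systems, with no `warehouse_sales` table behind it.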

15. Hub and Spoke Architecture:

- Central warehouse (hub) with multiple departmental data marts (spokes).

- Benefits: Integrated architecture, easy maintenance, centralized data governance.

- Features: Scalable design, simplifies data distribution, improves performance.

16. Data Staging:

- An intermediate storage area used during ETL.

- Functions: Data cleansing, transformation, and integration.

- Features: Temporary buffer for data before loading, ensures data quality and consistency.

- Diagram: Source -> Extract -> Staging (cleanse, transform) -> Load -> Warehouse

17. Metadata:
- Information about the structure, operations, and usage of data.

- Types:

  - Technical Metadata: table names, field types, data lineage.

  - Business Metadata: business rules, data definitions.

- Significance: Enables understanding of data, improves data governance, aids in troubleshooting.
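
The two metadata types can be shown side by side with a small sketch, assuming SQLite. Technical metadata is read from the database catalog with `PRAGMA table_info`; the business definitions are a hypothetical dictionary of the kind a data steward might maintain, and the table `fact_sales` is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL, month TEXT)"
)

# Technical metadata: column names and declared types, read from the database catalog.
technical = {
    row[1]: row[2]  # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    for row in conn.execute("PRAGMA table_info(fact_sales)")
}

# Business metadata: hypothetical definitions that a data steward would maintain.
business = {
    "amount": "Order value in the reporting currency, net of tax.",
    "month": "Calendar month of the order, formatted YYYY-MM.",
}

for column, sql_type in technical.items():
    print(f"{column:10} {sql_type:8} {business.get(column, '(no business definition yet)')}")
```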

18. Building Blocks:

- Source Systems: ERP, CRM, web logs.

- ETL Tools: Informatica, Talend.

- Staging Area: Pre-processing zone.

- Data Warehouse Database: Oracle, Teradata.

- Metadata Repository: Stores information about data.

- BI Tools: Tableau, Power BI.

- Data Marts: For departmental use.

19. DWH Architecture:

- Bottom Tier: Source data, ETL tools.

- Middle Tier: DWH database and OLAP servers.

- Top Tier: Front-end tools for querying, analysis, and reporting.

- Diagram: Shows flow from data sources to users.

20. Architecture Types:

- 1-Tier: Simple, no separation, less secure.

- 2-Tier: DWH and analysis layer; better performance.

- 3-Tier: Most common; includes source, DWH, and client interface layers.

- Each tier separates concerns and improves scalability and maintainability.


21. Data Mining:

- A process of discovering hidden patterns from large datasets.

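- Purposes include:

  - Prediction: Estimating future or unknown values.

  - Classification: Assigning records to predefined classes.

  - Clustering: Grouping similar records without predefined labels.

  - Association: Finding items that occur together, as in market basket analysis.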
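
Association in the market basket sense is usually measured with support and confidence. The following is a minimal sketch over hypothetical baskets; it counts item pairs directly and does not implement a full frequent-itemset algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (one set of items per transaction).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def support(pair):
    """Fraction of baskets containing both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)


def confidence(antecedent, consequent):
    """Estimated P(consequent in basket | antecedent in basket)."""
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    return both / item_counts[antecedent]


print(support(("bread", "milk")))      # 2 of 4 baskets
print(confidence("butter", "bread"))   # every butter basket also has bread
```

Confidence is directional: in this data, butter implies bread in every basket, while bread implies butter in only two of three.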

22. Data Mining Process:

1. Data Cleaning: Remove noise and inconsistent data.

2. Data Integration: Combine data from multiple sources.

3. Data Selection: Select relevant data.

4. Data Transformation: Normalize and summarize.

5. Data Mining: Apply algorithms to extract patterns.

6. Pattern Evaluation: Identify truly interesting patterns.

7. Knowledge Presentation: Visualize and interpret.
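
The seven steps can be traced on a toy dataset. This is only a sketch: the two source dictionaries, the 0.5 interest threshold, and the choice of "mean normalized spend per segment" as the mined pattern are assumptions made for illustration, and real projects would use far richer algorithms at step 5.

```python
# Two hypothetical sources keyed by customer id.
purchases = {1: 120.0, 2: None, 3: 80.0, 4: 400.0}   # None marks a missing value
profiles = {1: "retail", 2: "retail", 3: "retail", 4: "wholesale"}

# 1. Cleaning: drop records with missing values.
cleaned = {cid: amt for cid, amt in purchases.items() if amt is not None}

# 2. Integration: combine both sources on the shared key.
integrated = [
    {"id": cid, "amount": amt, "segment": profiles[cid]} for cid, amt in cleaned.items()
]

# 3. Selection: keep only the attributes relevant to the question.
selected = [(r["segment"], r["amount"]) for r in integrated]

# 4. Transformation: min-max normalize amounts to the range [0, 1].
lo = min(a for _, a in selected)
hi = max(a for _, a in selected)
transformed = [(seg, (a - lo) / (hi - lo)) for seg, a in selected]

# 5. Mining: a deliberately simple "pattern", the mean normalized spend per segment.
groups = {}
for seg, value in transformed:
    groups.setdefault(seg, []).append(value)
patterns = {seg: sum(vals) / len(vals) for seg, vals in groups.items()}

# 6. Evaluation: keep patterns above an arbitrary, assumed interest threshold.
interesting = {seg: score for seg, score in patterns.items() if score >= 0.5}

# 7. Presentation: report the result in readable form.
for seg, score in interesting.items():
    print(f"{seg}: mean normalized spend {score:.2f}")
```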

23. KDD (Knowledge Discovery in Databases):

- Comprehensive process of identifying valid patterns.

Stages:

1. Data Selection

2. Preprocessing

3. Data Transformation

4. Data Mining

5. Interpretation & Evaluation

- KDD results in actionable knowledge, not just data.


24. OLAP (Online Analytical Processing):

- Enables users to analyze multidimensional data.

- Components:

  - Dimensions: Category of data (e.g., time, product).

  - Measures: Quantitative data (e.g., sales).

- Diagram includes an OLAP cube with slice, dice, drill-down, and roll-up operations.

25. OLAP Diagram:

- Shows a cube structure with dimensions (e.g., region, product, time) and measures (sales).

- Used to visualize how data is aggregated and queried across multiple dimensions.
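
The cube operations can be approximated in plain Python. This sketch assumes a tiny list of hypothetical fact rows with three dimensions (region, product, quarter) and one measure (sales); the function names are invented for the example, and an OLAP server would precompute or index these aggregates rather than scanning rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales).
facts = [
    ("North", "Laptop", "Q1", 100),
    ("North", "Laptop", "Q2", 150),
    ("North", "Mouse",  "Q1", 20),
    ("South", "Laptop", "Q1", 80),
    ("South", "Mouse",  "Q2", 30),
]
DIMENSIONS = ("region", "product", "quarter")


def aggregate(rows, keep):
    """Sum the sales measure, grouped by the dimensions named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


def slice_(rows, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [
        r for r in rows
        if all(r[DIMENSIONS.index(d)] in values for d, values in allowed.items())
    ]


roll_up = aggregate(facts, ["region"])                  # coarser: totals per region
drill_down = aggregate(facts, ["region", "quarter"])    # finer: region by quarter
q1_by_product = aggregate(slice_(facts, "quarter", "Q1"), ["product"])
north_laptops = dice(facts, region={"North"}, product={"Laptop"})

print(roll_up)
print(drill_down)
print(q1_by_product)
print(north_laptops)
```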

26. Types of OLAP:

- MOLAP: Uses multidimensional cube, fast query performance.

- ROLAP: Works on relational database, scalable.

- HOLAP: Combines benefits of MOLAP and ROLAP.

- Features: Real-time analytics, pre-aggregated data, hierarchical analysis.

27. OLAP vs OLTP:

- OLAP: Analytical, read-only, supports complex queries, data is historical.

- OLTP: Transactional, supports insert/update/delete, data is current.

- OLAP supports decision support; OLTP supports day-to-day operations.
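
The difference in workload can be seen in the statements each side issues. The sketch below runs both against one in-memory SQLite table purely for brevity; in practice the OLTP system and the warehouse are normally separate databases. The `orders` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, year INTEGER, amount REAL)"
)

# OLTP-style work: short transactions that insert, update, or delete single rows.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders VALUES (1, 'Alice', 2023, 100.0)")
    conn.execute("INSERT INTO orders VALUES (2, 'Bob',   2023, 250.0)")
    conn.execute("INSERT INTO orders VALUES (3, 'Alice', 2024, 300.0)")
    conn.execute("UPDATE orders SET amount = 120.0 WHERE order_id = 1")

# OLAP-style work: a read-only query that scans and aggregates history.
yearly = conn.execute(
    "SELECT year, COUNT(*), SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
print(yearly)
```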

28. DWH Classifiers:

- KNN (K-Nearest Neighbor): Classifies data based on proximity to training samples.

- SVM (Support Vector Machine): Finds optimal hyperplane to separate data into categories.

- Used in predictive analytics, fraud detection, and recommendation systems.
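
A minimal KNN classifier fits in a few lines of standard-library Python. The training points, the labels `low_risk` and `high_risk`, and the function name `knn_predict` are hypothetical; the sketch uses Euclidean distance and a simple majority vote, which is one common formulation among several. SVM is not shown because a faithful example would need a numerical optimizer.

```python
import math
from collections import Counter

# Hypothetical labelled training points: (feature vector, class label).
training = [
    ((1.0, 1.0), "low_risk"),
    ((1.5, 2.0), "low_risk"),
    ((2.0, 1.5), "low_risk"),
    ((6.0, 6.5), "high_risk"),
    ((7.0, 6.0), "high_risk"),
    ((6.5, 7.5), "high_risk"),
]


def knn_predict(point, data, k=3):
    """Return the majority label among the k training points closest to `point`."""
    nearest = sorted(data, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((1.2, 1.4), training))
print(knn_predict((6.8, 6.9), training))
```

An odd `k` avoids tied votes in two-class problems; features on very different scales would normally be normalized first so that one feature does not dominate the distance.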


29. Data Mining Tools/Techniques:

- Tools: RapidMiner, Weka, Orange, R, Python.

- Techniques:

  - Classification: Decision Trees, Naïve Bayes

  - Clustering: K-Means

  - Regression: Linear, Logistic

  - Association Rule Mining: Apriori Algorithm

  - Neural Networks, Deep Learning
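
Of the techniques listed, K-Means is compact enough to sketch directly. The version below is the standard assign-then-update loop (Lloyd's algorithm) in plain Python; the sample points and the fixed starting centroids are hypothetical, and starting centroids are normally chosen at random or with a seeding heuristic.

```python
import math


def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        if updated == centroids:
            break  # converged: centroids did not move, so assignments are stable
        centroids = updated
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```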

30. Applications of Data Mining:

- Retail: Market basket analysis, customer segmentation

- Finance: Credit scoring, fraud detection

- Healthcare: Disease prediction, patient monitoring

- Telecommunications: Churn prediction

- E-commerce: Recommendation systems

- Manufacturing: Quality control, predictive maintenance
