Building Data Warehouse From Scratch

The document outlines the considerations for building a data warehouse from scratch, including the choice between a data warehouse and alternative solutions like data lakes, data marts, and data hubs based on organizational needs. Key factors to consider include the type of data, processing speed, scalability, and budget. It concludes that a hybrid architecture is often optimal, combining various solutions to meet diverse data requirements and business goals.

Uploaded by

tarun


Choose between a data warehouse or something else?

• Data Warehouse: Ideal for organizations needing to integrate, store, and
analyze large volumes of structured data over time for business
intelligence, reporting, and historical analysis.
• Alternative Solutions: If you need real-time data processing, simplified
or cost-effective analytics, or don’t have significant data volume,
alternatives like data lakes, data marts, cloud-based solutions, or
self-service BI tools might be better suited.

Key questions to answer

1. What is the primary purpose for your data? Is it mainly for
operational decision-making or historical analysis?
2. How fast does your data need to be processed and analyzed? Do
you need real-time analytics, or can it be batch-processed?
3. What types of data do you have? Is it structured (e.g., transactional
data), unstructured (e.g., logs, social media), or semi-structured (e.g.,
JSON)?
4. How scalable is your current infrastructure? Are you expecting
significant data growth, and can your current systems handle that growth
efficiently?
5. What is your budget for infrastructure and implementation? A full
data warehouse can require significant investment in both time and
money.

Consolidating data for an enterprise involves choosing the right data architecture
that allows you to integrate, store, and manage data from multiple sources to
provide actionable insights and drive business value. The key options for
consolidating enterprise data include data warehouses, data lakes, data
marts, and data hubs, each of which has specific strengths and ideal use
cases. Choosing between them depends on factors like the type of data you
have, the speed and flexibility of analysis required, scalability needs, and
your overall business goals.
Here’s a breakdown of the main options for data consolidation and how to
choose between them:
1. Data Warehouse (DW)
What It Is:
A data warehouse is a centralized repository for structured, historical data that
has been integrated and transformed for analytical purposes. It uses an ETL
(Extract, Transform, Load) process to clean, standardize, and structure data
before it is stored.
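The ETL flow described above can be sketched in a few lines. This is only a minimal illustration, assuming an in-memory SQLite database stands in for the warehouse; the sample rows, the `fact_orders` table, and the function names are hypothetical and not part of the original notes.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system.
RAW_ORDERS = [
    {"order_id": "1001", "customer": " Alice ", "amount": "19.90", "date": "2024-01-05"},
    {"order_id": "1002", "customer": "BOB", "amount": "5.00", "date": "2024-01-05"},
    {"order_id": "1003", "customer": "carol", "amount": "not-a-number", "date": "2024-01-06"},
]


def extract():
    """Extract: pull raw rows from the source (here, a static list)."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: cast types, standardize names, drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        customer = row["customer"].strip().title()
        clean.append((int(row["order_id"]), customer, amount, row["date"]))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a structured warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(conn, transform(extract()))

# A typical reporting query over the cleaned, structured data.
totals_by_day = conn.execute(
    "SELECT order_date, SUM(amount) FROM fact_orders "
    "GROUP BY order_date ORDER BY order_date"
).fetchall()
customers = [r[0] for r in conn.execute("SELECT customer FROM fact_orders ORDER BY order_id")]
print(totals_by_day, customers)
```

The point of the sketch is the ordering: validation and standardization happen before the data is stored, so every query sees consistent rows.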
Strengths:
• Structured Data: Optimized for structured data from transactional
systems (e.g., sales, finance, CRM).
• Business Intelligence (BI): Ideal for BI tools and complex querying,
reporting, and dashboards.
• Data Integration: Aggregates data from various sources and transforms
it to provide a unified, consistent view.
• Performance: High-performance querying on large datasets, especially
for complex reporting and historical analysis.
• Data Consistency: Ensures data quality, consistency, and governance
through transformation and cleansing.
Best for:
• Historical analysis and reporting on structured data.
• Organizations that require reliable, consistent datasets for
decision-making.
• Businesses that prioritize data governance and quality.
• Long-term business intelligence and performance monitoring.
When Not to Choose:
• If you have large volumes of unstructured or semi-structured data (e.g.,
social media, IoT, images, documents).
• If you require real-time analytics or data from high-velocity sources.
2. Data Lake
What It Is:
A data lake is a large-scale, centralized repository that can store raw,
unstructured, semi-structured, and structured data at any scale. It typically uses
a schema-on-read approach, meaning data is stored in its raw form and
structured only when read or queried.
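To make schema-on-read concrete, here is a small sketch, assuming the lake holds raw JSON lines. The sample records, the `read_with_schema` helper, and both schemas are hypothetical; a real lake would sit on object storage and be read by a query engine rather than a Python loop.

```python
import json

# Raw events stored exactly as received; no structure is enforced on write.
RAW_LAKE_LINES = [
    '{"user": "u1", "event": "click", "ts": "2024-01-05T10:00:00", "extra": {"page": "/home"}}',
    '{"user": "u2", "event": "purchase", "amount": "12.50"}',
    '{"device": "sensor-7", "temp_c": 21.4}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: keep records that have every field, then cast."""
    for line in lines:
        record = json.loads(line)
        if not all(field in record for field in schema):
            continue  # this record does not fit the reader's schema
        yield {field: cast(record[field]) for field, cast in schema.items()}


# Two readers impose two different structures on the same raw data.
purchase_schema = {"user": str, "amount": float}
activity_schema = {"user": str, "event": str}

purchases = list(read_with_schema(RAW_LAKE_LINES, purchase_schema))
activity = list(read_with_schema(RAW_LAKE_LINES, activity_schema))
print(purchases)
print(activity)
```

Nothing was rejected or reshaped when the lines were stored; each consumer decides which fields matter when it reads.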
Strengths:
• Flexibility: Can handle all types of data: structured (databases),
semi-structured (logs, JSON), and unstructured (images, audio, video).
• Scalability: Cost-effective and scalable storage for big data, often in
cloud environments like Amazon S3, Azure Data Lake, or Google Cloud
Storage.
• Data Agility: Data can be ingested quickly and in real time, allowing for
the storage of a variety of data formats.
• Advanced Analytics: Supports machine learning, data mining, and big
data analytics with tools like Apache Hadoop, Spark, and TensorFlow.
Best for:
• Storing large volumes of diverse data (including IoT data, social media
data, logs, sensor data).
• Organizations looking to perform advanced analytics, including machine
learning and predictive analytics.
• Businesses with a variety of data sources that don’t require immediate
structure.
• Data exploration and discovery in a raw format (for example, data
scientists working with unprocessed data).
When Not to Choose:
• If your primary need is structured, high-performance querying for
traditional BI purposes.
• If you don’t have the infrastructure or tools to process and manage large
volumes of unstructured data.
3. Data Mart
What It Is:
A data mart is a subset of a data warehouse, often focused on a specific
department or business unit (e.g., sales, finance, marketing). Data marts contain
data that is relevant to specific analytical needs but typically do not have the
breadth of data available in the full data warehouse.
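One simple way to picture a data mart is as a filtered view over the warehouse. The sketch below assumes an in-memory SQLite database and hypothetical table and view names (`warehouse_facts`, `sales_mart`). In practice a mart is often a separate physical database, so treat the view as a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the enterprise warehouse
conn.executescript(
    """
    CREATE TABLE warehouse_facts (
        department TEXT, metric TEXT, value REAL, period TEXT
    );
    INSERT INTO warehouse_facts VALUES
        ('sales', 'revenue', 1200.0, '2024-01'),
        ('sales', 'revenue', 1500.0, '2024-02'),
        ('finance', 'opex', 800.0, '2024-01'),
        ('marketing', 'ad_spend', 300.0, '2024-01');

    -- The "mart": only sales rows and only the columns the sales team needs.
    CREATE VIEW sales_mart AS
        SELECT period, metric, value
        FROM warehouse_facts
        WHERE department = 'sales';
    """
)

sales_rows = conn.execute(
    "SELECT period, value FROM sales_mart ORDER BY period"
).fetchall()
print(sales_rows)
```

The sales team queries `sales_mart` and never sees finance or marketing rows, which is the narrower breadth the definition above refers to.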
Strengths:
• Department-Specific Focus: Focuses on specific business functions or
departments, allowing for faster, more relevant insights.
• Simpler Setup: Easier and quicker to implement than a full data
warehouse, and can be a good option for smaller-scale data integration.
• Cost-Effective: Because it’s smaller and more focused, it can be less
expensive and resource-intensive to maintain than a full data warehouse.
Best for:
• Smaller teams or departments that need specialized data for analysis.
• Rapid insights for specific business functions (e.g., sales performance,
marketing campaigns).
• Organizations that don't need an enterprise-wide data warehouse and
prefer a more modular approach.
When Not to Choose:
• If you need enterprise-wide data consolidation.
• If there’s a need for large-scale, cross-functional analysis (e.g., combined
financial, sales, and customer data).
4. Data Hub
What It Is:
A data hub is a centralized data architecture that integrates multiple data
sources (both structured and unstructured) but does not necessarily store data
itself in one central location. Instead, it acts as an intermediary, providing a
unified access layer to distributed data across multiple systems.
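The idea of a unified access layer can be sketched as a small class that forwards a query to each registered source and merges the answers, without copying data into a central store. Everything here (the `DataHub` class, the `crm` and `billing` sources, the sample records) is hypothetical and only illustrates the pattern.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class DataHub:
    """Routes lookups to registered sources; holds no data of its own."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[str], List[Record]]] = {}

    def register(self, name: str, fetch: Callable[[str], List[Record]]) -> None:
        self._sources[name] = fetch

    def lookup(self, customer_id: str) -> List[Record]:
        # Query every source at request time and tag each record with its origin.
        results: List[Record] = []
        for name, fetch in self._sources.items():
            for record in fetch(customer_id):
                results.append({"source": name, **record})
        return results


# Two independent systems, each keeping its own data.
CRM = {"c1": {"name": "Alice"}}
BILLING = [
    {"customer_id": "c1", "invoice": 42, "amount": 99.0},
    {"customer_id": "c2", "invoice": 43, "amount": 10.0},
]


def fetch_crm(customer_id: str) -> List[Record]:
    return [CRM[customer_id]] if customer_id in CRM else []


def fetch_billing(customer_id: str) -> List[Record]:
    return [
        {k: v for k, v in row.items() if k != "customer_id"}
        for row in BILLING
        if row["customer_id"] == customer_id
    ]


hub = DataHub()
hub.register("crm", fetch_crm)
hub.register("billing", fetch_billing)

view = hub.lookup("c1")
print(view)
```

Because the hub fetches on demand, results reflect the current state of each system, but every lookup costs a round trip to each source. That cost is why the pattern suits operational access better than heavy historical analysis.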
Strengths:
• Data Integration: It serves as a “hub” for accessing data across different
systems without moving data into a central repository (i.e., data
federation).
• Real-Time Access: Provides real-time access to multiple systems or data
sources for operational use cases.
• Decoupling Data Sources: It integrates data without requiring full-scale
ETL processes or data replication.
• Flexibility: Supports both operational data and analytic data, allowing
organizations to interact with data in real time.
Best for:
• Real-time, operational data access and integration.
• Organizations that need to aggregate data from multiple disparate sources
without centralizing it.
• Hybrid environments with multiple systems (e.g., cloud, on-premises,
third-party) where you want to avoid moving large amounts of data.
When Not to Choose:
• If you require deep historical analysis or complex querying of large
volumes of data.
• If you're working with vast amounts of data that need to be processed and
stored in a centralized location.
5. Choosing the Right Solution: Key Considerations
When deciding between these options, ask the following questions:
A. What Types of Data Do You Have?
• Structured Data: If most of your data is structured (e.g., relational
databases, transactional data), a data warehouse may be ideal.
• Unstructured or Semi-Structured Data: If you have large amounts of
semi-structured (logs, social media, JSON) or unstructured data (images,
audio), a data lake is likely a better fit.
B. What Is Your Primary Use Case?
• Business Intelligence & Reporting: If your primary goal is reporting, BI,
and historical analysis of structured data, a data warehouse is usually the
best option.
• Advanced Analytics: If you need to perform machine learning, predictive
analytics, and data science on large volumes of varied data, a data lake
is usually the better fit.
• Real-Time Operational Data: If real-time integration and access to
operational data across systems are required, a data hub could be the
best solution.
C. How Fast Does Data Need to Be Processed?
• Batch Processing (ETL): If your data is processed in batches (e.g., daily
or weekly reports), a data warehouse is more suitable.
• Real-Time Processing: If you need real-time data ingestion and
processing, a data lake with streaming analytics capabilities or a data
hub may be a better choice.
D. How Large Is Your Data?
• Big Data: If you’re dealing with massive volumes of data, especially
unstructured data, a data lake can handle large-scale storage without
needing to structure it upfront.
• Medium to Small Data: For more manageable datasets or if only a
specific department requires insights, a data mart may be sufficient.
E. Budget and Resources
• Cost-Effective Scaling: Data lakes (especially in the cloud) offer
cost-effective scalability for large volumes of data.
• Complexity & Maintenance: Data warehouses often require more
upfront investment in ETL processes, and ongoing maintenance can be
more complex than a simpler data mart or data hub solution.
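The considerations above can be condensed into a rough rule of thumb. The sketch below is only one possible encoding: the `Needs` fields, the order in which the rules are checked, and the single-answer output are all assumptions made for illustration. Budget (question E) is left out, and real projects often end up with a mix of these options.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    mostly_structured: bool   # A: is most of the data structured?
    needs_real_time: bool     # C: is real-time processing required?
    advanced_analytics: bool  # B: machine learning or data science on varied data?
    single_department: bool   # D: is the scope one team or business unit?
    avoid_moving_data: bool   # B: must data stay in its source systems?


def recommend(needs: Needs) -> str:
    """Return a first-guess architecture; the rule order is an assumption."""
    if needs.needs_real_time:
        # Real-time needs point away from a batch-oriented warehouse.
        return "data hub" if needs.avoid_moving_data else "data lake"
    if not needs.mostly_structured or needs.advanced_analytics:
        return "data lake"
    if needs.single_department:
        return "data mart"
    return "data warehouse"


# Example: structured data, batch reporting, enterprise-wide scope.
example = recommend(
    Needs(
        mostly_structured=True,
        needs_real_time=False,
        advanced_analytics=False,
        single_department=False,
        avoid_moving_data=False,
    )
)
print(example)
```

Treat the result as a starting point for discussion rather than a verdict, since several of the questions can reasonably point to different options at once.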

6. Conclusion
• Data Warehouse: Best for organizations with structured data that require
consistent, high-performance analytics, reporting, and historical insights
across departments.
• Data Lake: Ideal for businesses dealing with large volumes of diverse,
unstructured data and seeking flexible, scalable storage with advanced
analytics capabilities.
• Data Mart: Suitable for smaller-scale, department-specific analytics with
a focus on speed and cost-effectiveness.
• Data Hub: Best for real-time data integration across distributed sources
without needing to store all data centrally.
In practice, many organizations opt for a hybrid architecture, using data
lakes for raw, unstructured data and data warehouses for structured,
analytical data. A data hub might also be integrated to unify access to multiple
systems.
Ultimately, the best solution will depend on your data types, business objectives,
use cases, scalability needs, and budget.
