Module 6

Uploaded by

Bhavana

Module 6

Emerging trends
Emergence of cloud services and infrastructure in Data Warehousing
 A cloud data warehouse is a modern way of storing and managing large amounts of
data in a public cloud. It lets you quickly access and use your data.
 This makes it the perfect solution for businesses that rely on data and require agility,
flexibility, and ease of use for their infrastructure requirements.
 Key features:
1. Separation of storage and compute
2. Data integration and management
3. Data storage
4. Database performance
5. Security and compliance
Benefits
 Increased flexibility and scalability
 Reduced cost
 Enhanced security
 Increased performance
 Increased collaboration
 Parallel processing reduces the time required to manage data
 Dynamic allocation of computing resources reduces cost and improves performance
 Cost of administration is limited with cloud service providers managing backend systems
 The cloud acts as a failsafe, because providers build disaster recovery into the service
 Dynamic pricing plans make it affordable even for small team operations

 With continuous innovation by data warehouse providers, the distinction
between platforms is widening. This has prompted businesses to compare
cloud-based data warehouses to identify the best solution. The Data Warehouse
as a Service market is expected to reach USD 7.69 billion by 2028.
Evolution of cloud data warehouse
Traditional warehouse vs cloud warehouse
| | Traditional warehouse | Cloud warehouse |
|---|---|---|
| Data storage | Can handle only a limited amount of data based on the availability of systems and resources at a time | Can handle virtually limitless data with parallel processing and infinite scalability |
| | Semi-structured and unstructured data is difficult to handle with on-prem warehouses | Tuned to handle unstructured data which is automatically transformed for usability with 'schema-on-write' |
| Interoperability | The interoperability of different technologies and orchestration of separate systems is challenging | A virtual interoperable layer sits on the data source to allow easy integration of data from different systems |
| Scaling challenges | Scaling up is tedious and time-consuming as both hardware and software must be reconfigured | Instant scaling is possible on demand, both vertically and horizontally |
| | Scaling up requires huge investments in hardware and human resources | On-demand scaling allows companies to make incremental investments that are affordable |
Infrastructure of cloud data warehouse
 A cloud warehouse is a database delivered as a managed service in a
public cloud for scalable business intelligence.
 Cloud warehouses were built to address the needs of modern organizations.
A major difference from traditional warehouses is the separation of compute
and storage in the cloud, which makes the warehouse dynamic.
 For storage, traditional warehouses followed a star schema, which was
costly, especially with high volumes and wide varieties of data.
 Unlike traditional warehouses, cloud architecture includes shared storage
that can be accessed in parallel, which improves scale and performance.
 Because resources are shared between different users, enterprises pay only
for what they use rather than for the whole infrastructure.
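The pay-for-utilization point can be made concrete with a little arithmetic. The sketch below is only an illustration: the hourly rate, the node counts, and the shape of the workload are all assumed figures, not prices from any real provider.

```python
from typing import Sequence

# Hypothetical figures for illustration only; not taken from any provider's price list.
HOURLY_RATE_PER_NODE = 1.00  # assumed cost of one compute node for one hour
HOURS_IN_DAY = 24


def fixed_cost(peak_nodes: int) -> float:
    """Provisioned model: pay for peak capacity around the clock."""
    return peak_nodes * HOURS_IN_DAY * HOURLY_RATE_PER_NODE


def on_demand_cost(nodes_per_hour: Sequence[int]) -> float:
    """Elastic model: pay only for the nodes running in each hour."""
    return sum(nodes_per_hour) * HOURLY_RATE_PER_NODE


# Assumed daily workload: 2 nodes for 20 hours, 8 nodes during a 4-hour reporting peak.
workload = [2] * 20 + [8] * 4

print(fixed_cost(max(workload)))  # 192.0
print(on_demand_cost(workload))   # 72.0
```

Under these assumed numbers, the elastic model costs less because compute is released outside the peak; storage would be billed separately in both cases.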
Business case 1:
A. Introduction:
 In today's highly competitive business landscape, organisations face the challenge of
effectively managing and leveraging their ever-growing data assets.
 One of the major hurdles they encounter is the presence of data silos—isolated
repositories of information scattered across different systems and departments.
 These silos impede collaboration, hinder data accessibility, and limit the ability to gain
comprehensive insights.
 To address this issue, the construction of a new data warehouse emerges as a
strategic solution that can centralise and integrate data, leading to improved decision-
making, enhanced business insights, and increased operational efficiency.
B. Problem Statement:
 The presence of data silos within our organisation creates numerous
complications and inefficiencies. Data resides in disparate systems, making it
difficult to access and analyse holistically.
 This fragmentation hampers collaboration and leads to inconsistent and
unreliable reporting. Valuable time and resources are wasted on manual data
integration, resulting in delays in decision-making and missed opportunities.
 Furthermore, compliance and governance requirements become challenging to
fulfill due to the lack of standardised data management practices across the
organisation. To overcome these obstacles and unlock the full potential of our
data, we propose the construction of a new data warehouse.
C. Objectives:
 The primary objectives of building a new data warehouse are as follows:
1. Centralising Data: Establish a single, unified repository to consolidate data
from diverse sources and eliminate data silos.
2. Improving Data Accessibility: Provide a user-friendly interface that enables
stakeholders across the organisation to easily access and retrieve relevant
data.
3. Enhancing Data Quality and Consistency: Implement robust data
governance practices, data cleansing, and validation mechanisms to ensure
consistent and accurate information.
4. Enabling Advanced Analytics and Reporting: Create a foundation for
advanced analytics, data modeling, and real-time reporting capabilities.
5. Facilitating Data Governance and Compliance: Establish data governance
policies and procedures to ensure compliance with regulatory requirements
and data privacy standards.
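As a rough illustration of objectives 1 and 3, the sketch below consolidates two departmental silos into one table while cleansing and validating records. The silo contents, field names, and the very crude email check are hypothetical and stand in for whatever rules an organisation would actually define.

```python
# Two hypothetical departmental silos holding overlapping customer records.
sales_silo = [
    {"customer_id": 1, "email": "Ana@Example.com "},
    {"customer_id": 2, "email": "raj@example.com"},
]
support_silo = [
    {"customer_id": 2, "email": "raj@example.com"},
    {"customer_id": 3, "email": ""},
]


def consolidate(*silos):
    """Merge silos into one table keyed by customer_id, cleansing as we go."""
    unified, rejected = {}, []
    for silo in silos:
        for record in silo:
            email = record["email"].strip().lower()  # cleansing: normalise the format
            if "@" not in email:                      # validation: deliberately crude check
                rejected.append(record)
                continue
            # Keep the first valid record seen for each customer (removes duplicates).
            unified.setdefault(
                record["customer_id"],
                {"customer_id": record["customer_id"], "email": email},
            )
    return list(unified.values()), rejected


warehouse, rejected = consolidate(sales_silo, support_silo)
print(warehouse)  # one cleansed row per valid customer
print(rejected)   # records that failed validation, kept for review
```

Keeping rejected records rather than silently dropping them is one possible design choice; it gives a governance process something to audit.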
Data lakes
 A data lake is a centralized repository that ingests and stores large volumes
of data in its original form.
 The data can then be processed and used as a basis for a variety of analytic needs.
 Due to its open, scalable architecture, a data lake can accommodate all
types of data from any source, from structured (database tables, Excel
sheets) to semi-structured (XML files, webpages) to unstructured (images,
audio files, tweets), all without sacrificing fidelity.
 The data files are typically stored in staged zones—raw, cleansed, and curated—so
that different types of users may use the data in its various forms to meet their needs.
 Data lakes provide core data consistency across a variety of applications, powering
big data analytics, machine learning, predictive analytics, and other forms of
intelligent action.
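The staged zones described above can be sketched in a few lines. This is a minimal in-memory illustration, assuming hypothetical sensor readings and made-up function names; a real data lake would keep each zone as files in object storage.

```python
from collections import defaultdict

# Raw zone: records are stored exactly as they arrived (hypothetical sensor readings).
raw_zone = [
    {"device": "a1", "temp": "21.5"},
    {"device": "a1", "temp": "bad-reading"},
    {"device": "b2", "temp": "19.0"},
    {"device": "b2", "temp": "20.0"},
]


def cleanse(records):
    """Cleansed zone: keep only records whose temperature parses as a number."""
    cleansed = []
    for record in records:
        try:
            cleansed.append({"device": record["device"], "temp": float(record["temp"])})
        except ValueError:
            continue  # the malformed record still exists, untouched, in the raw zone
    return cleansed


def curate(records):
    """Curated zone: one average temperature per device, ready for reporting."""
    readings = defaultdict(list)
    for record in records:
        readings[record["device"]].append(record["temp"])
    return {device: sum(temps) / len(temps) for device, temps in readings.items()}


cleansed_zone = cleanse(raw_zone)
curated_zone = curate(cleansed_zone)
print(curated_zone)  # {'a1': 21.5, 'b2': 19.5}
```

Each zone serves a different user: a data scientist might go back to `raw_zone` to study the malformed reading, while a business analyst would only read `curated_zone`.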
Use cases
 Streaming media. Subscription-based streaming companies collect and process insights
on customer behavior, which they may use to improve their recommendation algorithm.
 Finance. Investment firms use the most up-to-date market data, which is collected and
stored in real time, to efficiently manage portfolio risks.
 Healthcare. Healthcare organizations rely on big data to improve the quality of care for
patients. Hospitals use vast amounts of historical data to streamline patient pathways,
resulting in better outcomes and reduced cost of care.
 Omnichannel retailer. Retailers use data lakes to capture and consolidate data that's
coming in from multiple touchpoints, including mobile, social, chat, word-of-mouth, and in
person.
 IoT. Hardware sensors generate enormous amounts of semi-structured to unstructured
data on the surrounding physical world. Data lakes provide a central repository for this
information to live in for future analysis.
 Digital supply chain. Data lakes help manufacturers consolidate disparate warehousing
data, including data from EDI systems and XML and JSON files.
 Sales. Data scientists and sales engineers often build predictive models to help
determine customer behavior and reduce overall churn.
Data lake vs data warehouse
| | Data lake | Data warehouse |
|---|---|---|
| Type | Structured, semi-structured, unstructured | Structured |
| | Relational, non-relational | Relational |
| Schema | Schema on read | Schema on write |
| Format | Raw, unfiltered | Processed, vetted |
| Sources | Big data, IoT, social media, streaming data | Application, business, transactional data, batch reporting |
| Scalability | Easy to scale at a low cost | Difficult and expensive to scale |
| Users | Data scientists, data engineers | Data warehouse professionals, business analysts |
| Use cases | Machine learning, predictive analytics, real-time analytics | Core reporting, BI |
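The schema row in the comparison is the one that most changes how data is loaded. The following sketch contrasts the two approaches using plain Python lists as stand-ins for a warehouse table and a lake; the schema, function names, and sample records are hypothetical.

```python
import json

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"order_id": int, "amount": float}


def write_to_warehouse(table, record):
    """Schema on write: validate against SCHEMA before the record is stored."""
    for field, expected_type in SCHEMA.items():
        if not isinstance(record.get(field), expected_type):
            raise ValueError(f"rejected on write: bad or missing field {field!r}")
    table.append(record)


def write_to_lake(files, raw_text):
    """Schema on read: store the raw text untouched, with no validation at load time."""
    files.append(raw_text)


def read_from_lake(files):
    """Apply SCHEMA only when reading, skipping records that do not fit."""
    rows = []
    for raw_text in files:
        record = json.loads(raw_text)
        try:
            rows.append({field: cast(record[field]) for field, cast in SCHEMA.items()})
        except (KeyError, TypeError, ValueError):
            continue
    return rows


warehouse_table, lake_files = [], []
good = '{"order_id": 1, "amount": 9.5}'
odd = '{"order_id": "n/a", "note": "free text"}'

for raw in (good, odd):
    write_to_lake(lake_files, raw)  # both records are accepted as-is
    try:
        write_to_warehouse(warehouse_table, json.loads(raw))
    except ValueError:
        pass  # the second record is rejected at load time

print(len(lake_files), len(warehouse_table))  # 2 1
print(read_from_lake(lake_files))             # [{'order_id': 1, 'amount': 9.5}]
```

The lake keeps the odd record for later use under a different schema, whereas the warehouse never stores it; that is the trade-off between the "raw, unfiltered" and "processed, vetted" formats in the table.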
Managed data warehouses
 Managed cloud data warehouse services enable you to create a data
environment that can adapt and evolve according to your data sources, changing
business requirements, and overall long-term goals.
| Characteristics | AWS Redshift | Azure SQL Data Warehouse |
|---|---|---|
| Use cases | Large-scale data processing and analysis; business intelligence reporting; ad hoc queries; ETL processing; data warehousing; machine learning | Business intelligence and analytics; predictive modeling; customer profiling; fraud detection; supply chain optimization; financial analysis; healthcare analytics |
| When not to use | Amazon Redshift may not be suitable for applications that require real-time data processing or handling of unstructured data. | Azure SQL Data Warehouse may not be the best tool for small to medium-sized businesses that do not have a significant amount of data to process. Additionally, businesses that do not require advanced analytics or machine learning may find that other tools are more cost-effective. |
| Type of data processing | Amazon Redshift is optimized for handling large-scale data processing and analytics tasks. It can handle a variety of data types, including structured data from a variety of sources. | Azure SQL Data Warehouse is designed for processing and analyzing large volumes of data. It can handle both structured and unstructured data, making it a versatile tool for data processing. |
| Data ingestion | Amazon Redshift can ingest data from a variety of sources, including data lakes, databases, and streaming data sources. | The tool supports data ingestion from a variety of sources, including Azure Data Factory, Azure Stream Analytics, and other Azure services. It can also ingest data from on-premises data sources. |
| Data transformation | Amazon Redshift provides built-in support for data transformation and cleansing operations, including data type conversion, aggregation, and filtering. | Azure SQL Data Warehouse provides built-in support for data transformation using SQL Server Integration Services (SSIS). Users can also use Azure Data Factory or other tools for data transformation. |
| Machine learning support | Amazon Redshift provides integration with various machine learning frameworks and tools, including SageMaker, TensorFlow, and more. | The tool supports machine learning using Azure Machine Learning. Users can use machine learning models to analyze data and gain insights from it. |
| Query language | Amazon Redshift supports standard SQL, as well as a variety of SQL extensions for handling large-scale data processing tasks. | Azure SQL Data Warehouse uses SQL for querying data. |
| Deployment model | Amazon Redshift is a fully managed service that can be deployed in a variety of configurations, including single-node and multi-node clusters. | The tool is a cloud-based service and can be deployed on Microsoft Azure. |
| Integration with other services | Amazon Redshift integrates with a variety of other AWS services, including S3, EMR, and more. It also supports integration with third-party tools and services. | Azure SQL Data Warehouse integrates with other Azure services, including Azure Data Factory, Azure Stream Analytics, and Azure Machine Learning. |
| Security | Amazon Redshift provides robust security features, including encryption, access control, and compliance with various industry standards and regulations. | The tool provides a range of security features, including data encryption, user authentication, and access controls. |
| Pricing model | Amazon Redshift offers a variety of pricing models, including pay-as-you-go pricing, reserved instance pricing, and more. | Azure SQL Data Warehouse uses a pay-as-you-go pricing model. Users are charged based on the amount of data processed and the amount of storage used. |
| Scalability | Amazon Redshift is designed to be highly scalable, with support for scaling up and down based on workload requirements. | Azure SQL Data Warehouse is highly scalable and can handle large amounts of data processing. Users can scale up or down as needed to meet their data processing requirements. |
| Performance | Amazon Redshift provides high-performance data processing and analytics capabilities through its optimized query engine and support for distributed processing. | The tool provides high performance for data processing and analysis. |
| Availability | Amazon Redshift is designed to be highly available, with support for automatic failover and data replication across multiple availability zones. | Azure SQL Data Warehouse is highly available, with built-in redundancy and failover capabilities. |
| Reliability | Amazon Redshift provides built-in fault-tolerance and data recovery features, ensuring that data processing tasks are completed correctly and accurately. | The tool is highly reliable, with built-in data replication and disaster recovery capabilities. |
