
DW Lecture Unit 1

The document provides an overview of data warehousing, highlighting its components, architecture, and differences from traditional databases. It emphasizes the need for data warehouses in handling large volumes of historical data for analytical purposes, as well as their benefits such as improved data quality and faster query performance. Additionally, it discusses the challenges and costs associated with building data warehouses, along with their applications in various sectors like banking and government.

CCS341-DATAWAREHOUSING

UNIT I INTRODUCTION TO DATAWAREHOUSE 5

Data warehouse Introduction - Data warehouse components - Operational database Vs data
warehouse - Data warehouse Architecture - Three-tier Data Warehouse Architecture - Autonomous
Data Warehouse - Autonomous Data Warehouse Vs Snowflake - Modern Data Warehouse.

Lecture Topic: 1 Data warehouse Introduction


A Database Management System (DBMS) stores data in tables, is typically designed with an ER
model, and aims to guarantee the ACID properties. For example, the DBMS of a college has tables
for students, faculty, etc.
A Data Warehouse is separate from the DBMS. It stores a huge amount of data, typically collected
from multiple heterogeneous sources such as files, DBMSs, etc. The goal is to produce statistical
results that help in decision-making. For example, a college might want to quickly see different
results, such as how the placement of CS students has improved over the last 10 years in terms of
salaries, counts, etc.

Need for Data Warehouse


An ordinary database typically stores MBs to GBs of data, and does so for a specific purpose. For
data of TB size, storage shifts to the Data Warehouse. Besides this, a transactional database does
not lend itself to analytics. To perform analytics effectively, an organization keeps a central
Data Warehouse to closely study its business by organizing, understanding, and using its historical
data for making strategic decisions and analyzing trends.

Benefits of Data Warehouse


Better business analytics: A data warehouse stores all of the company's past data and records in
one place, so they can be analyzed to improve the company's understanding of its business.
Faster Queries: The data warehouse is designed to handle large analytical queries, so it runs such
queries faster than an operational database.
Improved data Quality: The data warehouse stores and analyzes the data gathered from different
sources without altering it or adding data by itself, so the quality of the data is maintained. If
any data quality issue arises, the data warehouse team resolves it.
Historical Insight: The warehouse stores all your historical data, which contains details about the
business, so that one can analyze it at any time and extract insights from it.
Data Warehouse vs DBMS

| Database | Data Warehouse |
|---|---|
| A common database is based on operational or transactional processing. Each operation is an indivisible transaction. | A data warehouse is based on analytical processing. |
| Generally, a database stores current and up-to-date data, which is used for daily operations. | A data warehouse maintains historical data over time. Historical data is the data kept over years; it can be used for trend analysis, future predictions, and decision support. |
| A database is generally application specific. Example: a database stores related data, such as the student details in a school. | A data warehouse is generally integrated at the organization level, by combining data from different databases. Example: a data warehouse integrates the data from one or more databases, so that analysis can be done to get results such as the best performing school in a city. |
| Constructing a database is not so expensive. | Constructing a data warehouse can be expensive. |

Example Applications of Data Warehousing


Data Warehousing can be applied anywhere we have a huge amount of data and want to see
statistical results that help in decision making.
• Social Media Websites: Social networking websites like Facebook, Twitter, LinkedIn,
etc. are based on analyzing large data sets. These sites gather data related to members,
groups, locations, etc., and store it in a single central repository. Because the amount of
data is large, a Data Warehouse is needed to implement this.
• Banking: Most banks these days use warehouses to see the spending patterns of
account holders and cardholders. They use this to provide them with special offers, deals, etc.
• Government: Governments use a data warehouse to store and analyze tax payments, which
helps to detect tax theft.

Features of Data Warehousing


Data warehousing is essential for modern data management, providing a strong foundation for
organizations to consolidate and analyze data strategically. Its distinguishing features empower
businesses with the tools to make informed decisions and extract valuable insights from their data.
• Centralized Data Repository: Data warehousing provides a centralized repository for all
enterprise data from various sources, such as transactional databases, operational systems,
and external sources. This enables organizations to have a comprehensive view of their data,
which can help in making informed business decisions.
• Data Integration: Data warehousing integrates data from different sources into a single,
unified view, which can help in eliminating data silos and reducing data inconsistencies.
• Historical Data Storage: Data warehousing stores historical data, which enables
organizations to analyze data trends over time. This can help in identifying patterns and
anomalies in the data, which can be used to improve business performance.
• Query and Analysis: Data warehousing provides powerful query and analysis capabilities
that enable users to explore and analyze data in different ways. This can help in identifying
patterns and trends, and can also help in making informed business decisions.
• Data Transformation: Data warehousing includes a process of data transformation, which
involves cleaning, filtering, and formatting data from various sources to make it consistent
and usable. This can help in improving data quality and reducing data inconsistencies.
• Data Mining: Data warehousing provides data mining capabilities, which enable
organizations to discover hidden patterns and relationships in their data. This can help in
identifying new opportunities, predicting future trends, and mitigating risks.
• Data Security: Data warehousing provides robust data security features, such as access
controls, data encryption, and data backups, which ensure that the data is secure and
protected from unauthorized access.
Advantages of Data Warehousing
• Intelligent Decision-Making: With centralized data in warehouses, decisions may be made
more quickly and intelligently.
• Business Intelligence: Provides strong operational insights through business intelligence.
• Historical Analysis: Predictions and trend analysis are made easier by storing past data.
• Data Quality: Supports data quality and consistency for trustworthy reporting.
• Scalability: Capable of managing massive data volumes and expanding to meet changing
requirements.
• Effective Queries: Fast and effective data retrieval is made possible by an optimized
structure.
• Cost reductions: Data warehousing can result in cost savings over time by streamlining data
management procedures and increasing overall efficiency, even though there are setup costs
initially.
• Data security: Data warehouses employ security protocols to safeguard confidential
information, guaranteeing that only authorized personnel are granted access to certain data.
Disadvantages of Data Warehousing
• Cost: Building a data warehouse can be expensive, requiring significant investments in
hardware, software, and personnel.
• Complexity: Data warehousing can be complex, and businesses may need to hire
specialized personnel to manage the system.
• Time-consuming: Building a data warehouse can take a significant amount of time,
requiring businesses to be patient and committed to the process.
• Data integration challenges: Data from different sources can be challenging to integrate,
requiring significant effort to ensure consistency and accuracy.
• Data security: Data warehousing can pose data security risks, and businesses must take
measures to protect sensitive data from unauthorized access or breaches.
Data warehouse components

Architecture is the proper arrangement of the elements. We build a data warehouse with software and
hardware components. To suit the requirements of our organization, we arrange these building blocks
in a certain way; we may want to strengthen one part with extra tools and services. All of this
depends on our circumstances.

The figure shows the essential elements of a typical warehouse. The Source Data component is shown
on the left. The Data Staging element serves as the next building block. In the middle, we see
the Data Storage component that handles the data warehouse's data. This element not only stores
and manages the data; it also keeps track of the data using the metadata repository. The Information
Delivery component, shown on the right, consists of all the different ways of making the information
from the data warehouse available to the users.

Source Data Component


Source data coming into the data warehouse may be grouped into four broad categories:
• Production Data: This type of data comes from the different operational systems of the
enterprise. Based on the data requirements in the data warehouse, we choose segments of
the data from the various operational systems.
• Internal Data: In each organization, users keep their "private" spreadsheets, reports,
customer profiles, and sometimes even departmental databases. This is the internal data, part
of which could be useful in a data warehouse.
• Archived Data: Operational systems are mainly intended to run the current business. In
every operational system, we periodically take the old data and store it in archived files.
• External Data: Most executives depend on information from external sources for a large
percentage of the information they use. They use statistics relating to their industry
produced by external agencies.
Data Staging Component

After we have extracted data from various operational systems and external sources, we have to
prepare it for storing in the data warehouse. The extracted data coming from several different
sources needs to be changed, converted, and made ready in a format that is suitable for
querying and analysis.

1) Data Extraction: This function has to deal with numerous data sources. We have to employ the
appropriate techniques for each data source.

2) Data Transformation: As we know, data for a data warehouse comes from many different
sources. If data extraction for a data warehouse poses big challenges, data transformation presents
even greater challenges. We perform several individual tasks as part of data transformation.

First, we clean the data extracted from each source. Cleaning may be the correction of misspellings,
the provision of default values for missing data elements, or the elimination of duplicates when
we bring in the same data from various source systems.

Standardization of data elements forms a large part of data transformation. Data transformation
also involves many forms of combining pieces of data from different sources. We combine data from
a single source record or related data parts from many source records.

On the other hand, data transformation also involves purging source data that is not useful and
separating out source records into new combinations. Sorting and merging of data take place on a
large scale in the data staging area. When the data transformation function ends, we have a
collection of integrated data that is cleaned, standardized, and summarized.

3) Data Loading: Two distinct categories of tasks form the data loading function. When we complete
the design and construction of the data warehouse and go live for the first time, we do the initial
loading of the information into the data warehouse storage. The initial load moves high volumes of
data and uses up a substantial amount of time. After that, as the warehouse starts functioning, we
continue to extract the changes to the source data and load them incrementally on an ongoing basis.
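
The extraction, transformation, and loading steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a production ETL tool: the source rows, the spelling-correction table, the default value, and the table name student_dim are all hypothetical, and an in-memory SQLite database stands in for the warehouse storage.

```python
import sqlite3

# Hypothetical rows "extracted" from two source systems (illustrative values only).
SOURCE_A = [
    {"id": 1, "name": "Asha", "city": "Chenai", "dept": "CS"},
    {"id": 2, "name": "Ravi", "city": "Madurai", "dept": None},
]
SOURCE_B = [
    {"id": 1, "name": "Asha", "city": "Chennai", "dept": "CS"},  # same student as id 1 above
    {"id": 3, "name": "Meena", "city": "Chennai", "dept": "IT"},
]

SPELLING_FIXES = {"Chenai": "Chennai"}  # assumed correction table
DEFAULT_DEPT = "UNKNOWN"                # assumed default for missing values


def extract():
    """Extraction: gather rows from every source."""
    return SOURCE_A + SOURCE_B


def transform(rows):
    """Transformation: correct misspellings, fill defaults, drop duplicates."""
    cleaned = {}
    for row in rows:
        city = SPELLING_FIXES.get(row["city"], row["city"])
        dept = row["dept"] if row["dept"] is not None else DEFAULT_DEPT
        # Keep the first record seen for each id (duplicate elimination).
        cleaned.setdefault(row["id"], (row["id"], row["name"], city, dept))
    return sorted(cleaned.values())


def load(conn, rows):
    """Loading: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS student_dim "
        "(id INTEGER PRIMARY KEY, name TEXT, city TEXT, dept TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO student_dim VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract -> transform -> load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn.execute("SELECT * FROM student_dim ORDER BY id").fetchall()


if __name__ == "__main__":
    for row in run_etl():
        print(row)
```

Because load uses INSERT OR REPLACE, calling it again later with only the changed rows would behave like a simple incremental load, while the first call plays the role of the initial load.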

Data Storage Components

Data storage for the data warehouse is a separate repository. The data repositories for the
operational systems generally include only the current data. Also, these data repositories hold the
data in a highly normalized structure for fast and efficient transaction processing.

Information Delivery Component

The information delivery component lets users subscribe to data warehouse files and have them
transferred to one or more destinations according to a user-specified schedule.
Metadata Component

Metadata in a data warehouse is similar to the data dictionary or the data catalog in a database
management system. In the data dictionary, we keep the data about the logical data structures, the
data about the records and addresses, the information about the indexes, and so on.

Data Marts

A data mart includes a subset of corporate-wide data that is of value to a specific group of users.
Its scope is confined to particular selected subjects. Data in a data warehouse should be fairly
current, but not necessarily up to the minute, although developments in the data warehouse industry
have made regular and incremental data dumps more achievable. Data marts are smaller than data
warehouses and usually contain data for one part of the organization. The current trend in data
warehousing is to develop a data warehouse with several smaller related data marts for particular
kinds of queries and reports.

Management and Control Component

The management and control component coordinates the services and functions within the data
warehouse. It controls the data transformation and the data transfer into the data warehouse
storage. It also moderates the data delivery to the clients. It works with the database management
system and ensures that data is correctly saved in the repositories. It monitors the movement of
information into the staging area and from there into the data warehouse storage itself.

| Database | Data Warehouse |
|---|---|
| 1. It is used for Online Transactional Processing (OLTP) but can be used for other objectives such as data warehousing. It records the data from the clients for history. | 1. It is used for Online Analytical Processing (OLAP). It reads the historical information about customers to support business decisions. |
| 2. The tables and joins are complicated since they are normalized for the RDBMS. This is done to reduce redundant data and to save storage space. | 2. The tables and joins are simple since they are de-normalized. This is done to minimize the response time for analytical queries. |
| 3. Data is dynamic. | 3. Data is largely static. |
| 4. Entity-relationship modeling techniques are used for RDBMS database design. | 4. Data modeling techniques are used for the Data Warehouse design. |
| 5. Optimized for write operations. | 5. Optimized for read operations. |
| 6. Performance is low for analytical queries. | 6. High performance for analytical queries. |
| 7. The database is the place where the data is stored and managed to provide fast and efficient access. | 7. The Data Warehouse is the place where the application data is handled for analysis and reporting objectives. |

Lecture Topic: 2 OPERATIONAL DATABASE Vs DATA WAREHOUSE

Difference between Operational Database and Data Warehouse

The Operational Database is the source of information for the data warehouse. It includes detailed
information used to run the day-to-day operations of the business. The data changes frequently as
updates are made and reflects the current value of the last transactions.

Operational Database Management Systems, also called OLTP (Online Transaction Processing)
databases, are used to manage dynamic data in real time.

Data Warehouse Systems serve users or knowledge workers for the purpose of data analysis and
decision-making. Such systems can organize and present information in specific formats to
accommodate the diverse needs of various users. These systems are called Online Analytical
Processing (OLAP) systems.

A Data Warehouse and an OLTP database are both typically relational databases. However, the goals
of the two are different.
| Operational Database | Data Warehouse |
|---|---|
| Operational systems are designed to support high-volume transaction processing. | Data warehousing systems are typically designed to support high-volume analytical processing (i.e., OLAP). |
| Operational systems are usually concerned with current data. | Data warehousing systems are usually concerned with historical data. |
| Data within operational systems are updated regularly according to need. | Non-volatile: new data may be added regularly, but once added it is rarely changed. |
| It is designed for real-time business dealings and processes. | It is designed for analysis of business measures by subject area, categories, and attributes. |
| It is optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table. | It is optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. |
| It is optimized for validation of incoming information during transactions and uses validation data tables. | It is loaded with consistent, valid information and requires no real-time validation. |
| It supports thousands of concurrent clients. | It supports few concurrent clients relative to OLTP. |
| Operational systems are largely process-oriented. | Data warehousing systems are largely subject-oriented. |
| Operational systems are usually optimized to perform fast inserts and updates of relatively small volumes of data. | Data warehousing systems are usually optimized to perform fast retrievals of relatively high volumes of data. |
| Data In | Data Out |
| A small number of records is accessed. | A large number of records is accessed. |
| Relational databases are created for Online Transactional Processing (OLTP). | A Data Warehouse is designed for Online Analytical Processing (OLAP). |

Difference between OLTP and OLAP

OLTP System

An OLTP system handles operational data. Operational data is the data involved in the
operation of a particular system, for example ATM transactions and bank transactions.

OLAP System

An OLAP system handles historical or archival data. Historical data is data that has been archived
over a long period. For example, if we collect the last 10 years of information about flight
reservations, the data can reveal meaningful patterns such as trends in reservations. This may
provide useful information like the peak time of travel and what kind of people are traveling in
the various classes (Economy/Business).

The major difference between an OLTP and an OLAP system is the amount of data analyzed in a
single transaction. An OLTP system manages many concurrent customers and queries, each touching
only an individual record or a limited group of records at a time, whereas an OLAP system must have
the capability to operate on millions of records to answer a single query.
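
The contrast between the two access styles can be illustrated with a small sketch. It assumes a hypothetical reservation table with made-up rows in an in-memory SQLite database; the table and column names are illustrative only. The OLTP-style function touches one record by key, while the OLAP-style function scans all rows to summarize bookings per month.

```python
import sqlite3


def build_reservations():
    """Create a tiny in-memory table of made-up flight reservations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reservation ("
        "id INTEGER PRIMARY KEY, year INTEGER, month INTEGER, "
        "travel_class TEXT, fare REAL)"
    )
    rows = [
        (1, 2022, 5, "Economy", 100.0),
        (2, 2022, 5, "Business", 300.0),
        (3, 2022, 12, "Economy", 150.0),
        (4, 2023, 12, "Economy", 110.0),
        (5, 2023, 12, "Economy", 160.0),
        (6, 2023, 12, "Business", 320.0),
    ]
    conn.executemany("INSERT INTO reservation VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def oltp_lookup(conn, reservation_id):
    """OLTP-style access: fetch one record by its key."""
    return conn.execute(
        "SELECT id, travel_class, fare FROM reservation WHERE id = ?",
        (reservation_id,),
    ).fetchone()


def olap_bookings_per_month(conn):
    """OLAP-style access: scan every row to summarize bookings by month."""
    return conn.execute(
        "SELECT month, COUNT(*) FROM reservation GROUP BY month ORDER BY month"
    ).fetchall()


if __name__ == "__main__":
    conn = build_reservations()
    print(oltp_lookup(conn, 2))           # a single reservation
    print(olap_bookings_per_month(conn))  # bookings per month across all years
```

With real data the first query would stay cheap as the table grows, provided the key is indexed, while the second must read every historical row, which is why the two workloads are usually served by differently designed systems.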

| Feature | OLTP | OLAP |
|---|---|---|
| Characteristic | It is a system used to manage operational data. | It is a system used to manage informational data. |
| Users | Clerks, clients, and information technology professionals. | Knowledge workers, including managers, executives, and analysts. |
| System orientation | An OLTP system is customer-oriented; transaction and query processing are done by clerks, clients, and information technology professionals. | An OLAP system is market-oriented; data analysis is done by knowledge workers, including managers, executives, and analysts. |
| Data contents | An OLTP system manages current data that is too detailed to be easily used for decision making. | An OLAP system manages a large amount of historical data, provides facilities for summarization and aggregation, and stores and manages data at different levels of granularity. This makes the data easier to use in informed decision making. |
| Database size | 100 MB to GB | 100 GB to TB |
| Database design | An OLTP system usually uses an entity-relationship (ER) data model and an application-oriented database design. | An OLAP system typically uses either a star or snowflake model and a subject-oriented database design. |
| View | An OLTP system focuses primarily on the current data within an enterprise or department, without referring to historical information or data in different organizations. | An OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with data that originates from various organizations, integrating information from many data stores. |
| Volume of data | Not very large. | Because of their large volume, OLAP data are stored on multiple storage media. |
| Access patterns | The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery techniques. | Accesses to OLAP systems are mostly read-only operations, because data warehouses store historical data. |
| Access mode | Read/write | Mostly read |
| Inserts and updates | Short and fast inserts and updates initiated by end users. | Periodic long-running batch jobs refresh the data. |
| Number of records accessed | Tens | Millions |
| Normalization | Fully normalized | Partially normalized |
| Processing speed | Very fast | It depends on the amount of data involved; batch data refreshes and complex queries may take many hours. Query speed can be improved by creating indexes. |

Lecture Topic: 3 DATA WAREHOUSE ARCHITECTURE


Why do Business Analysts need Data Warehouse?
A data warehouse is a repository of an organization's electronically stored data. Data warehouses
are designed to facilitate reporting and analysis. A data warehouse provides many advantages to
business analysts, as follows:
1. A data warehouse may provide a competitive advantage by presenting relevant
information from which to measure performance and make critical adjustments in order to
help win over competitors.
2. A data warehouse can enhance business productivity since it is able to quickly
and efficiently gather information that accurately describes the organization.
3. A data warehouse facilitates customer relationship management because it provides a
consistent view of customers and items across all lines of business, all departments, and
all markets.
4. A data warehouse may bring about cost reduction by tracking trends, patterns, and
exceptions over long periods of time in a consistent and reliable manner.
5. A data warehouse provides a common data model for all data of interest, regardless of
the data's source. This makes it easier to report and analyze information than it would
be if multiple data models from disparate sources were used to retrieve information such
as sales invoices, order receipts, general ledger charges, etc.
6. Because they are separate from operational systems, data warehouses provide retrieval
of data without slowing down operational systems.
Process of Data Warehouse Design
A data warehouse can be built using three approaches:
1. A top-down approach
2. A bottom-up approach
3. A combination of both approaches
The top-down approach starts with the overall design and planning. It is useful in cases where
the technology is mature and well-known, and where the business problems that must be solved
are clear and well-understood.
The bottom-up approach starts with experiments and prototypes. This is useful in the early stage
of business modeling and technology development. It allows an organisation to move forward at
considerably less expense and to evaluate the benefits of the technology before making significant
commitments.
In the combined approach, an organisation can exploit the planned and strategic nature of the
top-down approach while retaining the rapid implementation and opportunistic application of the
bottom-up approach.
In general, the warehouse design process consists of the following steps:
1. Choose a business process to model, e.g., orders, invoices, shipments, inventory,
account administration, sales, and the general ledger. If the business process is
organisational and involves multiple, complex object collections, a data warehouse model
should be followed. However, if the process is departmental and focuses on the analysis of
one kind of business process, a data mart model should be chosen.
2. Choose the grain of the business process. The grain is the fundamental, atomic level of
data to be represented in the fact table for this process, e.g., individual transactions,
individual daily snapshots, etc.
3. Choose the dimensions that will apply to each fact table record. Typical dimensions
are time, item, customer, supplier, warehouse, transaction type, and status.
4. Choose the measures that will populate each fact table record. Typical measures are
numeric additive quantities like dollars-sold and units-sold.
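
The four design steps above can be made concrete with a small star-schema sketch. It assumes a hypothetical sales process with the grain "one row per individual sales transaction", three of the typical dimensions (time, item, customer), and the two measures named above. All table names, column names, and rows are illustrative, and SQLite is used only because it ships with Python.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_item     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
-- Grain: one row per individual sales transaction.
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    dollars_sold REAL,     -- additive measure
    units_sold   INTEGER   -- additive measure
);
"""


def build_star_schema():
    """Create the fact and dimension tables and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dim_time VALUES (1, '2024-01-15', 'Jan', 2024)")
    conn.execute("INSERT INTO dim_item VALUES (10, 'Notebook')")
    conn.execute("INSERT INTO dim_customer VALUES (100, 'Asha')")
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
        [(1, 10, 100, 20.0, 2), (1, 10, 100, 30.0, 3)],
    )
    return conn


def totals_by_item(conn):
    """Sum the additive measures for each item by joining fact and dimension."""
    return conn.execute(
        "SELECT i.item_name, SUM(f.units_sold), SUM(f.dollars_sold) "
        "FROM fact_sales f JOIN dim_item i ON f.item_key = i.item_key "
        "GROUP BY i.item_name"
    ).fetchall()


if __name__ == "__main__":
    print(totals_by_item(build_star_schema()))
```

Choosing additive measures matters here: because dollars_sold and units_sold can be summed along any dimension, the same fact table answers questions by item, by customer, or by time period without restructuring.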
Once a data warehouse is designed and constructed, the initial deployment of the
warehouse includes initial installation, rollout planning, training and orientation. Platform
upgrades and maintenance must also be considered. Data warehouse administration will
include data refreshment, data source synchronisation, planning for disaster recovery, managing
access control and security, managing data growth, managing database performance, and data
warehouse enhancement and extension.

A Three-tier Data Warehouse Architecture


Data Warehouses generally have a three-level (tier) architecture that includes:
1. A bottom tier that consists of the Data Warehouse server, which is almost always an RDBMS.
It may include several specialised data marts and a metadata repository.
2. A middle tier that consists of an OLAP server for fast querying of the data warehouse.
The OLAP server is typically implemented using either (1) a Relational OLAP (ROLAP)
model, i.e., an extended relational DBMS that maps operations on multidimensional data
to standard relational operations; or (2) a Multidimensional OLAP (MOLAP) model, i.e., a
special purpose server that directly implements multidimensional data and operations.
3. A top tier that includes front-end tools for displaying results provided by OLAP, as well
as additional tools for data mining of the OLAP-generated data.
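
The ROLAP idea in the middle tier, mapping a multidimensional operation onto a standard relational one, can be sketched as follows. This is an assumption-laden toy, not how any particular OLAP server works: the sales table, its three dimensions, and the roll_up helper are hypothetical. A roll-up over the cube is translated into an ordinary SQL GROUP BY on the dimensions that are kept.

```python
import sqlite3

DIMENSIONS = ("year", "region", "item")  # hypothetical cube dimensions


def build_sales():
    """Create a small relational table that plays the role of the cube's base data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, item TEXT, units INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [
            (2023, "North", "Pen", 10),
            (2023, "South", "Pen", 5),
            (2024, "North", "Pen", 7),
            (2024, "North", "Book", 3),
        ],
    )
    return conn


def roll_up(conn, keep):
    """Aggregate units over every dimension that is not listed in `keep`."""
    if not keep:
        raise ValueError("keep at least one dimension")
    for dim in keep:
        # Only known column names are ever placed into the SQL text.
        if dim not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dim}")
    cols = ", ".join(keep)
    sql = f"SELECT {cols}, SUM(units) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_sales()
    print(roll_up(conn, ["year", "region"]))  # finer level
    print(roll_up(conn, ["year"]))            # rolled up over region and item
```

A MOLAP server would instead keep such aggregates in a dedicated multidimensional structure rather than generating relational queries on demand.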
Data Warehouse Models

From the architecture point of view, there are three data warehouse models: the virtual warehouse,
the data mart, and the enterprise warehouse.
Virtual Warehouse: A virtual warehouse is created based on a set of views defined for an
operational RDBMS. This warehouse type is relatively easy to build but requires excess
computational capacity of the underlying operational database system. The users directly access
operational data via middleware tools. This architecture is feasible only if queries are posed
infrequently, and usually is used as a temporary solution until a permanent data warehouse is
developed.
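
A virtual warehouse of this kind can be sketched with a database view. The example assumes a hypothetical operational orders table in an in-memory SQLite database; the view v_customer_totals is the "warehouse", and it stores no data of its own, so every analytical query is computed against the operational table.

```python
import sqlite3


def build_virtual_warehouse():
    """Create an operational table plus a view that acts as a virtual warehouse."""
    conn = sqlite3.connect(":memory:")
    # Operational table, as an OLTP application might maintain it (hypothetical).
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "Asha", 40.0), (2, "Ravi", 25.0), (3, "Asha", 10.0)],
    )
    # The view holds no separate copy of the data.
    conn.execute(
        "CREATE VIEW v_customer_totals AS "
        "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
    )
    return conn


def customer_totals(conn):
    """Query the view; the aggregation runs on the operational table each time."""
    return conn.execute(
        "SELECT customer, total FROM v_customer_totals ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    conn = build_virtual_warehouse()
    print(customer_totals(conn))
    conn.execute("INSERT INTO orders VALUES (4, 'Ravi', 5.0)")  # new operational row
    print(customer_totals(conn))  # the view reflects it immediately
```

The sketch also shows the trade-off named above: the view is trivial to define, but each analytical query consumes capacity on the operational system.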
Data Mart: The data mart contains a subset of the organisation-wide data that is of value to a
small group of users, e.g., marketing or customer service. It is usually a precursor (and/or a
successor) of the actual data warehouse, and differs from it in that its scope is confined
to a specific group of users.
Depending on the source of data, data marts can be categorized into the following two classes:
1. Independent data marts are sourced from data captured from one or more operational
systems or external information providers, or from data generated locally within a
particular department or geographic area.
2. Dependent data marts are sourced directly from enterprise data warehouses.
Enterprise warehouse: This warehouse type holds all information about subjects spanning the
entire organisation. For a medium- to large-size company, several years are usually needed to
design and build the enterprise warehouse.
The differences between the virtual and the enterprise DWs are shown in Figure 1.4. Data marts
can also be created as successors of an enterprise data warehouse. In this case, the DW consists
of an enterprise warehouse and (several) data marts.
Lecture Topic: 4 AUTONOMOUS DATA WAREHOUSE

Autonomous Data Warehouse (ADW) is a cloud-based database service provided by Oracle. It is
part of Oracle's Autonomous Database offerings, which also include Autonomous Transaction
Processing (ATP). ADW is designed to simplify database management, reduce operational costs, and
improve performance by leveraging automation and cloud technologies.

Key features of Oracle Autonomous Data Warehouse include:

1. Automation: ADW automates various database management tasks, such as provisioning,
patching, tuning, and backups. This helps to reduce manual effort, minimize errors, and
enhance overall system performance.
2. Self-Driving Capability: The "self-driving" aspect of ADW means that the database
can automatically adapt to changing workloads, optimize itself for performance, and
apply security updates without human intervention.
3. Scalability: ADW provides the ability to easily scale computing resources up or down
based on demand. This ensures that the database can handle varying workloads
efficiently.
4. Performance: With features like automatic indexing, performance tuning, and in-memory
processing, ADW aims to deliver high-performance analytics and reporting capabilities.
5. Security: ADW incorporates security measures such as encryption, access controls, and
auditing to protect sensitive data. Oracle also manages and applies security patches
automatically.
6. Compatibility: ADW is compatible with various data integration and analytics tools,
making it easier to integrate into existing workflows and environments.
7. Cloud-Native: Being a cloud-based service, ADW is hosted on Oracle Cloud
Infrastructure. This allows users to take advantage of the scalability, flexibility, and pay-as-
you-go pricing model associated with cloud computing.
8. Support for Multi-Model Data: ADW supports both relational and non-relational data
types, making it suitable for a variety of data processing needs.

Autonomous Data Warehouse Vs Snowflake


Oracle Autonomous Data Warehouse (ADW) and Snowflake are both cloud-based data warehousing
solutions, but they have some differences in terms of architecture, features, and
approachtodatamanagement.Here'sacomparisonbetweenOracleAutonomousDataWarehouse and
Snowflake:

1. Vendor:
• OracleADWisaproductofOracleCorporation,awell-establisheddatabasevendor with a
long history in the industry.
• Snowflakeisacloud-nativedatawarehousingplatformdevelopedbySnowflake
Computing, a newer entrant to the market.
2. Architecture:
• Oracle ADW is built on Oracle Data base technology and is part of the Oracle
Cloud Infrastructure. It utilizes Oracle's Autonomous Data base technology, which
includes self-driving, self-securing, and self-repairing capabilities.
• Snowflake is built as a multi-cloud, multi-cluster, and multi-region data
warehouse service. It has a unique architecture that separates storage and compute
resources, providing elasticity and scalability.
3. Automation:
• Both ADW and Snowflake emphasize automation. ADW, as part of the Oracle
Autonomous Database family, is designed to automate various database
management tasks, including provisioning, patching, and tuning.
• Snowflake also offers automation features, such as automatic scaling of compute
resources based on demand and automatic performance optimization.
4. Scalability:
• ADW provides the ability to scale computing resources up or down based on
workload demands, allowing for flexibility in resource allocation.
• Snowflake's architecture allows compute and storage to be scaled independently
of each other, and it automatically handles the distribution of data across
clusters.
5. Performance:
• Both ADW and Snowflake aim to provide high-performance data
warehousing. ADW includes features like automatic indexing and in-memory
processing.
• Snowflake is known for its ease of scaling, enabling users to achieve
high performance by adding or removing compute resources as needed.
6. Multi-Cloud Support:
• Snowflake is designed to work seamlessly across multiple cloud providers, such
as AWS, Azure, and Google Cloud Platform, providing customers with flexibility
in choosing their preferred cloud infrastructure.
• Oracle ADW is part of the Oracle Cloud Infrastructure and is primarily hosted on
Oracle's cloud.
7. Pricing Model:
• Both ADW and Snowflake offer consumption-based pricing models. Snowflake's
pricing is based on the amount of storage used and the amount of compute resources
consumed.
• Oracle ADW follows a similar model, charging users based on the resources
they consume.
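
The consumption-based pricing described in item 7 can be sketched as a simple calculation in which storage and compute are billed separately. The rates below are made-up placeholder values, not actual Oracle or Snowflake prices, and the function name is hypothetical.

```python
def monthly_cost(storage_tb, compute_hours,
                 storage_rate_per_tb=25.0, compute_rate_per_hour=2.0):
    """Estimate a monthly bill under a consumption-based model.

    Both rates are illustrative assumptions, not vendor prices.
    """
    storage_cost = storage_tb * storage_rate_per_tb
    compute_cost = compute_hours * compute_rate_per_hour
    return storage_cost + compute_cost


if __name__ == "__main__":
    # Same stored data, different amounts of query activity.
    print(monthly_cost(storage_tb=10, compute_hours=100))
    print(monthly_cost(storage_tb=10, compute_hours=400))
```

Because the two terms are independent, running more queries raises only the compute part of the bill, while the storage part stays the same. This mirrors the separation of storage and compute noted under Architecture and Scalability.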

Lecture Topic: 5 Modern Data Warehouse

A modern data warehouse (MDW) is an evolution of traditional data warehousing approaches,
leveraging contemporary technologies, architectures, and best practices to address the growing
challenges and requirements of handling and analyzing large volumes of data. Here are key
characteristics and components of a modern data warehouse:

1. Cloud-Native Architecture:
• Modern data warehouses are often built on cloud platforms, such as AWS, Azure, or
Google Cloud, to take advantage of scalable and flexible computing resources, as
well as the ability to pay for resources on a consumption basis.
2. Data Lakes Integration:
• Integration with data lakes allows for the storage and analysis of both structured
and unstructured data. This integration supports diverse data types and enables
more comprehensive analytics.
3. Scalability:
• Modern data warehouses are designed to scale horizontally and vertically, allowing
organizations to easily add or remove resources based on data volume and
processing needs.
4. Automated Data Management:
• Automation is a key aspect, covering various tasks such as data ingestion, data
transformation, and data quality checks. Automated processes reduce manual
effort, enhance efficiency, and improve overall system reliability.
5. Data Virtualization:
• Data virtualization enables users to access and analyze data without physically
moving it. This can be particularly useful for integrating data from multiple
sources and providing a unified view without the need for extensive data
movement.
6. Advanced Analytics and Machine Learning:
• Modern data warehouses often incorporate advanced analytics and machine
learning capabilities directly within the platform. This allows organizations to
derive insights from data and build predictive models without having to move the
data to external systems.
7. Real-Time Data Processing:
• The ability to handle real-time data processing and analytics is a crucial aspect of a
modern data warehouse. This is especially important for organizations that require
up-to-the-minute insights for decision-making.
8. Security and Compliance:
• Security features are a priority, including robust authentication, encryption, and
compliance with regulatory standards. Modern data warehouses often provide
fine-grained access controls to ensure data privacy and security.
9. Cost Management:
• Cost-effective solutions are a focus, with modern data warehouses allowing
organizations to pay for the resources they consume. This pay-as-you-go model is often
more cost-efficient than traditional on-premises solutions.
10. Integration with BI Tools and Visualization:
• Seamless integration with business intelligence (BI) tools and visualization
platforms is essential to empower users to easily analyze and visualize data stored in the
warehouse.
11. Flexible Data Models:
• Modern data warehouses support flexible data models, including both relational and
non-relational data. This flexibility accommodates diverse data types and structures.
12. Data Governance:
• Robust data governance features are included to ensure data quality, lineage, and
compliance with regulatory requirements. This includes metadata management,
data cataloging, and lineage tracking.
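
As an illustration of the automated data quality checks mentioned under Automated Data Management (item 4), the following Python sketch separates incoming rows into accepted and rejected sets. The function name, the field names, and the three rules (key present, required fields filled, no duplicate key) are assumptions made for this example; a real warehouse would use its own rules and tooling.

```python
def run_quality_checks(rows, required_fields, key_field="id"):
    """Split rows into (valid, rejected) using simple, hypothetical rules.

    rows            : list of dicts, one per incoming record
    required_fields : fields that must be present and non-empty
    key_field       : field that must be present and unique
    """
    valid, rejected = [], []
    seen_keys = set()
    for row in rows:
        key = row.get(key_field)
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if key is None:
            rejected.append((row, "missing key"))
        elif missing:
            rejected.append((row, "missing fields: " + ", ".join(missing)))
        elif key in seen_keys:
            rejected.append((row, "duplicate key"))
        else:
            seen_keys.add(key)
            valid.append(row)
    return valid, rejected


if __name__ == "__main__":
    incoming = [
        {"id": 1, "amount": 10},
        {"id": 1, "amount": 12},    # same id as the first row
        {"id": 2, "amount": None},  # required field is empty
        {"amount": 5},              # no id at all
    ]
    good, bad = run_quality_checks(incoming, required_fields=["amount"])
    print(len(good), "accepted")
    for row, reason in bad:
        print("rejected:", row, "-", reason)
```

Keeping the reason alongside each rejected row is one simple way to leave an audit trail for later review.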
