Chapter 2 Data Warehousing

Data Warehousing

Prof Sadanand S Borse


Data Warehouse
 A data warehouse can be defined as a collection of organizational data and
information extracted from operational sources and external data sources.

 The data is periodically pulled from various internal applications like sales,
marketing, and finance; customer-interface applications; as well as external
partner systems.

 This data is then made available for decision-makers to access and analyze.

 Data warehouses are exclusively intended to perform queries and analysis and
often contain large amounts of historical data. The data within a data
warehouse is usually derived from a wide range of sources such as application
log files and transaction applications.
Key Characteristics of Data
Warehouse
 Subject-Oriented
 Integrated
 Non-Volatile
 Time-Variant
A typical data warehouse often includes
the following elements:
• A relational database to store and manage data
• An extraction, loading, and transformation (ELT)
solution for preparing the data for analysis
• Statistical analysis, reporting, and data mining
capabilities
• Client analysis tools for visualizing and presenting data
to business users
• Other, more sophisticated analytical applications that
generate actionable information by applying
data science and artificial intelligence (AI) algorithms,
or graph and spatial features that enable more kinds of
analysis at scale
What is ETL?
 ETL stands for extract, transform, and load: a data integration process
that combines data from multiple data sources into a single, consistent data store
that is loaded into a data warehouse or other target system.
 It is the process of collecting data from multiple sources and transforming it into a
usable format for analysis.
• Extract data from legacy systems
• Cleanse the data to improve data quality and establish consistency
• Load data into a target database
 A related variant is ELT (extract, load, transform), in which raw data is loaded
into the target system first and transformed there.
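The three steps above can be sketched with Python's standard sqlite3 module. The table name, fields, and sample rows are illustrative, not part of any particular warehouse:

```python
import sqlite3

# Extract: rows as they might arrive from a legacy system (illustrative data)
raw_rows = [
    {"id": "1", "name": "  Alice ", "amount": "120.50"},
    {"id": "2", "name": "BOB",      "amount": "80"},
    {"id": "2", "name": "BOB",      "amount": "80"},   # duplicate to be cleansed
]

# Transform: cleanse to improve data quality and establish consistency
seen, clean_rows = set(), []
for r in raw_rows:
    if r["id"] in seen:
        continue                      # drop duplicate records
    seen.add(r["id"])
    clean_rows.append((int(r["id"]), r["name"].strip().title(), float(r["amount"])))

# Load: write the consistent rows into the target warehouse table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean_rows)
con.commit()

print(con.execute("SELECT customer, amount FROM sales ORDER BY id").fetchall())
# [('Alice', 120.5), ('Bob', 80.0)]
```

In an ELT pipeline, the raw rows would be loaded first and the cleansing would run as SQL inside the target system instead.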
Database vs Data Warehouse
 A data warehouse and a traditional database share some
similarities, but they serve different purposes.
 The main difference is that in a database, data is collected for
multiple transactional purposes.
 In a data warehouse, data is collected on an extensive scale to
perform analytics.
 Databases provide real-time data, while warehouses store data to
be accessed for big analytical queries.
 A data warehouse is an example of an OLAP (online analytical
processing) system, i.e., an online database query-answering system.
OLTP (online transaction processing) is an online database-modifying
system; an ATM is a typical example.
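The OLTP/OLAP contrast can be sketched with sqlite3; the account schema and figures are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE withdrawals (account TEXT, amount REAL, ts TEXT)")

# OLTP: many small, real-time modifications (e.g. each ATM withdrawal)
con.execute("INSERT INTO withdrawals VALUES ('A-1', 40.0, '2024-01-05')")
con.execute("INSERT INTO withdrawals VALUES ('A-1', 60.0, '2024-02-10')")
con.execute("INSERT INTO withdrawals VALUES ('B-2', 25.0, '2024-02-11')")
con.commit()

# OLAP: one large analytical query over the accumulated history
totals = con.execute(
    "SELECT account, SUM(amount) FROM withdrawals GROUP BY account ORDER BY account"
).fetchall()
print(totals)   # [('A-1', 100.0), ('B-2', 25.0)]
```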
Data Warehouse Architecture

 Simple.
 Simple with a staging area.
 Hub and spoke.
 Sandboxes
Data Warehouse Architecture

 The data warehouse architecture comprises a three-tier
structure:
 Bottom Tier
 Middle Tier
 Top Tier
How Data Warehouse Works

 Data warehousing integrates data and information collected from
various sources into one comprehensive database.
 Data mining is one of the features of a data warehouse that involves
looking for meaningful data patterns in vast volumes of data and
devising innovative strategies for increased sales and profits.
 Data Warehouse works as a central repository where information arrives
from one or more data sources. Data flows into a data warehouse from
the transactional system and other relational databases.
 Data may be:
1. Structured
2. Semi-structured
3. Unstructured
 The data is processed, transformed, and ingested so that users can access the processed data in the Data
Warehouse through Business Intelligence tools, SQL clients, and spreadsheets.
 By merging all this information in one place, an organization can analyze its customers more holistically,
helping to ensure that it has considered all the information available.
Types of Data Warehouse

 Offline Operational Database
 Offline Data Warehouse
 Real-time Data Warehouse
 Integrated Data Warehouse
Four components of Data
Warehouses are:
 Load manager
 Warehouse manager
 Query manager
 End-user access tools
Data lakes
 A data lake is a centralized repository that allows you to store all your
structured and unstructured data at any scale.
 It is a place to store every type of data in its native format, with no fixed
limits on account size or file size.
 It handles large data quantities to improve analytic performance and offers
native integration.
 Data Lake is like a large container that is very similar to a real lake and
river.
 Just like a lake has multiple tributaries coming in, a data lake
has structured data, unstructured data, machine-to-machine data, and logs
flowing through in real time.
 A data lake can include structured data from relational
databases (rows and columns), semi-structured data
(CSV, logs, XML, JSON), unstructured data (emails,
documents, PDFs) and binary data (images, audio,
video).
 A data lake can be deployed "on premises" (within
an organization's data centers) or "in the cloud" (using
cloud services from vendors such as Amazon, Microsoft,
or Google).
 The Data Lake democratizes data and is a cost-effective
way to store all data of an organization for later
processing. Research Analysts can focus on finding
meaningful patterns in data and not data itself.
 Unlike a hierarchical data warehouse where data is
stored in files and folders, a data lake has a flat
architecture. Every data element in a data lake is given
a unique identifier and tagged with a set of metadata.
 The main objective of building a data lake is to offer an
unrefined view of data to data scientists.
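The flat, identifier-plus-metadata layout described above can be sketched in a few lines of Python; the store, the ingest helper, and the metadata tags are all illustrative assumptions, not a real data lake API:

```python
import uuid

# A data lake as a flat store: every element gets a unique identifier
# and a set of metadata tags (names and tags here are illustrative)
lake = {}

def ingest(payload, **metadata):
    element_id = str(uuid.uuid4())          # unique identifier per element
    lake[element_id] = {"payload": payload, "metadata": metadata}
    return element_id

ingest(b"<xml>...</xml>", source="crm", format="xml", kind="semi-structured")
ingest(b"\x89PNG...", source="mobile-app", format="png", kind="binary")
log_id = ingest("GET /index 200", source="web", format="log", kind="unstructured")

# Schema-free retrieval: filter by metadata tags, not by folder hierarchy
logs = [eid for eid, e in lake.items() if e["metadata"]["format"] == "log"]
print(logs == [log_id])   # True
```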
Reasons for using Data Lake
 There is no need to model data into an enterprise-wide
schema with a Data Lake.
 With the increase in data volume, data quality, and
metadata, the quality of analyses also increases.
 Data Lake offers business Agility
 Machine Learning and Artificial Intelligence can be used to
make profitable predictions.
 It offers a competitive advantage to the implementing
organization.
 There is no data silo structure. A data lake gives a 360-degree
view of customers and makes analysis more robust.
Data Warehouse vs Data Lake

 Data
• Data Warehouse: Relational data from transactional systems, operational databases, and line-of-business applications
• Data Lake: Non-relational and relational data from IoT devices, web sites, mobile apps, social media, and corporate applications
 Schema
• Data Warehouse: Designed prior to the DW implementation (schema-on-write)
• Data Lake: Written at the time of analysis (schema-on-read)
 Price/Performance
• Data Warehouse: Fastest query results using higher-cost storage
• Data Lake: Query results getting faster using low-cost storage
 Data Quality
• Data Warehouse: Highly curated data that serves as the central version of the truth
• Data Lake: Any data, which may or may not be curated (i.e., raw data)
 Users
• Data Warehouse: Business analysts
• Data Lake: Data scientists, data developers, and business analysts (using curated data)
 Analytics
• Data Warehouse: Batch reporting, BI, and visualizations
• Data Lake: Machine learning, predictive analytics, data discovery and profiling
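The schema-on-write vs schema-on-read row above can be made concrete with a small sketch; the schema, records, and field names are hypothetical:

```python
import json

# Schema-on-write (warehouse): validate and shape rows before they are loaded
WAREHOUSE_SCHEMA = ("customer", "amount")   # illustrative fixed schema

def load_into_warehouse(record):
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError("record does not match warehouse schema")
    return tuple(record[col] for col in WAREHOUSE_SCHEMA)

warehouse = [load_into_warehouse({"customer": "Alice", "amount": 120.5})]

# Schema-on-read (lake): store raw, apply structure only at analysis time
lake = ['{"customer": "Bob", "amount": 80, "channel": "web"}',
        '{"customer": "Cara", "amount": 55}']           # raw JSON strings
amounts = [json.loads(raw)["amount"] for raw in lake]   # schema applied at read
print(warehouse, amounts)   # [('Alice', 120.5)] [80, 55]
```

Note how the lake accepts records with extra or missing fields; the warehouse rejects them at load time.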
1. Ingestion Tier: The tiers on the left depict the data sources. Data is
loaded into the data lake in batches or in real time.
2. Insights Tier: The tiers on the right represent the research side,
where insights from the system are used. SQL, NoSQL queries, or
even Excel can be used for data analysis.
3. HDFS is a cost-effective solution for both structured and
unstructured data. It is a landing zone for all data that is at rest in
the system.
4. Distillation Tier takes data from the storage tier and converts it to
structured data for easier analysis.
5. Processing Tier runs analytical algorithms and user queries in
varying real-time, interactive, and batch modes to generate structured
data for easier analysis.
6. Unified Operations Tier governs system management and
monitoring. It includes auditing and proficiency management, data
management, and workflow management.
Key Concepts: Data Mining
 Data mining is the process of discovering actionable
information from large sets of data. Data mining uses
mathematical analysis to derive patterns and trends
that exist in data.
 Typically, these patterns cannot be discovered by
traditional data exploration because the relationships
are too complex or because there is too much data.
 Data mining is the process of finding anomalies,
patterns, and correlations within large data sets to
predict outcomes.
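A minimal sketch of finding an anomaly in a data set, using a simple z-score test from the standard library; the sales figures and the 2-standard-deviation threshold are illustrative assumptions, not a production mining technique:

```python
import statistics

# Toy "large data set": daily sales with one anomalous day (illustrative numbers)
daily_sales = [100, 98, 102, 97, 101, 103, 99, 250, 100, 96]

mean = statistics.mean(daily_sales)
stdev = statistics.stdev(daily_sales)

# Flag anomalies: points more than 2 standard deviations from the mean
anomalies = [x for x in daily_sales if abs(x - mean) / stdev > 2]
print(anomalies)   # [250]
```

Real data mining systems apply far richer mathematical analysis, but the principle is the same: derive a statistical model of the data, then look for points and patterns it does not explain.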
Data mining as a step in the process of
knowledge discovery
The architecture of a typical data
mining system has the following
major components
 Database, data warehouse, World Wide Web, or another
information repository
 Database or data warehouse server
 Knowledge base
 Data mining engine
 Pattern evaluation module
 User interface
 The data mining process is the discovery, through large data sets, of
patterns, relationships, and insights that let enterprises measure and
manage where they are and predict where they will be in the future.
Large amounts of data can come from various data sources and may be
stored in different data warehouses, and data mining techniques such as
machine learning, artificial intelligence (AI), and predictive modeling
may be involved.
The data mining process requires commitment and business intelligence
tools. But experts agree that, across all industries, the data mining
process is the same and should follow a prescribed path.
Six steps of Data Mining
1. Business understanding

• First, it is required to understand the business objectives clearly and
find out what the business's needs are.
• Next, assess the current situation by finding the resources,
assumptions, constraints, and other important factors which
should be considered.
• Then, from the business objectives and current situations,
create data mining goals to achieve the business objectives
within the current situation.
• Finally, a good data mining plan has to be established to
achieve both business and data mining goals. The plan should
be as detailed as possible.
2. Data understanding
• The data understanding phase starts with initial data collection, which
is collected from available data sources, to help get familiar with the
data. Some important activities must be performed including data load
and data integration in order to make the data collection successful.
• Next, the “gross” or “surface” properties of acquired data need to be
examined carefully and reported.
• Then, the data needs to be explored by tackling the data mining
questions, which can be addressed using querying, reporting, and
visualization.
• Finally, the data quality must be examined by answering some
important questions, such as "Is the acquired data complete?" and "Are
there any missing values in the acquired data?"
3. Data preparation

 Data preparation typically consumes about 90% of
the time of the project.
 The outcome of the data preparation phase is the final
data set. Once available data sources are identified,
they need to be selected, cleaned, constructed, and
formatted into the desired form.
 The data exploration task at a greater depth may be
carried out during this phase to notice the patterns
based on business understanding.
4. Modeling

• First, modeling techniques have to be selected for the
prepared data set.
• Next, the test scenario must be generated to validate the
quality and validity of the model.
• Then, one or more models are created on the prepared
data set.
• Finally, models need to be assessed carefully, involving
stakeholders, to make sure that the created models meet the
business initiatives.
5. Evaluation

 In the evaluation phase, the model results must be
evaluated in the context of the business objectives set in
the first phase.
 In this phase, new business requirements may be
raised due to the new patterns that have been
discovered in the model results or from other factors.
 Gaining business understanding is an iterative process
in data mining. The go or no-go decision must be made
in this step to move to the deployment phase.
6. Deployment
 The knowledge or information, which is gained through the data
mining process, needs to be presented in such a way that
stakeholders can use it when they want it.
 Based on the business requirements, the deployment phase could be
as simple as creating a report or as complex as a repeatable data
mining process across the organization.
 In the deployment phase, the plans for deployment, maintenance,
and monitoring have to be created for implementation and also future
support.
 From the project point of view, the final report of the project needs to
summarize the project experiences and review the project to see what
needs to be improved, capturing the lessons learned.
OLAP

 OLAP (Online Analytical Processing) is a category of database
processing that facilitates business intelligence.
 OLAP (Online Analytical Processing) is the technology behind
many Business Intelligence (BI) applications.
 OLAP is a powerful technology for data discovery, including
capabilities for report viewing, complex analytical calculations,
and predictive “what if” scenario (budget, forecast) planning.
 In a data warehouse, data sets are stored in tables, each of which
can organize data into just two dimensions at a time.
 OLAP extracts data from multiple relational data sets and
reorganizes it into a multidimensional format that enables very
fast processing and very insightful analysis.
 OLAP tools do not store individual transaction records in two-
dimensional, row-by-column format, like a worksheet, but
instead, use multidimensional database structures—known
as Cubes in OLAP terminology—to store arrays of consolidated
information.
 The data and formulas are stored in an optimized
multidimensional database, while views of the data are created
on-demand.
What is an OLAP cube?
 OLAP databases are divided into one or more
cubes. The cubes are designed in such a way
that creating and viewing reports become
easy.
 The OLAP cube is an array-based
multidimensional database that makes it
possible to process and analyze multiple data
dimensions much more quickly and efficiently
than a traditional relational database.
 In theory, a cube can contain
an infinite number of layers.
 Smaller cubes can exist
within layers—for example,
each store layer could contain
cubes arranging sales by
salesperson and product.
 In practice, data analysts will
create OLAP cubes containing
just the layers they need, for
optimal analysis and
performance.
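A cube as described above can be sketched as a multidimensional array keyed by dimension members; the dimensions (product, city, quarter) and the consolidated sales figures are illustrative:

```python
# A tiny 3-dimensional OLAP cube: (product, city, quarter) -> consolidated sales.
# Dimension members and figures are illustrative.
cube = {
    ("Laptop", "Pune",   "Q1"): 120, ("Laptop", "Pune",   "Q2"): 150,
    ("Laptop", "Mumbai", "Q1"): 200, ("Laptop", "Mumbai", "Q2"): 180,
    ("Phone",  "Pune",   "Q1"):  90, ("Phone",  "Pune",   "Q2"): 110,
    ("Phone",  "Mumbai", "Q1"): 160, ("Phone",  "Mumbai", "Q2"): 170,
}

# Any cell is a direct, pre-consolidated lookup -- no row-by-row scan needed
print(cube[("Laptop", "Mumbai", "Q1")])   # 200

# A view of the data, created on demand: total sales per product
per_product = {}
for (product, city, quarter), sales in cube.items():
    per_product[product] = per_product.get(product, 0) + sales
print(per_product)   # {'Laptop': 650, 'Phone': 530}
```

Storing consolidated values per cell, rather than individual transaction rows, is what makes cube queries fast; real MOLAP engines add compression and indexing on top of this idea.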
Examples of OLAP Tools

 Dundas BI
 Sisense
 IBM Cognos Analytics
 InetSoft
 SAP Business Intelligence
 Halo
OLAP for Multidimensional Analysis

 To analyze and report on the health of a
business and plan future activity, many
variable groups or parameters must be tracked
on a continuous basis, which is beyond the
scope of any number of linked spreadsheets.
 These variable groups or parameters are called
Dimensions in the On-Line Analytical
Processing (OLAP) environment.
 Analysts can take any view, or Slice, of a Cube to produce a
worksheet-like view of points of interest.
 Instead of working on two dimensions (standard spreadsheet) or
three dimensions (for example, a workbook with tabs of the same
report, by one variable), companies have many dimensions to
track.
 For example, a business that distributes goods from more than a
single facility will have at least the following Dimensions to
consider: Accounts, Locations, Periods, Salespeople, and Products.
 These Dimensions comprise a base for the company’s planning,
analysis, and reporting activities.
 Together they represent the “whole” business picture, providing the
foundation for all business planning, analysis and reporting
activities.
Advantages of OLAP

 OLAP is a platform for all types of business, including
planning, budgeting, reporting, and analysis.
 Information and calculations are consistent in an
OLAP cube. This is a crucial benefit.
 Quickly create and analyze “What if” scenarios
 Easily search OLAP database for broad or specific
terms.
 OLAP provides the building blocks for business
modeling tools, Data mining tools, performance
reporting tools.
Advantages of OLAP

 Allows users to slice and dice cube data by
various dimensions, measures, and filters.
 It is good for analyzing time series.
 Finding some clusters and outliers is easy with OLAP.
 It is a powerful online analytical processing and
visualization system that provides fast response times.
Drill Down
• Drill down: In drill-down
operation, the less detailed data
is converted into highly detailed
data. It can be done by:
• Moving down in the concept
hierarchy
• Adding a new dimension
 In the cube given in the overview
section, the drill-down operation
is performed by moving down in
the concept hierarchy of the
Time dimension (Quarter ->
Month).
Roll UP

 Roll-up is also known as
"consolidation" or "aggregation."
 It can be done by:
• Climbing up in the concept
hierarchy
• Reducing the dimensions
 In the cube given in the overview
section, the roll-up operation is
performed by climbing up in the
concept hierarchy of Location
dimension (City -> Country)
Dice
 Dice: It selects a sub-
cube from the OLAP cube
by selecting two or more
dimensions.
 Unlike slice, in dice you
select two or more
dimensions, which results in
the creation of a sub-cube.
Slice:

 Slice: It selects a single
dimension from the OLAP
cube, which results in the
creation of a new sub-cube.
In the cube given in the
overview section, slice is
performed on the
dimension Time = "Q1".
Pivot

 Pivot: It is also known as
the rotation operation, as it
rotates the current view to
get a new view of the
representation.
 In the sub-cube obtained after
the slice operation, performing
pivot operation gives a new
view of it.
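The five operations above can be sketched together on a tiny (city, quarter) cube; the dimension members, hierarchies, and sales figures are all illustrative:

```python
# A tiny (city, quarter) cube of sales totals (figures illustrative)
cube = {
    ("Pune",   "Q1"): 320, ("Pune",   "Q2"): 260,
    ("Mumbai", "Q1"): 360, ("Mumbai", "Q2"): 350,
}
city_to_country = {"Pune": "India", "Mumbai": "India"}   # Location hierarchy
q1_months = {("Pune",   "Q1"): {"Jan": 100, "Feb": 110, "Mar": 110},
             ("Mumbai", "Q1"): {"Jan": 120, "Feb": 120, "Mar": 120}}

# Slice: fix a single dimension value (Time = "Q1")
slice_q1 = {k: v for k, v in cube.items() if k[1] == "Q1"}

# Dice: pick member subsets on two or more dimensions
dice = {k: v for k, v in cube.items() if k[0] in {"Pune"} and k[1] in {"Q1", "Q2"}}

# Roll-up: climb the Location hierarchy (City -> Country)
rollup = {}
for (city, quarter), v in cube.items():
    key = (city_to_country[city], quarter)
    rollup[key] = rollup.get(key, 0) + v

# Drill-down: descend the Time hierarchy (Quarter -> Month)
drill = {(city, month): v
         for (city, q), months in q1_months.items()
         for month, v in months.items()}

# Pivot: rotate the view (swap the city and quarter axes)
pivot = {(quarter, city): v for (city, quarter), v in cube.items()}

print(rollup[("India", "Q1")])   # 680
```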
Types of OLAP Systems in DBMS

 Multidimensional OLAP (MOLAP) – Cube-based
 Relational OLAP (ROLAP) – Star-schema based
 Hybrid OLAP (HOLAP) – A combination of ROLAP and MOLAP
 MOLAP is an abbreviation for
Multi-dimensional Online
Analytical Processing.
 In this type of analytical
processing, multidimensional
databases (MDDBs) are used
to store data.
 This data is later used for
analysis. MOLAP consists of
data that is pre-computed
and fabricated.
 The data cubes from MDDBs
carry data that has already
been calculated. This
increases the speed of
querying data.
MOLAP
 Advantages
• It performs well with operations such as slice and dice.
• Users can use it to perform complex calculations.
• It consists of pre-computed data that can be indexed
fast.
 Disadvantages
• It can only store a limited volume of data.
• The data used for analysis depends on certain
requirements that were set (previously). This limits
data analysis and navigation.
ROLAP

 ROLAP is an abbreviation for Relational Online
Analytical Processing.
 In this type of analytical processing, data
storage is done in a relational database.
 In this database, the arrangement of data is
made in rows and columns. Data is presented
to end-users in a multi-dimensional form.
 There are three main components in a
ROLAP model:
 Database server: This exists in the data
layer. This consists of data that is loaded
into the ROLAP server.
 ROLAP server: This consists of the ROLAP
engine that exists in the application layer.
 Front-end tool: This is the client desktop
that exists in the presentation layer.
 Let’s briefly look at how ROLAP works. When
a user makes a query (complex), the ROLAP
server will fetch data from the RDBMS
server. The ROLAP engine will then create
data cubes dynamically. The user will view
data from a multi-dimensional point.
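The ROLAP behavior described above can be sketched with sqlite3 standing in for the RDBMS server; the schema and figures are hypothetical:

```python
import sqlite3

# ROLAP sketch: data lives in rows and columns in an RDBMS; the "cube"
# is created dynamically with SQL at query time (schema illustrative)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (product TEXT, city TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("Laptop", "Pune", 120.0), ("Laptop", "Mumbai", 200.0),
    ("Phone",  "Pune",  90.0), ("Phone",  "Mumbai", 160.0),
])

# The ROLAP engine answers a multidimensional question with GROUP BY,
# fetching from the relational store on demand rather than from a
# pre-computed cube
result = con.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(result)   # [('Laptop', 320.0), ('Phone', 250.0)]
```

Because each query aggregates at run time, ROLAP scales to huge tables but pays in query latency, exactly the trade-off listed in the advantages and disadvantages.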
ROLAP
 Advantages
 It can handle huge volumes of data.
 A ROLAP model can store data efficiently.
 ROLAP utilizes a relational database. This enables the model to
integrate the ROLAP server with an RDBMS (relational database
management system).
 Disadvantages
 There is slow performance, especially when the volume of data is
huge.
 ROLAP has certain limitations relating to SQL. For example, the SQL
feature has difficulties in handling complex calculations.
HOLAP
 This is an abbreviation for Hybrid Online Analytical
Processing. This type of analytical processing solves the
limitations of MOLAP and ROLAP and combines their
attributes.
 Data in the database is divided into two parts:
specialized storage and relational storage.
 Integrating these two aspects addresses issues relating
to performance and scalability.
 HOLAP stores huge volumes of data in a relational
database and keeps aggregations in a MOLAP server.
• The HOLAP model consists of
a server that can support
ROLAP and MOLAP.
• It consists of a complex
architecture that requires
frequent maintenance.
• Queries made in the HOLAP
model involve the multi-
dimensional database and
the relational database.
• The front-user tool presents
data from the database
management system (directly)
or through the intermediate
MOLAP.
HOLAP
 Advantages
 It improves performance and scalability because it combines multi-
dimensional and relational attributes of online analytical processing.
 It is a resourceful analytical processing tool if we expect the size of data to
increase.
 Its processing ability is higher than the other two analytical processing
tools.
 Disadvantages
 The model uses a huge storage space because it consists of data from two
databases.
 The model requires frequent updates because of its complex nature.
MOLAP vs ROLAP vs HOLAP

 Meaning
• MOLAP: Multi-Dimensional Online Analytical Processing
• ROLAP: Relational Online Analytical Processing
• HOLAP: Hybrid Online Analytical Processing
 Data storage
• MOLAP: Stores data in a multi-dimensional database
• ROLAP: Stores data in a relational database
• HOLAP: Stores data in a relational database
 Technique
• MOLAP: Utilizes the sparse matrix technique
• ROLAP: Employs Structured Query Language (SQL)
• HOLAP: Uses a combination of SQL and the sparse matrix technique
 Volume of data
• MOLAP: Can process a limited volume of data
• ROLAP: Processes enormous volumes of data
• HOLAP: Can process huge volumes of data
 Designed view
• MOLAP: The multi-dimensional view is static
• ROLAP: The multi-dimensional view is dynamic
• HOLAP: The multi-dimensional view is dynamic
 Data arrangement
• MOLAP: Arranges data in data cubes
• ROLAP: Arranges data in rows and columns (tables)
• HOLAP: Multi-dimensional arrangement of data