Unit 1-1

Uploaded by

ansariajaruddin552


Unit-1

Data Warehousing

Points to discuss

■ Data Warehouse: Basic Concepts
■ Database vs Data Warehouse
■ Data Warehouse Architecture/Components
■ Building a Data Warehouse
■ Multi-Dimensional Data Models
■ Data Warehouse Design
■ Data Warehouse Usage
■ Types of OLAP Servers

Data Warehouse: Basic Concepts

What is a Data Warehouse?
■ Defined in many different ways, but not rigorously.
■ A decision support database that is maintained separately from the organization's operational database.
■ Supports information processing by providing a solid platform of consolidated, historical data for analysis.
■ "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
■ Data warehousing: the process of constructing and using data warehouses.

Data Warehouse: Subject-Oriented

■ Organized around major subjects, such as customer, product, sales.
■ Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
■ Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

Data Warehouse: Integrated

■ Constructed by integrating multiple, heterogeneous data sources
  ■ relational databases, flat files, on-line transaction records
■ Data cleaning and data integration techniques are applied.
  ■ Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
    ■ E.g., hotel price: currency, tax, breakfast covered, etc.
  ■ When data is moved to the warehouse, it is converted.

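
The hotel price example can be sketched in code. This is only an illustration under stated assumptions: the record layout, the exchange rates, and the warehouse convention (USD, tax included, boolean breakfast flag) are all hypothetical and not from the slides.

```python
from dataclasses import dataclass

# Hypothetical exchange rates, for illustration only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10}


@dataclass
class SourcePrice:
    amount: float        # price as reported by the source
    currency: str        # currency code used by the source
    tax_included: bool   # does the amount already include tax?
    tax_rate: float      # tax rate that applies to this record
    breakfast: str       # each source encodes this differently, e.g. "Y", "yes", "0"


def to_warehouse(p: SourcePrice) -> dict:
    """Convert one source record to the assumed warehouse convention."""
    usd = p.amount * RATES_TO_USD[p.currency]
    if not p.tax_included:
        usd *= 1.0 + p.tax_rate
    return {
        "price_usd_incl_tax": round(usd, 2),
        "breakfast_included": p.breakfast.strip().lower() in {"y", "yes", "1", "true"},
    }


# Two sources describing the same room with different conventions.
a = to_warehouse(SourcePrice(100.0, "EUR", True, 0.0, "Y"))
b = to_warehouse(SourcePrice(100.0, "USD", False, 0.10, "no"))
```

After conversion, both records use the same currency, tax treatment, and breakfast encoding, so they can be compared directly.
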
Data Warehouse: Time-Variant

■ The time horizon for the data warehouse is significantly longer than that of operational systems.
  ■ Operational database: current value data
  ■ Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
■ Every key structure in the data warehouse
  ■ Contains an element of time, explicitly or implicitly
  ■ But the key of operational data may or may not contain a "time element"

Data Warehouse: Nonvolatile

■ A physically separate store of data transformed from the operational environment.
■ Operational update of data does not occur in the data warehouse environment.
  ■ Does not require transaction processing, recovery, and concurrency control mechanisms
  ■ Requires only two operations in data accessing: initial loading of data and access of data

Database (OLTP) vs Data Warehouse (OLAP)

OLTP (Database) vs. OLAP (Data Warehouse)

Why a Separate Data Warehouse?
■ High performance for both systems
  ■ DBMS: tuned for OLTP (access methods, indexing, concurrency control, recovery)
  ■ Warehouse: tuned for OLAP (complex OLAP queries, multidimensional view, consolidation)
■ Different functions and different data:
  ■ Missing data: decision support requires historical data, which operational DBs do not typically maintain
  ■ Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources
  ■ Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled

Data Warehouse Components/Architecture

Data Warehouse: A Multi-Tiered Architecture

[Figure: multi-tiered data warehouse architecture, left to right. Data Sources: operational DBs and other sources. Data Storage: data is extracted, transformed, loaded, and refreshed into the data warehouse and data marts, alongside a monitor & integrator and metadata. OLAP Engine: OLAP server. Front-End Tools: analysis, query, reports, data mining.]

Building a Data Warehouse

Extraction, Transformation, and Loading (ETL)
■ Data extraction
  ■ get data from multiple, heterogeneous, and external sources
■ Data cleaning
  ■ detect errors in the data and rectify them when possible
■ Data transformation
  ■ convert data from legacy or host format to warehouse format
■ Load
  ■ sort, summarize, consolidate, compute views, check integrity, and build indices and partitions
■ Refresh
  ■ propagate the updates from the data sources to the warehouse

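
The ETL steps above can be sketched as a chain of small functions. This is a minimal sketch, not a real ETL tool: the source lists `crm` and `pos`, the field names, and the cleaning rule are hypothetical, and the refresh step is left out.

```python
from collections import defaultdict


def extract(sources):
    """Gather raw rows from several heterogeneous sources (here: lists of dicts)."""
    for rows in sources:
        yield from rows


def clean(rows):
    """Rectify amounts stored as text; drop rows whose amount cannot be fixed."""
    for row in rows:
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # cannot be rectified, so skip the row
        yield {**row, "amount": amount}


def transform(rows):
    """Convert source encodings to the (assumed) warehouse format."""
    for row in rows:
        yield {"product": row["product"].strip().lower(), "amount": row["amount"]}


def load(rows):
    """Summarize by product and return the result sorted by key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["product"]] += row["amount"]
    return dict(sorted(totals.items()))


# Hypothetical source data with inconsistent encodings and one bad row.
crm = [{"product": " TV ", "amount": "100.0"}, {"product": "PC", "amount": None}]
pos = [{"product": "tv", "amount": 50}, {"product": "pc", "amount": "20"}]

warehouse = load(transform(clean(extract([crm, pos]))))
```

Each stage is a generator, so rows stream through extraction, cleaning, and transformation before the load step consolidates them.
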
Three Data Warehouse Models
■ Enterprise warehouse
  ■ collects all of the information about subjects spanning the entire organization
■ Data mart
  ■ a subset of corporate-wide data that is of value to a specific group of users; its scope is confined to specific, selected groups, such as a marketing data mart
  ■ independent vs. dependent (sourced directly from the warehouse) data mart
■ Virtual warehouse
  ■ a set of views over operational databases
  ■ only some of the possible summary views may be materialized

Metadata Repository
■ Metadata is the data defining warehouse objects. It stores:
  ■ Description of the structure of the data warehouse
    ■ schema, view, dimensions, hierarchies, derived data definitions, data mart locations and contents
  ■ Operational metadata
    ■ data lineage (history of migrated data and transformation path), currency of data (active, archived, or purged), monitoring information (warehouse usage statistics, error reports, audit trails)
  ■ The algorithms used for summarization
  ■ The mapping from the operational environment to the data warehouse
  ■ Data related to system performance
    ■ warehouse schema, view and derived data definitions
  ■ Business data

Relational Database Technology for Data Warehouses

■ Linear speed-up: the ability to reduce response time proportionally by increasing the number of processors.
■ Linear scale-up: the ability to provide the same performance on the same requests as the database size increases.
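
One common way to quantify these two properties is as ratios of elapsed times. The sketch below uses that convention; the function names and the timings are hypothetical, chosen only to show what "linear" means in each case.

```python
def speedup(time_one_processor: float, time_n_processors: float) -> float:
    """Same problem, more processors. Linear speed-up with n processors gives n."""
    return time_one_processor / time_n_processors


def scaleup(time_small_on_one: float, time_large_on_n: float) -> float:
    """Problem and hardware both grow n times. Linear scale-up gives 1.0."""
    return time_small_on_one / time_large_on_n


# Hypothetical timings in seconds.
s = speedup(120.0, 30.0)    # 4 processors cut 120 s to 30 s
u = scaleup(120.0, 120.0)   # 4x data on 4x processors still takes 120 s
```

With these numbers the speed-up is 4.0 on 4 processors and the scale-up is 1.0, so both are linear.
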

Data Mining: Concepts and Techniques
Types of Parallelism

■ Inter-query parallelism: different server threads or processes handle multiple requests at the same time.
■ Intra-query parallelism: the serial SQL query is decomposed into lower-level operations such as scan, join, and sort, which are then executed concurrently.

Two Ways to Do Intra-Query Parallelism

■ Horizontal parallelism: the database is partitioned across multiple disks, and the same task is performed concurrently on different processors against different sets of data.
■ Vertical parallelism: occurs among different tasks. All query components such as scan, join, and sort are executed in parallel in a pipelined fashion; the output of one task becomes the input of the next.
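
The two forms can be sketched as follows. This is a minimal illustration, assuming a hypothetical table already split into three in-memory partitions. The horizontal part uses a thread pool; the vertical part uses chained generators, which only show the pipelined data flow (they run in a single thread, so they are not truly parallel).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table of (product, amount) rows, split into three partitions.
partitions = [
    [("tv", 100), ("pc", 20)],
    [("tv", 50), ("vcr", 5)],
    [("pc", 30)],
]


def scan_and_sum(partition, product):
    """One task: scan a single partition and sum the matching rows."""
    return sum(amount for name, amount in partition if name == product)


def horizontal_total(product):
    """Horizontal: the same task runs against every partition concurrently."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(scan_and_sum, partitions, [product] * len(partitions))
        return sum(partials)


def scan(parts):
    """Pipeline stage 1: emit rows one at a time."""
    for part in parts:
        yield from part


def keep(rows, product):
    """Pipeline stage 2: filter rows as they arrive from the scan."""
    return (row for row in rows if row[0] == product)


def pipelined_total(product):
    """Vertical (data flow only): scan feeds filter, filter feeds aggregate."""
    return sum(amount for _, amount in keep(scan(partitions), product))
```

Both functions return the same totals; they differ only in how the work is divided.
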

Data Partitioning

■ Data partitioning is the key component for effective parallel execution of database operations. There are two ways to partition data:
  ■ Random partitioning
  ■ Intelligent partitioning

Random Partitioning

■ Includes random data striping across multiple disks on a single server.
■ Another option is round-robin partitioning, in which each record is placed on the next disk assigned to the database.
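
Round-robin placement can be sketched in a few lines. Here the "disks" are plain Python lists, which is an assumption made only for illustration.

```python
from itertools import cycle


def round_robin_partition(records, n_disks):
    """Place each record on the next disk in turn, wrapping around at the end."""
    disks = [[] for _ in range(n_disks)]
    for record, disk in zip(records, cycle(disks)):
        disk.append(record)
    return disks


disks = round_robin_partition(list(range(7)), 3)
```

The records end up evenly spread, but finding a given record later requires searching every disk, which is what intelligent partitioning avoids.
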

Intelligent Partitioning

■ Assumes that the DBMS knows where a specific record is located and does not waste time searching for it across all disks. The intelligent partitioning schemes include:
  ■ Hash partitioning
  ■ Key range partitioning
  ■ Schema partitioning
  ■ User-defined partitioning

Hash Partitioning

■ A hash algorithm is used to calculate the partition number based on the value of the partitioning key for each row.
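
A minimal sketch of the idea, assuming string keys and using CRC32 as the hash function. The choice of CRC32 is an assumption for illustration; a real DBMS uses its own hash algorithm.

```python
import zlib


def hash_partition(key: str, n_partitions: int) -> int:
    """Map a partitioning key to a partition number in [0, n_partitions)."""
    # zlib.crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % n_partitions


# The same key always lands in the same partition, so lookups go straight there.
first = hash_partition("customer-42", 4)
second = hash_partition("customer-42", 4)
```
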

Key Range Partitioning

■ Rows are placed and located in the partitions according to the value of the partitioning key.
■ That is, all the rows with a key value from A to K are in partition 1, L to T are in partition 2, and so on.
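
The A to K / L to T example can be sketched with a sorted list of range boundaries. Partitioning on the first letter of the key, and sending everything after T to a third partition, are assumptions made to complete the example.

```python
from bisect import bisect_left

# Inclusive upper bounds of partitions 1 and 2; later keys go to partition 3.
UPPER_BOUNDS = ["K", "T"]


def key_range_partition(key: str) -> int:
    """Return the 1-based partition number for a key, by its first letter."""
    first = key[0].upper()
    return bisect_left(UPPER_BOUNDS, first) + 1


placements = {name: key_range_partition(name)
              for name in ["Adams", "King", "Lopez", "Turner", "Young"]}
```
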

Schema Partitioning

■ An entire table is placed on one disk; another table is placed on a different disk, and so on.
■ This is useful for small reference tables.

User-Defined Partitioning

■ Allows a table to be partitioned on the basis of a user-defined expression.

Database Architectures for Parallel Processing
■ There are three DBMS software architecture styles for parallel processing:
  ■ Shared memory, or shared everything, architecture (SMA)
  ■ Shared disk architecture (SDA)
  ■ Shared nothing architecture (SNA)

Shared Memory Architecture (SMA)

■ Shared memory systems are tightly coupled.
■ Characteristics:
  ■ Multiple PUs (processing units) share memory
  ■ Each PU has full access to all shared memory through a common bus
  ■ Communication between nodes occurs via shared memory
  ■ Performance is limited by the bandwidth of the memory bus
  ■ Simple to implement, and provides a single system image when implementing an RDBMS on an SMP (symmetric multiprocessor)
■ Disadvantage:
  ■ Scalability is limited by bus bandwidth and latency, and by available memory

Shared Disk Architecture (SDA)

■ Shared disk systems are typically loosely coupled.
■ Characteristics:
  ■ Each node consists of one or more PUs and associated memory
  ■ Memory is not shared between nodes
  ■ Communication occurs over a common high-speed bus
  ■ Each node has access to the same disks and other resources
  ■ A node can be an SMP if the hardware supports it
  ■ Bandwidth of the high-speed bus limits the number of nodes (scalability) of the system
  ■ A Distributed Lock Manager (DLM) is required

Advantages (SDA)

■ Shared disk systems permit high availability: all data is accessible even if one node dies.
■ These systems have the concept of one database, which is an advantage over shared nothing systems.
■ Shared disk systems provide for incremental growth.

Disadvantages (SDA)

■ Inter-node synchronization is required, involving DLM overhead and greater dependency on the high-speed interconnect.
■ If the workload is not partitioned well, there may be high synchronization overhead.

Shared Nothing Architecture (SNA)

■ Shared nothing systems are typically loosely coupled.
■ In shared nothing systems, only one CPU is connected to a given disk, so a table or database located on that disk is accessed through that CPU.
■ Shared nothing systems are concerned with access to disks, not access to memory.
■ Adding more PUs and disks can improve scale-up.

Advantages & Disadvantages of SNA
■ Advantages:
  ■ Shared nothing systems provide for incremental growth.
  ■ System growth is practically unlimited.
  ■ Massively Parallel Processing (MPP) systems are good for read-only databases and decision support applications.
  ■ Failure is local: if one node fails, the others stay up.
■ Disadvantages:
  ■ More coordination is required.
  ■ More overhead is required for a process working on a disk belonging to another node.
  ■ If there is a heavy workload of updates or inserts, as in an online transaction processing system, it may be worthwhile to consider data-dependent routing to alleviate contention.

Multi-Dimensional Data Models

Multidimensional Data

■ Sales volume as a function of product, month, and region
■ Dimensions: Product, Location, Time
■ Hierarchical summarization paths:
  ■ Product: Industry → Category → Product
  ■ Location: Region → Country → City → Office
  ■ Time: Year → Quarter → Month or Week → Day

From Tables and Spreadsheets to Data Cubes
■ A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
■ A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
  ■ Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
  ■ Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
■ In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
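
The lattice can be enumerated directly: each cuboid is one subset of the dimensions. The sketch below assumes four dimensions (time, item, location, supplier) and ignores concept hierarchies within a dimension, so it yields 2^n cuboids.

```python
from itertools import combinations


def cuboids(dimensions):
    """Group every subset of the dimensions by its size (the cuboid's D-level)."""
    return {k: list(combinations(dimensions, k))
            for k in range(len(dimensions) + 1)}


lattice = cuboids(["time", "item", "location", "supplier"])
apex = lattice[0]          # the single 0-D cuboid, grouped by nothing
base = lattice[4]          # the single 4-D cuboid, grouped by everything
total = sum(len(level) for level in lattice.values())
```
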

Cube: A Lattice of Cuboids

■ 0-D (apex) cuboid: all
■ 1-D cuboids: time; item; location; supplier
■ 2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
■ 3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
■ 4-D (base) cuboid: (time, item, location, supplier)

Conceptual Modeling of Data Warehouses
■ Modeling data warehouses: dimensions & measures
  ■ Star schema: a fact table in the middle connected to a set of dimension tables
  ■ Snowflake schema: a refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
  ■ Fact constellations: multiple fact tables share dimension tables, viewed as a collection of stars, therefore called a galaxy schema or fact constellation

Example of Star Schema

■ Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
■ time: time_key, day, day_of_the_week, month, quarter, year
■ item: item_key, item_name, brand, type, supplier_type
■ branch: branch_key, branch_name, branch_type
■ location: location_key, street, city, state_or_province, country
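
A cut-down version of this star schema can be tried out with Python's built-in `sqlite3` module. This is a sketch only: it keeps two of the four dimensions, uses a subset of their columns, renames the tables with `_dim` and `_fact` suffixes, and fills them with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
-- Hypothetical sample data.
INSERT INTO time_dim VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO item_dim VALUES (10, 'TV', 'Acme'), (20, 'PC', 'Acme');
INSERT INTO sales_fact VALUES (1, 10, 2, 800.0), (1, 20, 1, 900.0), (2, 10, 3, 1200.0);
""")

# A typical star join: the fact table joined to its dimensions, then aggregated.
rows = conn.execute("""
    SELECT t.quarter, i.item_name, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    JOIN item_dim AS i ON i.item_key = f.item_key
    GROUP BY t.quarter, i.item_name
    ORDER BY t.quarter, i.item_name
""").fetchall()
```

The fact table holds only keys and measures; descriptive attributes such as `quarter` and `item_name` come from the dimension tables through the joins.
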

Example of Snowflake Schema

■ Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
■ time: time_key, day, day_of_the_week, month, quarter, year
■ item: item_key, item_name, brand, type, supplier_key
  ■ supplier: supplier_key, supplier_type
■ branch: branch_key, branch_name, branch_type
■ location: location_key, street, city_key
  ■ city: city_key, city, state_or_province, country

Example of Fact Constellation

■ Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
■ Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location; measures: dollars_cost, units_shipped
■ time: time_key, day, day_of_the_week, month, quarter, year
■ item: item_key, item_name, brand, type, supplier_type
■ branch: branch_key, branch_name, branch_type
■ location: location_key, street, city, province_or_state, country
■ shipper: shipper_key, shipper_name, location_key, shipper_type

A Concept Hierarchy: Dimension (location)

■ all: all
■ region: Europe, ..., North_America
■ country: Germany, ..., Spain; Canada, ..., Mexico
■ city: Frankfurt, ...; Vancouver, ..., Toronto
■ office: L. Chan, ..., M. Wind

Data Cube Measures: Three Categories

■ Distributive: if the result derived by applying the function to n aggregate values is the same as that derived by applying the function on all the data without partitioning
  ■ E.g., count(), sum(), min(), max()
■ Algebraic: if it can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function
  ■ E.g., avg(), min_N(), standard_deviation()
■ Holistic: if there is no constant bound on the storage size needed to describe a subaggregate
  ■ E.g., median(), mode(), rank()

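
The three categories can be checked on a small partitioned data set. The values and the three-way split below are made up for illustration.

```python
from statistics import median

# Hypothetical data, partitioned into three chunks.
chunks = [[1, 2, 3], [4, 5], [100]]
all_values = [v for chunk in chunks for v in chunk]

# Distributive: the sum of the per-chunk sums equals the sum over all the data.
total = sum(sum(chunk) for chunk in chunks)

# Algebraic: avg() is derived from two distributive aggregates, sum and count.
count = sum(len(chunk) for chunk in chunks)
average = total / count

# Holistic: per-chunk medians are not enough to recover the overall median.
median_of_medians = median(median(chunk) for chunk in chunks)
true_median = median(all_values)
```

Here the median of the chunk medians differs from the true median, which is why a holistic measure cannot be assembled from a bounded number of subaggregates.
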
A Sample Data Cube

[Figure: a 3-D data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A., Canada, Mexico, sum). One labeled cell holds the total annual sales of TVs in U.S.A.]

Cuboids Corresponding to the Cube

■ 0-D (apex) cuboid: all
■ 1-D cuboids: product; date; country
■ 2-D cuboids: (product, date); (product, country); (date, country)
■ 3-D (base) cuboid: (product, date, country)

Typical OLAP Operations
■ Roll up (drill-up): summarize data
  ■ by climbing up a hierarchy or by dimension reduction
■ Drill down (roll down): reverse of roll-up
  ■ from higher-level summary to lower-level summary or detailed data, or introducing new dimensions
■ Slice and dice: project and select
■ Pivot (rotate):
  ■ reorient the cube, visualization, 3D to series of 2D planes
■ Other operations
  ■ drill across: involving (across) more than one fact table

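
These operations can be sketched on a tiny cube held in a dictionary. The cell values, the (quarter, product, city) axis order, and the helper names are assumptions made for illustration; an OLAP server implements the same ideas far more efficiently.

```python
from collections import defaultdict

# Hypothetical base cuboid: (quarter, product, city) -> dollars_sold
cube = {
    ("Q1", "TV", "Vancouver"): 100, ("Q1", "TV", "Toronto"): 80,
    ("Q1", "PC", "Vancouver"): 60, ("Q2", "TV", "Frankfurt"): 90,
    ("Q2", "PC", "Toronto"): 70,
}
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Frankfurt": "Germany"}


def roll_up_city_to_country(cells):
    """Roll up by climbing the location hierarchy from city to country."""
    out = defaultdict(int)
    for (quarter, product, city), value in cells.items():
        out[(quarter, product, CITY_TO_COUNTRY[city])] += value
    return dict(out)


def drop_dimension(cells, axis):
    """Roll up by dimension reduction: aggregate one axis away."""
    out = defaultdict(int)
    for key, value in cells.items():
        out[key[:axis] + key[axis + 1:]] += value
    return dict(out)


def slice_cube(cells, axis, member):
    """Slice: select a single member on one dimension."""
    return {k: v for k, v in cells.items() if k[axis] == member}


def dice(cells, quarters, products):
    """Dice: select members on two or more dimensions."""
    return {k: v for k, v in cells.items() if k[0] in quarters and k[1] in products}


def pivot(cells, order):
    """Pivot: reorder the axes of every cell key."""
    return {tuple(k[i] for i in order): v for k, v in cells.items()}
```

Drill-down is the reverse direction: going from the rolled-up result back to the more detailed cells of `cube`.
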
Data Warehouse Design

Design of Data Warehouse: A Business Analysis Framework
■ Four views regarding the design of a data warehouse
  ■ Top-down view
    ■ allows selection of the relevant information necessary for the data warehouse
  ■ Data source view
    ■ exposes the information being captured, stored, and managed by operational systems
  ■ Data warehouse view
    ■ consists of fact tables and dimension tables
  ■ Business query view
    ■ sees the perspectives of data in the warehouse from the view of the end user

Data Warehouse Design Process
■ Top-down, bottom-up approaches or a combination of both
  ■ Top-down: starts with overall design and planning (mature)
  ■ Bottom-up: starts with experiments and prototypes (rapid)
■ From a software engineering point of view
  ■ Waterfall: structured and systematic analysis at each step before proceeding to the next
  ■ Spiral: rapid generation of increasingly functional systems, with short turnaround time
■ Typical data warehouse design process
  ■ Choose a business process to model, e.g., orders, invoices, etc.
  ■ Choose the grain (atomic level of data) of the business process

Data Warehouse Development: A Recommended Approach

[Figure: starting from "Define a high-level corporate data model", model refinement leads to data marts and to an enterprise data warehouse; these lead on to distributed data marts and a multi-tier data warehouse.]

Data Warehouse Usage
■ Three kinds of data warehouse applications
  ■ Information processing
    ■ supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs
  ■ Analytical processing
    ■ multidimensional analysis of data warehouse data
    ■ supports basic OLAP operations: slice-dice, drilling, pivoting
  ■ Data mining
    ■ knowledge discovery from hidden patterns
    ■ supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools

Types of OLAP Servers

OLAP Server Architectures

■ Relational OLAP (ROLAP)
  ■ Uses a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware
  ■ Includes optimization of the DBMS backend, implementation of aggregation navigation logic, and additional tools and services
  ■ Greater scalability
■ Multidimensional OLAP (MOLAP)
  ■ Sparse array-based multidimensional storage engine
  ■ Fast indexing to pre-computed summarized data
■ Hybrid OLAP (HOLAP) (e.g., Microsoft SQLServer)
  ■ Flexibility, e.g., low level: relational, high level: array
■ Specialized SQL servers (e.g., Redbricks)
  ■ Specialized support for SQL queries over star/snowflake schemas

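The storage difference between ROLAP and MOLAP can be illustrated with a toy example. This is a loose sketch, not how any particular server works: the rows, the index mappings, and the function names are all hypothetical.

```python
from collections import defaultdict

# ROLAP-style: detail rows stay in relational form and are aggregated at query time.
rows = [("Q1", "TV", 100), ("Q1", "PC", 60), ("Q2", "TV", 90), ("Q1", "TV", 80)]


def rolap_total(quarter, product):
    """Scan the rows and aggregate on demand, like a GROUP BY query."""
    return sum(v for q, p, v in rows if q == quarter and p == product)


# MOLAP-style: a sparse array keyed by dimension indexes, with totals precomputed.
QUARTERS = {"Q1": 0, "Q2": 1}
PRODUCTS = {"TV": 0, "PC": 1}
cells = defaultdict(int)
for q, p, v in rows:
    cells[(QUARTERS[q], PRODUCTS[p])] += v


def molap_total(quarter, product):
    """Look up the precomputed cell; empty cells are simply not stored."""
    return cells.get((QUARTERS[quarter], PRODUCTS[product]), 0)
```

Both functions return the same answers; the MOLAP lookup avoids the scan at the cost of building and storing the summarized cells in advance.
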
Concept Description vs. Cube-Based OLAP
■ Similarity:
  ■ Data generalization
  ■ Presentation of data summarization at multiple levels of abstraction
  ■ Interactive drilling, pivoting, slicing and dicing
■ Differences:
  ■ OLAP has systematic preprocessing, is query independent, and can drill down to a rather low level
  ■ AOI (attribute-oriented induction) has automated desired level allocation, and may perform dimension relevance analysis/ranking when there are many relevant dimensions
  ■ AOI works on data that are not in relational form
