Unit - 4

data warehousing unit 4 2021R

Paavai Institutions Department of CSE

UNIT - IV

DIMENSIONAL MODELING AND SCHEMA


CONTENTS
4.1 DIMENSIONAL MODELING
4.2 MULTI-DIMENSIONAL DATA MODELING
4.3 DATA CUBE
4.4 STAR SCHEMA
4.5 SNOWFLAKE SCHEMA
4.6 STAR VS SNOWFLAKE SCHEMA
4.7 FACT CONSTELLATION SCHEMA
4.8 SCHEMA DEFINITION
4.8.1 STAR SCHEMA
4.8.2 SNOWFLAKE SCHEMA
4.8.3 GALAXY SCHEMA
4.9 PROCESS ARCHITECTURE
4.10 TYPES OF DATABASE PARALLELISM
4.11 DATA WAREHOUSE TOOLS

QUESTION BANK


TECHNICAL TERMS

S.No | Technical Term | Literal Meaning | Technical Meaning | Digester
1 | Data Modeling | Visual representation | Creating a conceptual representation of data objects and their relationships to one another. | http://www.yourdictionary.com/
2 | Schema | Conceptual framework | The logical description of the entire database. | http://www.yourdictionary.com/
3 | Fact Table | Quantitative information | The table that contains all the facts or the business information. | http://www.yourdictionary.com/
4 | Dimension Table | A realm of existence | A database table that stores attributes describing the facts in a fact table. | http://whatis.techtarget.com
5 | Data Cube | Multidimensional representation | A data structure optimized for fast and efficient analysis. | http://www.yourdictionary.com/
6 | Fact Constellation | Measure of online analytical processing | A logical structure of a data warehouse or data mart. | http://www.yourdictionary.com/
7 | Database Parallelism | Parallelization across multiple processors | It aggregates a very large partitioned database table and stores the result in a result-table. | http://whatis.techtarget.com/
8 | Data Warehouse | Holds highly structured data | A central repository of information that can be analyzed to make more informed decisions. | http://www.yourdictionary.com/


4.1 DIMENSIONAL MODELING

Dimensional modeling represents data with a cube operation, making the logical data representation more suitable
for OLAP data management. The concept of dimensional modeling was developed by Ralph Kimball, and a
dimensional model consists of "fact" and "dimension" tables.

The purposes of dimensional modeling are:

1. To produce a database architecture that is easy for end users to understand and to write queries against.
2. To maximize the efficiency of queries. It achieves these goals by minimizing the number of tables and
relationships between them.

Elements of Dimensional Modeling

Fact

It is a collection of associated data items, consisting of measures and context data. It typically represents
business items or business transactions.

Dimensions

It is a collection of data that describes one business dimension. Dimensions determine the contextual
background for the facts, and they are the framework over which OLAP is performed.

Measure

It is a numeric attribute of a fact, representing the performance or behavior of the business relative to the
dimensions.

Considering the relational context, there are two basic models which are used in dimensional modeling:

o Star Model
o Snowflake Model

• The star model is the underlying structure for a dimensional model. It has one broad central table
(fact table) and a set of smaller tables (dimensions) arranged in a radial design around the primary
table.

• The snowflake model is the result of decomposing (normalizing) one or more of the dimensions.

Fact Table

Fact tables are used to store the facts or measures of the business. Facts are the numeric data elements
that are of interest to the company.

Characteristics of the Fact table

• The fact table includes numerical values of what we measure. For example, a fact value of 20 might
mean that 20 widgets have been sold.

• Each fact table includes the keys to the associated dimension tables. These are known as foreign keys in
the fact table.

• Fact tables typically include a small number of columns.

• Compared with dimension tables, fact tables have a large number of rows.

Dimension Table

Dimension tables establish the context of the facts. Dimension tables store fields that describe
the facts.

Characteristics of the Dimension table

• Dimension tables contain the details about the facts. This, for example, enables business analysts
to understand the data and their reports better.

• The dimension tables include descriptive data about the numerical values in the fact table. That is, they
contain the attributes of the facts. For example, the dimension tables for a marketing analysis function
might include attributes such as time, marketing region, and product type.

• Since the records in a dimension table are denormalized, a dimension table usually has a large number of
columns. The dimension tables include significantly fewer rows of information than the fact table.

• The attributes in a dimension table are used as row and column headings in a report or query
results display.

Example: A store summary in a fact table can be viewed by city and state. An item summary can be viewed by
brand, color, etc. Customer information can be viewed by name and address.
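The example above can be sketched in Python. This is a minimal illustration that assumes hypothetical store and item dimension tables held as dictionaries: each fact row carries foreign keys plus one measure, and a dimension attribute such as state or brand serves as the heading under which the measure is summarized.

```python
from collections import defaultdict

# Hypothetical dimension tables: surrogate key -> descriptive attributes.
store_dim = {
    1: {"city": "Chennai", "state": "Tamil Nadu"},
    2: {"city": "Madurai", "state": "Tamil Nadu"},
    3: {"city": "Mumbai", "state": "Maharashtra"},
}
item_dim = {
    10: {"brand": "Acme", "color": "red"},
    11: {"brand": "Acme", "color": "blue"},
    12: {"brand": "Zenith", "color": "red"},
}

# Hypothetical fact table: foreign keys to each dimension plus a numeric measure.
sales_fact = [
    {"store_key": 1, "item_key": 10, "units_sold": 5},
    {"store_key": 1, "item_key": 12, "units_sold": 2},
    {"store_key": 2, "item_key": 11, "units_sold": 7},
    {"store_key": 3, "item_key": 10, "units_sold": 4},
]


def summarize(facts, dim, fk, attribute, measure):
    """Sum `measure` over the fact rows, grouped by one attribute of a dimension."""
    totals = defaultdict(int)
    for row in facts:
        heading = dim[row[fk]][attribute]  # follow the foreign key to the descriptive attribute
        totals[heading] += row[measure]
    return dict(totals)


print(summarize(sales_fact, store_dim, "store_key", "state", "units_sold"))
print(summarize(sales_fact, item_dim, "item_key", "brand", "units_sold"))
```

The fact table holds only keys and numbers; every heading in the output comes from a dimension table.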

Advantages of Dimensional Modeling

• Dimensional modeling is simple: Dimensional modeling methods make it possible for warehouse
designers to create database schemas that business users can easily grasp and comprehend.

• Dimensional modeling promotes data quality: The star schema enables warehouse administrators to
enforce referential integrity checks on the data warehouse.

• By enforcing foreign key constraints as a form of referential integrity check, data warehouse DBAs
add a line of defense against corrupted warehouse data.

• Performance optimization is possible through aggregates: As the size of the data warehouse increases,
performance optimization becomes a pressing concern. Precomputed aggregates (summary tables) let queries
read far fewer rows than a scan of the detailed facts.
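The referential integrity check described above can be demonstrated with Python's built-in sqlite3 module. This is a minimal sketch: the product_dim and sales_fact tables and the try_load helper are hypothetical, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE sales_fact (
           product_id INTEGER NOT NULL REFERENCES product_dim(product_id),
           quantity   INTEGER NOT NULL
       )"""
)
conn.execute("INSERT INTO product_dim VALUES (1, 'Widget')")
conn.execute("INSERT INTO sales_fact VALUES (1, 20)")  # valid: product 1 exists


def try_load(product_id, quantity):
    """Attempt to load one fact row; return True if the integrity check accepts it."""
    try:
        conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (product_id, quantity))
        return True
    except sqlite3.IntegrityError:
        return False  # rejected: no matching row in the dimension table


print(try_load(1, 5))   # existing product key: accepted
print(try_load(99, 5))  # orphan key: rejected by the foreign key constraint
```

The orphan row never reaches the fact table, which is the "line of defense" the text refers to.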

Disadvantages of Dimensional Modeling


• To maintain the integrity of facts and dimensions, loading the data warehouse with records from
various operational systems is complicated.

• It is difficult to modify the data warehouse if the organization adopting the dimensional
technique changes the way in which it does business.

4.2 MULTI-DIMENSIONAL DATA MODELING

A multidimensional model views data in the form of a data-cube. A data cube enables data to be
modeled and viewed in multiple dimensions. It is defined by dimensions and facts.

The dimensions are the perspectives or entities with respect to which an organization keeps records. For
example, a shop may create a sales data warehouse to keep records of the store's sales with respect to the
dimensions time, item, and location. Each dimension has a table related to it, called a dimension table, which
describes the dimension further. For example, a dimension table for an item may contain the attributes item
name, brand, and type.

Consider the data of a shop for items sold per quarter in the city of Delhi. The data is shown in the table. In
this 2D representation, the sales for Delhi are shown for the time dimension (organized in quarters) and the
item dimension (classified according to the types of item sold). The fact or measure displayed is rupees sold
(in thousands).

Now suppose we want to view the sales data with a third dimension: for example, the data according to
time and item, as well as location, for the cities Chennai, Kolkata, Mumbai, and Delhi. These 3D data are
shown in the table. The 3D data of the table are represented as a series of 2D tables.
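A 3D cube viewed as a series of 2D tables can be sketched in Python as follows. The locations, quarters, item types, and sales figures are made-up values for illustration only; they are not the figures from the table referred to above.

```python
# Hypothetical 3-D sales data (rupees sold, in thousands), keyed by
# (location, quarter, item type). All values are made up for illustration.
cube = {
    ("Delhi", "Q1", "Mobile"): 10, ("Delhi", "Q1", "Modem"): 20,
    ("Delhi", "Q2", "Mobile"): 15, ("Delhi", "Q2", "Modem"): 25,
    ("Mumbai", "Q1", "Mobile"): 30, ("Mumbai", "Q1", "Modem"): 5,
    ("Mumbai", "Q2", "Mobile"): 12, ("Mumbai", "Q2", "Modem"): 8,
}


def as_2d_tables(cube):
    """Split a 3-D cube into one 2-D (quarter x item) table per location."""
    tables = {}
    for (location, quarter, item), value in cube.items():
        tables.setdefault(location, {}).setdefault(quarter, {})[item] = value
    return tables


for location, table in as_2d_tables(cube).items():
    print(location)
    for quarter, row in sorted(table.items()):
        print("  ", quarter, row)
```

Each location becomes one 2D table of quarter by item, which is exactly the "series of 2D tables" view.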

Features of multidimensional data models:

Measures: Measures are numerical data that can be analyzed and compared, such as sales or
revenue. They are typically stored in fact tables in a multidimensional data model.
Dimensions: Dimensions are attributes that describe the measures, such as time, location, or
product. They are typically stored in dimension tables in a multidimensional data model.
Cubes: Cubes are structures that represent the multidimensional relationships between measures and
dimensions in a data model. They provide a fast and efficient way to retrieve and analyze data.
Aggregation: Aggregation is the process of summarizing data across dimensions and levels of
detail. This is a key feature of multidimensional data models, as it enables users to quickly analyze
data at different levels of granularity.
Drill-down and roll-up: Drill-down is the process of moving from a higher-level summary of data
to a lower level of detail, while roll-up is the opposite process of moving from a lower-level detail to
a higher-level summary. These features enable users to explore data in greater detail and gain
insights into the underlying patterns.
Hierarchies: Hierarchies are a way of organizing dimensions into levels of detail. For example, a
time dimension might be organized into years, quarters, months, and days. Hierarchies provide a
way to navigate the data and perform drill-down and roll-up operations.
OLAP (Online Analytical Processing): OLAP is a category of processing built on the multidimensional data
model that supports fast and efficient querying of large datasets. OLAP systems are designed to handle complex
queries and provide fast response times.
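Roll-up and drill-down along a time hierarchy (year, quarter, month) can be sketched as below. The daily sales facts and the HIERARCHY mapping are hypothetical; each level of the hierarchy simply maps a date to a label at that level of detail.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (date, amount).
daily_sales = [
    (date(2023, 1, 15), 100),
    (date(2023, 2, 3), 50),
    (date(2023, 4, 20), 70),
    (date(2023, 11, 8), 30),
    (date(2024, 1, 2), 40),
]

# Each level of the time hierarchy maps a date to a label at that level.
HIERARCHY = {
    "year": lambda d: str(d.year),
    "quarter": lambda d: f"{d.year}-Q{(d.month - 1) // 3 + 1}",
    "month": lambda d: f"{d.year}-{d.month:02d}",
}


def aggregate(facts, level):
    """Roll the facts up to the given level of the time hierarchy."""
    totals = defaultdict(int)
    for day, amount in facts:
        totals[HIERARCHY[level](day)] += amount
    return dict(totals)


def drill_down(facts, level, label, finer_level):
    """Expand one member at `level` into its children at `finer_level`."""
    subset = [(d, a) for d, a in facts if HIERARCHY[level](d) == label]
    return aggregate(subset, finer_level)


print(aggregate(daily_sales, "year"))                        # roll-up to years
print(drill_down(daily_sales, "year", "2023", "quarter"))    # 2023 split into quarters
```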

Advantages of Multi Dimensional Data Model

• A multi-dimensional data model is easy to handle.
• It is easy to maintain.
• Its performance is better than that of normal databases (e.g. relational databases).
• The representation of data is better than in traditional databases, because multi-dimensional
databases are multi-viewed and carry different types of factors.
• It works well for complex systems and applications, unlike simple one-dimensional database
systems.
• The compatibility of this type of database benefits projects whose maintenance staff has limited
bandwidth.

Disadvantages of Multi Dimensional Data Model

• The multi-dimensional data model is somewhat complicated in nature, and it requires professionals to
recognize and examine the data in the database.
• When a multi-dimensional system caches data, the caching has a large effect on how the system works.
• Because of this complexity, the databases are generally dynamic in design.
• The path to achieving the end product is complicated most of the time.
• Because the model involves complicated systems with a large number of databases, the system is very
exposed when there is a security breach.

4.3 DATA CUBE


Grouping data in a multidimensional matrix is called a data cube. In data warehousing, we generally
deal with various multidimensional data models, since the data is represented by multiple dimensions and
multiple attributes. This multidimensional data is represented in the data cube, as the cube represents a
high-dimensional space. The data cube pictorially shows how different attributes of data are arranged in the
data model. Below is the diagram of a general data cube.

Data cube classification:

The data cube can be classified into two categories:

• Multidimensional data cube: It helps in storing large amounts of data by making use of a
multi-dimensional array. It increases its efficiency by keeping an index of each dimension; thus, a
multidimensional data cube is able to retrieve data fast.

• Relational data cube: It helps in storing large amounts of data by making use of relational
tables. Each relational table represents the dimensions of the data cube. It is slower compared to a
multidimensional data cube.
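The difference between the two classifications can be illustrated in Python. This is a simplified sketch with hypothetical members and values: the multidimensional (MOLAP-style) cube is a dense array addressed through one index per dimension, while the relational (ROLAP-style) cube is a list of rows that must be searched. A real relational engine would use SQL and indexes rather than the linear scan shown here.

```python
from itertools import product

# Hypothetical dimension members, each with an index: member -> array position.
times = ["Q1", "Q2"]
items = ["Mobile", "Modem"]
time_index = {m: i for i, m in enumerate(times)}
item_index = {m: i for i, m in enumerate(items)}

# MOLAP-style storage: a dense 2-D array reached directly through the indexes.
molap = [[0] * len(items) for _ in times]

# ROLAP-style storage: relational rows (time, item, sales).
rolap = []

for t, it in product(times, items):
    value = 10 * (time_index[t] + 1) + item_index[it]  # made-up measure values
    molap[time_index[t]][item_index[it]] = value
    rolap.append((t, it, value))


def molap_lookup(t, it):
    """Direct array addressing: one step once the dimension indexes are known."""
    return molap[time_index[t]][item_index[it]]


def rolap_lookup(t, it):
    """Relational access: search the rows for the matching tuple."""
    return next(v for (rt, ri, v) in rolap if rt == t and ri == it)


print(molap_lookup("Q2", "Modem"), rolap_lookup("Q2", "Modem"))
```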

Data cube operations:

Data cube operations are used to manipulate data to meet the needs of users. These operations help
to select particular data for analysis.

The main operations and capabilities are listed below:

• Roll-up: this operation aggregates similar data attributes along the same dimension, moving up to a
coarser level of detail.
• Drill-down: this operation is the reverse of the roll-up operation. It allows us to take particular
information and then subdivide it further for finer-granularity analysis.
• Slicing: this operation filters out the unnecessary portions. It is used when, in a particular dimension, the
user does not need everything for analysis but only a particular attribute value.
• Dicing: this operation performs a multidimensional cut: rather than cutting only one dimension, it can also
go to other dimensions and cut a certain range of each.
• Multi-dimensional analysis: Data cubes enable multi-dimensional analysis of business data, allowing
users to view data from different perspectives and levels of detail.
• Interactivity: Data cubes provide interactive access to large amounts of data, allowing users to easily
navigate and manipulate the data to support their analysis.
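The first four operations can be sketched over a small set of cube cells in Python. The cells, the city-to-country hierarchy, and the helper names (roll_up, drill_down, slice_, dice) are hypothetical choices for illustration; slice_ carries a trailing underscore only to avoid shadowing Python's built-in slice.

```python
from collections import defaultdict

# Hypothetical base cells: one per (city, quarter, item), with the city's country attached.
cells = [
    {"city": "Chennai", "country": "India", "quarter": "Q1", "item": "Mobile", "sales": 10},
    {"city": "Chennai", "country": "India", "quarter": "Q2", "item": "Modem", "sales": 4},
    {"city": "Delhi", "country": "India", "quarter": "Q1", "item": "Modem", "sales": 6},
    {"city": "Toronto", "country": "Canada", "quarter": "Q1", "item": "Mobile", "sales": 8},
    {"city": "Toronto", "country": "Canada", "quarter": "Q2", "item": "Mobile", "sales": 3},
]


def group(rows, dims):
    """Aggregate sales over the given dimension attributes."""
    out = defaultdict(int)
    for r in rows:
        out[tuple(r[d] for d in dims)] += r["sales"]
    return dict(out)


def roll_up(rows):
    """Climb the location hierarchy from city to country (coarser detail)."""
    return group(rows, ["country", "quarter"])


def drill_down(rows):
    """Step back down from country to city (finer detail)."""
    return group(rows, ["city", "quarter"])


def slice_(rows, dim, value):
    """Fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]


def dice(rows, criteria):
    """Select a sub-cube: a set of allowed values on two or more dimensions."""
    return [r for r in rows if all(r[d] in allowed for d, allowed in criteria.items())]


print(roll_up(cells))
print(slice_(cells, "quarter", "Q1"))
print(dice(cells, {"city": {"Chennai", "Toronto"}, "item": {"Mobile"}}))
```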

Advantages of data cube:

• Speed and efficiency: Data cubes are optimized for OLAP analysis, enabling fast and efficient querying
and aggregation of data.
• Data aggregation: Data cubes support complex calculations and data aggregation, enabling users to
quickly and easily summarize large amounts of data.
• Helps in giving a summarized view of data.
• Data cubes store large data in a simple way.
• Data cube operations provide quick and better analysis.
• Improves the performance of data analysis.

Disadvantages of data cube:

• Complexity: OLAP systems can be complex to set up and maintain, requiring specialized technical
expertise.
• Data size limitations: OLAP systems can struggle with very large data sets and may require extensive
data aggregation or summarization.
• Performance issues: OLAP systems can be slow when dealing with large amounts of data, especially
when running complex queries or calculations.
• Cost: OLAP technology can be expensive, especially for enterprise-level solutions, due to the need for
specialized hardware and software.

4.4 STAR SCHEMA
A star schema is a type of data modeling technique used in data warehousing to represent data in a
structured and intuitive way. In a star schema, data is organized into a central fact table that contains the
measures of interest, surrounded by dimension tables that describe the attributes of the measures.

The fact table in a star schema contains the measures or metrics that are of interest to the user or
organization. For example, in a sales data warehouse, the fact table might contain sales revenue, units sold,
and profit margins.

Each record in the fact table represents a specific event or transaction, such as a sale or order.

The dimension tables in a star schema contain the descriptive attributes of the measures in the fact
table. These attributes are used to slice and dice the data in the fact table, allowing users to analyze the data
from different perspectives.

For example, in a sales data warehouse, the dimension tables might include product, customer, time,
and location.

Below is an example to demonstrate the Star Schema:

In the above demonstration, SALES is a fact table having the attributes (Product ID, Order ID,
Customer ID, Employee ID, Total, Quantity, Discount), which reference the dimension tables.
The Employee dimension table contains the attributes: Emp ID, Emp Name, Title, Department, and Region.
The Product dimension table contains the attributes: Product ID, Product Name, Product Category, and Unit
Price.
The Customer dimension table contains the attributes: Customer ID, Customer Name, Address, City, and Zip.
The Time dimension table contains the attributes: Order ID, Order Date, Year, Quarter, and Month.
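A cut-down version of the SALES star schema can be built with Python's sqlite3 module to show a typical star join. This sketch keeps only the Product and Customer dimensions, uses made-up rows, and is not a complete rendering of the example in the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT,
                          product_category TEXT, unit_price REAL);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT, city TEXT);
    -- Fact table: foreign keys to the dimensions plus numeric measures.
    CREATE TABLE sales (product_id INTEGER REFERENCES product(product_id),
                        customer_id INTEGER REFERENCES customer(customer_id),
                        quantity INTEGER, total REAL);

    INSERT INTO product VALUES (1, 'Pen', 'Stationery', 10.0), (2, 'Lamp', 'Electrical', 250.0);
    INSERT INTO customer VALUES (1, 'Asha', 'Chennai'), (2, 'Ravi', 'Salem');
    INSERT INTO sales VALUES (1, 1, 5, 50.0), (2, 1, 1, 250.0), (1, 2, 3, 30.0);
    """
)

# Star join: every dimension is exactly one join away from the fact table.
query = """
    SELECT p.product_category, c.city, SUM(s.total) AS revenue
    FROM sales AS s
    JOIN product  AS p ON p.product_id  = s.product_id
    JOIN customer AS c ON c.customer_id = s.customer_id
    GROUP BY p.product_category, c.city
    ORDER BY p.product_category, c.city
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```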
Advantages of Star Schema :
1. Simpler Queries –
The join logic of a star schema is quite simple in comparison to the join logic needed to fetch data
from a highly normalized transactional schema.
2. Simplified Business Reporting Logic –
In comparison to a highly normalized transactional schema, the star schema simplifies common
business reporting logic, such as as-of reporting and period-over-period reporting.
3. Feeding Cubes –
Star schemas are widely used by OLAP systems to build OLAP cubes efficiently. In fact, most major OLAP
systems provide a ROLAP mode of operation that can use a star schema directly as a source without
building a cube structure.

Disadvantages of Star Schema :


1. Data integrity is not enforced well, since the schema is highly denormalized.
2. It is not as flexible in terms of analytical needs as a normalized data model.
3. Star schemas do not easily support many-to-many relationships between business entities.

4.5 SNOWFLAKE SCHEMA


The snowflake schema is a variant of the star schema. Here, the centralized fact table is connected to
multiple dimensions.
In the snowflake schema, dimensions are present in a normalized form in multiple related tables.
The snowflake structure materializes when the dimensions of a star schema are detailed and highly
structured, having several levels of relationship, and the child tables have multiple parent tables.
The snowflake effect affects only the dimension tables and does not affect the fact tables.
A snowflake schema is a type of data modeling technique used in data warehousing to represent data
in a structured way that is optimized for querying large amounts of data efficiently.
In a snowflake schema, the dimension tables are normalized into multiple related tables, creating a
hierarchical or “snowflake” structure. Here the fact table is still located at the center of the schema,
surrounded by the dimension tables. However, each dimension table is further broken down into multiple
related tables, creating a hierarchical structure that resembles a snowflake.

For Example, in a sales data warehouse, the product dimension table might be normalized into
multiple related tables, such as product category, product subcategory, and product details. Each of these
tables would be related to the product dimension table through a foreign key relationship.

Example:

The Employee dimension table now contains the attributes: EmployeeID, EmployeeName,
DepartmentID, Region, and Territory. The DepartmentID attribute links the Employee table with
the Department dimension table. The Department dimension is used to provide detail about each
department, such as the Name and Location of the department. The Customer dimension table now
contains the attributes: CustomerID, CustomerName, Address, and CityID. The CityID attribute links
the Customer dimension table with the City dimension table. The City dimension table has details about
each city, such as city name, Zipcode, State, and Country.
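The snowflaked Customer and City dimensions described above can be sketched in SQLite through Python. The rows are hypothetical and the Employee/Department branch is omitted; the point is that reaching the State attribute from the fact table now takes one more join than it would in a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized (snowflaked) dimension: city details moved out of customer.
    CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT, zipcode TEXT,
                       state TEXT, country TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                           address TEXT, city_id INTEGER REFERENCES city(city_id));
    CREATE TABLE sales (customer_id INTEGER REFERENCES customer(customer_id), total REAL);

    INSERT INTO city VALUES (1, 'Chennai', '600001', 'Tamil Nadu', 'India'),
                            (2, 'Bengaluru', '560001', 'Karnataka', 'India');
    INSERT INTO customer VALUES (1, 'Asha', '12 Main Rd', 1), (2, 'Ravi', '4 Lake St', 1),
                                (3, 'Meena', '9 Hill Rd', 2);
    INSERT INTO sales VALUES (1, 100.0), (2, 40.0), (3, 75.0);
    """
)

# Two hops from the fact table: sales -> customer -> city.
rows = conn.execute(
    """
    SELECT ci.state, SUM(s.total)
    FROM sales AS s
    JOIN customer AS cu ON cu.customer_id = s.customer_id
    JOIN city     AS ci ON ci.city_id     = cu.city_id
    GROUP BY ci.state
    ORDER BY ci.state
    """
).fetchall()
print(rows)
```

Storing city details once per city removes the repetition a denormalized customer table would carry, at the cost of the extra join.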

4.6 STAR SCHEMA VS SNOWFLAKE SCHEMA

• Star Schema: The star schema is a type of multidimensional model used for data warehouses. A star
schema contains the fact tables and the dimension tables.

• In this schema, fewer foreign-key joins are used. This schema forms a star with the fact table and
dimension tables.

• Snowflake Schema: A snowflake schema contains the fact tables, dimension tables, and sub-dimension
tables. This schema forms a snowflake with fact tables, dimension tables, and sub-dimension tables.

S.No | Star Schema | Snowflake Schema
1. | In a star schema, the fact tables and the dimension tables are contained. | In a snowflake schema, the fact tables, dimension tables, and sub-dimension tables are contained.
2. | Star schema is a top-down model. | It is a bottom-up model.
3. | Star schema uses more space. | It uses less space.
4. | It takes less time for the execution of queries. | It takes more time than a star schema for the execution of queries.
5. | In a star schema, normalization is not used. | Both normalization and denormalization are used.
6. | Its design is very simple. | Its design is complex.
7. | The query complexity of a star schema is low. | The query complexity of a snowflake schema is higher than that of a star schema.
8. | It is very simple to understand. | It is difficult to understand.
9. | It has fewer foreign keys. | It has more foreign keys.
10. | It has high data redundancy. | It has low data redundancy.

4.7 FACT CONSTELLATION SCHEMA

A fact constellation means two or more fact tables sharing one or more dimensions. It is also
called a galaxy schema. A fact constellation schema describes a logical structure of a data warehouse or data
mart. A fact constellation schema can be designed with a collection of denormalized fact tables and shared,
conformed dimension tables.

A fact constellation schema is a sophisticated database design from which it is difficult to summarize
information.

A fact constellation schema can be implemented between aggregate fact tables, or by decomposing a
complex fact table into independent simple fact tables.

A fact constellation schema has multiple fact tables. It is a widely used schema and is more complex than
star schemas and snowflake schemas. It is possible to create a fact constellation schema by splitting the
original star schema into more star schemas. It has many fact tables and some common dimension tables.

Example: A fact constellation schema is shown in the figure below.

This schema defines two fact tables, sales and shipping. Sales are analyzed along four dimensions,
namely time, item, branch, and location. The schema contains a fact table for sales that includes keys to each
of the four dimensions, along with two measures: Rupee_sold and units_sold.

The shipping table has five dimensions, or keys: item_key, time_key, shipper_key, from_location, and
to_location, and two measures: Rupee_cost and units_shipped.
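The two fact tables sharing dimensions can be sketched in SQLite as follows. This simplified version keeps only the shared item and time dimensions (branch, location, and shipper are omitted) and uses made-up rows. Comparing the two fact tables is done by aggregating each one separately and then joining the results on the shared dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Shared (conformed) dimensions.
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);

    -- Two fact tables that both reference the shared dimensions.
    CREATE TABLE sales (time_key INTEGER, item_key INTEGER,
                        Rupee_sold REAL, units_sold INTEGER);
    CREATE TABLE shipping (time_key INTEGER, item_key INTEGER,
                           Rupee_cost REAL, units_shipped INTEGER);

    INSERT INTO time_dim VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO item_dim VALUES (1, 'Mobile'), (2, 'Modem');
    INSERT INTO sales VALUES (1, 1, 500, 5), (1, 2, 300, 3), (2, 1, 400, 4);
    INSERT INTO shipping VALUES (1, 1, 50, 5), (1, 2, 20, 2), (2, 1, 45, 4);
    """
)

# Aggregate each fact table on its own, then join the summaries on the shared item dimension.
rows = conn.execute(
    """
    WITH s AS (SELECT item_key, SUM(units_sold) AS sold FROM sales GROUP BY item_key),
         h AS (SELECT item_key, SUM(units_shipped) AS shipped FROM shipping GROUP BY item_key)
    SELECT i.item_name, s.sold, h.shipped
    FROM item_dim AS i
    JOIN s ON s.item_key = i.item_key
    JOIN h ON h.item_key = i.item_key
    ORDER BY i.item_name
    """
).fetchall()
print(rows)

# For contrast: joining the fact tables row by row pairs every sales row with every
# shipping row for the same item, so the Mobile units_sold total is counted twice.
inflated = conn.execute(
    "SELECT SUM(s.units_sold) FROM sales AS s "
    "JOIN shipping AS h ON h.item_key = s.item_key WHERE s.item_key = 1"
).fetchone()[0]
print(inflated)
```

Aggregating before joining is the design choice that keeps the totals correct when two fact tables share a dimension.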

The primary disadvantage of the fact constellation schema is that it is a more challenging design
because many variants for specific kinds of aggregation must be considered and selected.

4.8 SCHEMA DEFINITION

• Schema in general terms means “logical structure”. So, a data warehouse schema describes the logical
structure of any data warehouse containing records.
• The concept behind a data warehouse schema is the same as in databases. Databases use relational data
models for their logical structure, while data warehouses use schemas for the same purpose.
• The schemas in data warehouses help in understanding the complexity of the structure of a data
warehouse.
• They are basically a representation of the outer model, or a way to logically deduce results from the
schema diagram; these diagrams are made from combinations of fact tables and dimension tables.
• The conceptual modeling of a data warehouse comprises three models: the star schema, the snowflake
schema, and the galaxy (fact constellation) schema.

Data Warehouse Schema : Types

• A fact table contains keys to dimension tables, which are referenced by using the concept of a foreign
key.
• A dimension table consists of the keys that the fact table refers to, together with their corresponding
attributes.

4.8.1. Data Warehouse Schema : The Star Schema

• It is the simplest structural description of any data warehouse and the least complex of all the
dimensional models (compared with the snowflake and galaxy schemas).
• The star schema contains a single fact table that is connected to multiple dimension tables in the
shape of a star.
• Its basic structure makes the star schema suitable mainly for simple data mining queries.
• Each dimension table contains information about attributes present in the fact table.

Data Warehouse Schema : Star Schema

Advantages : Star Schema

• Highly optimized performance.
• Applicable to both large-scale data sources (such as databases) and small-scale data sources (such
as data marts).
• Star schema is simple and easy to maintain.

Disadvantages : Star Schema

• Less accuracy and consistency.
• Denormalized data.
• Data redundancy.

4.8.2. Data Warehouse Schema : The Snowflake Schema

• The snowflake schema describes the logical structure in much more detail than the star schema.
The snowflake schema is more complex than the star schema but less complex than the galaxy (fact
constellation) schema.
• The major difference between the snowflake and star schemas is that in a star schema the dimension
tables are not normalized, whereas in a snowflake schema the data is normalized.
• This schema is called a snowflake because its shape resembles a snowflake, with the fact table
connected to multiple dimension tables that are in turn connected to further dimension tables. This
requires more joins, which can throttle performance.

Data Warehouse Schema : Snowflake Schema

Advantages : Snowflake Schema

• Less data redundancy.
• Normalized data usage.
• More accurate and consistent than the star schema.

Disadvantages : Snowflake Schema

• Slower than the star schema due to the use of joins.
• More complex than the star schema.
• More complex queries are required because of the use of joins.

4.8.3. Data Warehouse Schema : The Galaxy Schema

• The galaxy schema is also known as a fact constellation. A fact constellation refers to a combination of
fact tables and dimension tables using joins.
• Multiple star schemas are connected together to form a galaxy schema.

Data Warehouse Schema : Galaxy Schema

Advantages : Galaxy Schema

• Highly flexible.
• No data redundancy.
• Low memory/space required.

Disadvantages : Galaxy Schema

• Complicated design.
• Creating, implementing, and maintaining a galaxy schema is a tough job.
• More complex queries are required because of the higher number of joins used to connect fact and
dimension tables.
• Data analysis is difficult because of the complex structure.

4.9. DATA WAREHOUSE PROCESS ARCHITECTURE

The process architecture defines an architecture in which the data from the data warehouse is
processed for a particular computation.

Following are the two fundamental process architectures:

Centralized Process Architecture

In this architecture, the data is collected into a single centralized storage and, once collection is complete,
processed by a single machine with very large memory, processor, and storage capacity.

Centralized Process Architecture

Centralized process architecture evolved with transaction processing and is well suited for small
organizations with one location of service. It requires minimal resources from both the people and system
perspectives. It is very successful when the collection and consumption of data occur at the same location.

Distributed Process Architecture

In this architecture, information and its processing are distributed across data centers: data is processed
locally, and the results are grouped into centralized storage. Distributed architectures are used to overcome
the limitations of centralized process architectures, where all the information needs to be collected in one
central location and results are available only in that central location.

There are several architectures of the distributed process:

Client-Server

In this architecture, the client does all the information collecting and presentation, while the server does the
processing and management of data.

Three-tier Architecture

With client-server architecture, the client machines need to be connected to a server machine, thus mandating
finite states and introducing latencies and overhead in terms of the records to be carried between clients and
servers. The three-tier architecture addresses this by placing a middle tier between the clients and the data
server.

N-tier Architecture

The n-tier or multi-tier architecture is where clients, middleware, applications, and servers are isolated into
tiers.

Cluster Architecture

In this architecture, machines that are connected in a network (through software or hardware) work together
to process information or compute requirements in parallel. Each device in a cluster is associated with a
function that is processed locally, and the result sets are collected by a master server that returns them to
the user.

Peer-to-Peer Architecture

This is a type of architecture where there are no dedicated servers and clients. Instead, all the processing
responsibilities are allocated among all machines, called peers. Each machine can perform the function of a
client or server or just process data.

4.10 TYPES OF DATABASE PARALLELISM

Parallelism is used to support speedup, where queries are executed faster because more resources, such
as processors and disks, are provided. Parallelism is also used to provide scale-up, where increasing
workloads are managed without increasing response time, via an increase in the degree of parallelism.

Different architectures for parallel database systems are shared-memory, shared-disk, shared-nothing, and
hierarchical structures.

(a) Horizontal Parallelism: It means that the database is partitioned across multiple disks, and parallel
processing occurs within a specific task (i.e., table scan) that is performed concurrently on different processors
against different sets of data.

(b) Vertical Parallelism: It occurs among various tasks. All component query operations (i.e., scan, join, and
sort) are executed in parallel in a pipelined fashion. In other words, the output from one task (e.g., join)
becomes the input to another task (e.g., sort) as soon as records become available.
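Vertical (pipelined) parallelism can be sketched with Python threads connected by queues: a scan stage, a join stage, and a sort stage run concurrently, and each row is handed on as soon as it is available. This is a teaching sketch under stated assumptions: the products lookup and the row format are hypothetical, and in CPython the global interpreter lock means the threads interleave rather than run on separate processors.

```python
import queue
import threading

END = object()  # sentinel marking the end of a stream of rows


def scan_stage(rows, out_q):
    """Scan: emit each base row as soon as it is read."""
    for row in rows:
        out_q.put(row)
    out_q.put(END)


def join_stage(in_q, out_q, products):
    """Join: combine each incoming row with its dimension entry as it arrives."""
    while (row := in_q.get()) is not END:
        product_id, qty = row
        out_q.put((products[product_id], qty))
    out_q.put(END)


def sort_stage(in_q, result):
    """Sort: a blocking operator, so it needs all of its input before producing output."""
    buffer = []
    while (row := in_q.get()) is not END:
        buffer.append(row)
    result.extend(sorted(buffer))


def run_pipeline(rows, products):
    """Run the three stages concurrently, connected by queues."""
    q1, q2 = queue.Queue(), queue.Queue()
    result = []
    stages = [
        threading.Thread(target=scan_stage, args=(rows, q1)),
        threading.Thread(target=join_stage, args=(q1, q2, products)),
        threading.Thread(target=sort_stage, args=(q2, result)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return result


products = {1: "Pen", 2: "Lamp"}  # hypothetical product dimension lookup
print(run_pipeline([(2, 1), (1, 5), (1, 3)], products))
```

The scan and join stages overlap row by row, while the sort stage can only emit once its input is complete, which is why sorts limit how much a pipeline can overlap.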

Intraquery Parallelism

Intraquery parallelism defines the execution of a single query in parallel on multiple processors and
disks. Using intraquery parallelism is essential for speeding up long-running queries. Interquery parallelism
does not help in this function since each query is run sequentially.
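Intraquery parallelism over a horizontally partitioned table can be sketched as follows: the table is split round-robin into partitions, the same filtered scan runs on each partition concurrently, and the partial sums are combined into one answer. The data and function names are hypothetical, and a thread pool stands in for the separate processors and disks of a real parallel DBMS.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Round-robin (horizontal) partitioning: one partition per disk/processor."""
    return [rows[i::n] for i in range(n)]


def scan_partition(part, min_qty):
    """The same task, a filtered scan with a partial SUM, applied to one partition."""
    return sum(qty for _, qty in part if qty >= min_qty)


def parallel_sum(rows, min_qty, workers=4):
    """A single query executed in parallel: scan all partitions at once, then combine."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(scan_partition, parts, [min_qty] * workers))
    return sum(partials)  # merge the partial results into the final answer


orders = [(order_id, order_id % 10) for order_id in range(1000)]  # hypothetical (order_id, quantity) rows
print(parallel_sum(orders, min_qty=5))
```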

Interquery Parallelism

In interquery parallelism, different queries or transactions execute in parallel with one another. This
form of parallelism can increase transaction throughput. However, the response times of individual transactions
are no faster than they would be if the transactions were run in isolation.

Thus, the primary use of interquery parallelism is to scale up a transaction processing system to
support a larger number of transactions per second.

Database vendors started to take advantage of parallel hardware architectures by implementing
multiserver and multithreaded systems designed to handle a large number of client requests efficiently. This
approach naturally resulted in interquery parallelism, in which different server threads (or processes) handle
multiple requests at the same time. Interquery parallelism has been successfully implemented on SMP
systems, where it increased throughput and allowed the support of more concurrent users.
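
A multithreaded server handling several client queries at once can be sketched as follows. It assumes each hypothetical query is I/O-bound, simulated with time.sleep, and runs sequentially inside one worker. The sketch shows the trade-off described above: throughput rises because queries overlap, but each query's own response time stays about the same.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(name, seconds):
    """One query, executed sequentially by a single worker thread."""
    start = time.perf_counter()
    time.sleep(seconds)  # stand-in for I/O-bound query work
    return name, time.perf_counter() - start


def serve(requests, workers=4):
    """Multithreaded server: different client queries run at the same time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda req: run_query(*req), requests))
    elapsed = time.perf_counter() - start
    return results, elapsed, len(requests) / elapsed  # throughput in queries/s


if __name__ == "__main__":
    reqs = [("Q1", 0.2), ("Q2", 0.2), ("Q3", 0.2), ("Q4", 0.2)]
    results, elapsed, throughput = serve(reqs)
    for name, response_time in results:
        # Individual response time is not reduced by running queries together.
        print(f"{name}: {response_time:.2f} s")
    print(f"total {elapsed:.2f} s, {throughput:.1f} queries/s")
```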

Shared Disk Architecture

Shared-disk architecture implements a concept of shared ownership of the entire database between
RDBMS servers, each of which runs on a node of a distributed memory system. Each RDBMS server
can read, write, update, and delete information in the same shared database, which requires the system
to implement a form of distributed lock manager (DLM). DLM components can be found in hardware, in the
operating system, or in a separate software layer, depending on the system vendor. On the positive side,
shared-disk architectures can reduce performance bottlenecks resulting from data skew (uneven distribution of
data), and can significantly increase system availability.

The shared-disk distributed memory design eliminates the memory access bottleneck typical of large SMP
systems and helps reduce DBMS dependency on data partitioning.
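
The role of the DLM can be illustrated with a toy model, assuming threads stand in for RDBMS servers on different nodes and a Python dict stands in for the shared disk. Every server updates the same page, and the hypothetical LockManager class grants one exclusive lock per page so the updates do not overwrite each other. A real DLM coordinates locks across machines over a network; this sketch only models the locking rule.

```python
import threading
from collections import defaultdict


class LockManager:
    """Toy stand-in for a distributed lock manager (DLM).

    Grants an exclusive lock per database page so that only one server
    at a time can update a given page on the shared disk.
    """

    def __init__(self):
        self._guard = threading.Lock()
        self._page_locks = defaultdict(threading.Lock)

    def lock_for(self, page_id):
        with self._guard:  # protect the lock table itself
            return self._page_locks[page_id]


def rdbms_server(disk, dlm, page_id, increments):
    """One RDBMS server updating a page that all servers share."""
    for _ in range(increments):
        with dlm.lock_for(page_id):
            disk[page_id] = disk[page_id] + 1  # read-modify-write


def run_shared_disk(num_servers=4, increments=1000):
    disk = {"page-1": 0}  # every server sees the same database
    dlm = LockManager()
    servers = [threading.Thread(target=rdbms_server,
                                args=(disk, dlm, "page-1", increments))
               for _ in range(num_servers)]
    for s in servers:
        s.start()
    for s in servers:
        s.join()
    return disk["page-1"]


if __name__ == "__main__":
    print(run_shared_disk())  # 4000
```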

Shared-Memory Architecture

Shared-memory or shared-everything style is the traditional approach of implementing an RDBMS on
SMP hardware. It is relatively simple to implement and has been very successful up to the point where it runs
into the scalability limitations of the shared-everything architecture. The key point of this technique is that a
single RDBMS server can potentially use all processors, access all memory, and access the entire database,
thus providing the client with a consistent single system image.
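
In a shared-memory design, every worker in one server process can read any part of the same in-memory data. The sketch below assumes a hypothetical in-memory table and worker threads standing in for SMP processors. Each worker counts matching rows in a range of the shared table, and the combined result is kept in shared memory behind a lock. No data is copied or partitioned by ownership.

```python
import threading

# One in-memory table that every worker can see: a single system image.
TABLE = list(range(1, 1001))


def count_even(start, stop, totals, lock):
    """Any worker can read any row of the shared table directly."""
    local = sum(1 for i in range(start, min(stop, len(TABLE)))
                if TABLE[i] % 2 == 0)
    with lock:  # the shared result also lives in common memory
        totals["even"] += local


def shared_memory_count(num_workers=4):
    totals, lock = {"even": 0}, threading.Lock()
    step = -(-len(TABLE) // num_workers)  # ceiling division
    workers = [threading.Thread(target=count_even,
                                args=(start, start + step, totals, lock))
               for start in range(0, len(TABLE), step)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return totals["even"]


if __name__ == "__main__":
    print(shared_memory_count())  # 500
```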

Shared-Nothing Architecture

In a shared-nothing distributed memory environment, the data is partitioned across all disks, and the
DBMS is "partitioned" across multiple co-servers, each of which resides on an individual node of the parallel
system and owns its disk and thus its database partition. A shared-nothing RDBMS parallelizes
the execution of a SQL query across multiple processing nodes.

Each processor has its own memory and disk and communicates with other processors by exchanging
messages and data over the interconnection network.

This architecture is optimized specifically for MPP and cluster systems. Shared-nothing
architectures offer near-linear scalability. The number of processor nodes is limited only by the hardware
platform limitations (and budgetary constraints), and each node itself can be a powerful SMP system.
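
A shared-nothing layout can be sketched with hash partitioning, assuming hypothetical Node and Coordinator classes. Each node holds a private partition that no other node touches. The coordinator sends a message only to the node that owns a given key, or sends the query to every node and combines their replies. Method calls stand in for messages over the interconnection network, and zlib.crc32 is used as a deterministic hash.

```python
import zlib


class Node:
    """A shared-nothing node: private storage that only this node touches."""

    def __init__(self, node_id):
        self.node_id = node_id
        self._rows = {}  # this node's database partition

    def handle(self, message):
        """Process a message from the coordinator and return a reply."""
        kind, payload = message
        if kind == "insert":
            key, value = payload
            self._rows[key] = value
            return "ok"
        if kind == "get":
            return self._rows.get(payload)
        if kind == "sum":
            return sum(self._rows.values())
        raise ValueError(f"unknown message {kind!r}")


class Coordinator:
    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def owner(self, key):
        """Hash partitioning: each key belongs to exactly one node."""
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def insert(self, key, value):
        return self.owner(key).handle(("insert", (key, value)))

    def get(self, key):
        return self.owner(key).handle(("get", key))

    def total(self):
        # The query goes to every node; partial results come back as replies.
        return sum(node.handle(("sum", None)) for node in self.nodes)


if __name__ == "__main__":
    cluster = Coordinator(3)
    for i in range(30):
        cluster.insert(f"cust-{i}", i)
    print(cluster.total(), cluster.get("cust-7"))  # 435 7
```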

4.11 DATA WAREHOUSE TOOLS

The tools that source data contents and formats accurately from operational and external data stores into the data
warehouse have to perform several essential tasks, including:

o Data consolidation and integration.
o Data transformation from one form to another.
o Data transformation and calculation based on the application of business rules that drive the transformation.
o Metadata synchronization and management, which includes storing or updating metadata about source
files, transformation actions, loading formats, and events.
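
The four tasks can be sketched together, assuming two hypothetical operational sources that store the same sales data in different formats. The records are consolidated into one format, transformed (dates and amounts normalized), enriched by a hypothetical business rule that classifies each sale by size, and every step is logged as a metadata entry. The field names and the 1000.0 threshold are illustrative only.

```python
from datetime import datetime, timezone


def log_metadata(metadata, action, source, row_count):
    """Record what was done, to which source, and when."""
    metadata.append({
        "action": action,
        "source": source,
        "rows": row_count,
        "at": datetime.now(timezone.utc).isoformat(),
    })


def from_source_a(rows):
    # Source A (hypothetical): dict rows, ISO dates, amounts stored as text.
    return [{"customer": r["cust_id"],
             "date": datetime.strptime(r["sale_date"], "%Y-%m-%d").date(),
             "amount": float(r["amount"])} for r in rows]


def from_source_b(rows):
    # Source B (hypothetical): tuples, DD/MM/YYYY dates, numeric amounts.
    return [{"customer": cust,
             "date": datetime.strptime(day, "%d/%m/%Y").date(),
             "amount": float(amount)} for cust, day, amount in rows]


def apply_business_rule(record, threshold=1000.0):
    # Hypothetical rule: classify each sale by size for reporting.
    record["sale_band"] = "LARGE" if record["amount"] >= threshold else "REGULAR"
    return record


def consolidate(source_a, source_b, metadata):
    """Integrate both sources into one format, then apply the business rule."""
    unified = []
    for name, rows, convert in (("A", source_a, from_source_a),
                                ("B", source_b, from_source_b)):
        converted = convert(rows)
        log_metadata(metadata, "transform", name, len(converted))
        unified.extend(converted)
    result = [apply_business_rule(r) for r in unified]
    log_metadata(metadata, "business_rule", "A+B", len(result))
    return result
```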

Several selection criteria should be considered when choosing these tools for a data warehouse:

1. The ability to identify the data in the data source environment that can be read by the tool is necessary.
2. Support for flat files, indexed files, and legacy DBMSs is critical.
3. The capability to merge records from multiple data stores is required in many installations.
4. A specification interface to indicate the information to be extracted and the conversions to be applied is essential.
5. The ability to read information from repository products or data dictionaries is desired.
6. The code generated by the tool should be completely maintainable.
7. Selective data extraction of both data items and records enables users to extract only the required data.
8. A field-level data examination for the transformation of data into information is needed.

Data Warehouse Software Components

A warehousing team will require different types of tools during a warehouse project. These software
products usually fall into one or more of the categories shown in the figure.

Extraction and Transformation

The warehouse team needs tools that can extract, transform, integrate, clean, and load information from a
source system into one or more data warehouse databases. Middleware and gateway products may be needed
for warehouses that extract records from a host-based source system.
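
The clean-and-load steps can be sketched with Python's built-in sqlite3 module, which stands in here for a warehouse database. The extracted records and the sales_fact table are hypothetical. Records that fail basic quality checks are dropped, text fields are normalized, and the remaining rows are loaded into a fact table that can then be queried.

```python
import sqlite3


def clean(rows):
    """Drop incomplete records and normalize text fields."""
    for branch, item, amount in rows:
        if amount is None or not branch:
            continue  # reject records that fail basic quality checks
        yield branch.strip().title(), item.strip().title(), float(amount)


def load(conn, rows):
    """Load cleaned rows into a (hypothetical) warehouse fact table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS sales_fact (
                        branch TEXT NOT NULL,
                        item   TEXT NOT NULL,
                        amount REAL NOT NULL)""")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    extracted = [(" north ", "pen", 50), ("south", "book ", "120"),
                 ("", "pen", 5), ("north", "book", None)]
    conn = sqlite3.connect(":memory:")
    load(conn, clean(extracted))
    print(conn.execute("SELECT branch, SUM(amount) FROM sales_fact "
                       "GROUP BY branch ORDER BY branch").fetchall())
    # [('North', 50.0), ('South', 120.0)]
```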

Warehouse Storage

Software products are also needed to store warehouse data and their accompanying metadata. Relational
database management systems are well suited to large and growing warehouses.

Data access and retrieval

Different types of software are needed to access, retrieve, distribute, and present warehouse data to its end
users.

QUESTION BANK

PART – A
1. Define multidimensional data model.
2. What is a data cube?
3. Define dimensions.
4. Define fact table.
5. List out the schemas for multidimensional data models.
6. Define star schema.
7. Define snowflake schema.
8. Define fact constellation schema.
9. Define concept hierarchies.
10. Define data cube measure.
PART – B
1. Design a star schema, snowflake schema, and fact constellation schema for a data warehouse
that consists of the following four dimensions: Time, Item, Branch, and Location. Include the
appropriate measures required for the schemas.
2. Discuss multidimensional databases, data marts, and data cubes. Explain schemas for
multidimensional databases.
3. Explain the star schema, snowflake schema and fact constellation schema with examples.

4. Construct a star schema for a hospital management system.