
Global Journal of Computer Science and Technology, Page | 44

A Framework for Systematic Database


Denormalization
YMA PINTO

Goa University, India


[email protected]

Abstract- It is currently the norm that relational database designs should be based on a normalized logical data model. The primary objective of this design technique is data integrity and database extendibility. The Third Normal Form is regarded by academicians and practitioners alike to be the point at which the database design is most efficient. Unfortunately, even this lower level normalization form has a major drawback with regard to query evaluation. Information retrievals from the database can result in a large number of joins, which degrades query performance. So you sometimes need to break theoretical rules for real world performance gains. Most existing Conceptual Level RDBMS data models provide a set of constructs that only describes "what data is used" and does not capture "how the data is being used". The question of "how data is used" gets embedded in the implementation level details. As a result, every application built on the existing database extracts the same or similar data in different ways. If the functional use of the data is also captured, common query evaluation techniques can be formulated and optimized at the design phase, without affecting the normalized database structure constructed at the Conceptual Design phase. This paper looks at denormalization as an effort to improve the performance of data retrievals made from the database without compromising data integrity. A study on a hierarchical database table shows the performance gain - with respect to response time - using a denormalization technique.

Keywords: denormalization, database design, performance tuning, materialized views, query evaluation

I. INTRODUCTION

Most of the applications existing today have been built, or are still being built, using RDBMS or ORDBMS technologies. The RDBMS is thus not dead, as stated by Arnon-Roten [Roten_Gal, 2009]. Van Couver, a software engineer with vast experience in databases at Sun Microsystems, emphasizes the fact that RDBMSs are here to stay but do require improvements in scalability and performance bottlenecks [Couver, 2009].

Normalization is the process of putting one fact, and nothing more than one fact, in exactly one appropriate place. Related facts about a single entity are stored together, and every attribute of each entity is non-transitively associated with the Primary Key of that entity. This design technique results in enhanced data integrity and removes the insert, update and delete anomalies that would otherwise have been present in a non-normalized database. Another goal of normalization is to minimize redesign of the database structure. Admittedly, it is impossible to predict every need that your database design will have to fulfill and every issue that is likely to arise, but it is important to mitigate potential problems as much as possible by careful planning. Arguably, normalizing your data is essential to good performance and ease of development, but the question always comes up: "How normalized is normalized enough?" Many books on normalization mention that 3NF is essential, and many times BCNF, and that 4NF and 5NF are really useful and well worth the time required to implement them [Davidson, 2007]. This optimization, however, results in performance degradation in data retrievals from the database, as a large number of joins need to be done to solve queries [Date, 1997] [Inmon, 1987] [Schkolnick and Sorenson, 1980].

"Third normal form seems to be regarded by many as the point where your database will be most efficient ... If your database is overnormalized you run the risk of excessive table joins. So you denormalize and break theoretical rules for real world performance gains." [Sql Forums, 2009]

There is thus a wide gap between the academicians and the database application practitioners which needs to be addressed. Normalization promotes an optimal design from a logical perspective. With respect to retrieval performance, denormalization is not necessarily a bad decision if implemented following a systematic approach, particularly for large scale databases where dozens of relational tables are used.

Denormalization is an effort that seeks to optimize performance while maintaining data integrity. A denormalized database is thus not equivalent to a database that has not been normalized. Instead, you only seek to denormalize a data model that has already been normalized. This distinction is important to understand, because you go from normalized to denormalized, not from nothing to denormalized. The mistake that some software developers make is to directly build a denormalized database considering only the performance aspect. This optimizes only one part of the equation, which is database reads. Denormalization is a design level one step up from normalization and should not be treated naively.
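The read-versus-write trade-off just described can be made concrete with a small sketch (SQLite driven from Python; the tables, columns and values are invented for illustration and are not taken from the paper): a redundantly stored fact speeds up reads, but an update must now touch every copy.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- "Denormalized from nothing": the customer's city is repeated on every
    -- order row, so one real-world change means many row updates.
    CREATE TABLE order_flat (order_no INTEGER PRIMARY KEY,
                             customer_no INTEGER, customer_city TEXT);
    -- Normalized (3NF): each fact is stored exactly once, non-transitively
    -- associated with the primary key of its own entity.
    CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders   (order_no INTEGER PRIMARY KEY,
                           customer_no INTEGER REFERENCES customer);
""")
con.executemany("INSERT INTO order_flat VALUES (?, ?, ?)",
                [(1, 7, "Panaji"), (2, 7, "Panaji")])
con.execute("INSERT INTO customer VALUES (7, 'Panaji')")
con.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 7), (2, 7)])

# Customer 7 moves city: the flat design updates two rows, the 3NF design one.
flat_rows = con.execute("UPDATE order_flat SET customer_city = 'Margao' "
                        "WHERE customer_no = 7").rowcount
norm_rows = con.execute("UPDATE customer SET city = 'Margao' "
                        "WHERE customer_no = 7").rowcount
print(flat_rows, norm_rows)  # 2 1
```

With more orders per customer the gap widens, which is why writes are the part of the equation that naive denormalization neglects.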

Framing denormalization against normalization purely in the context of performance is unserious and can result in major application problems [Thought Clusters, 2009]. We need to understand how and when to use denormalization.

This paper is organized as follows: Section 1 introduces the concept of, and the current need for, denormalization. Section 2 provides a background of the related work in this area from the academic and the practitioners' points of view. Section 3 makes a strong case for denormalization, while Section 4 presents the framework for a systematic denormalization. Section 5 elucidates some denormalization techniques that can be followed during the database design life cycle and shows the performance gain of this technique over a Hierarchical Normalized Relation.

II. BACKGROUND AND RELATED WORK

Relational databases can be roughly categorized into Transaction Processing (OLTP) and Data Warehouse (OLAP) systems. As a general rule, OLTP databases use normalized schemas and ACID transactions to maintain database integrity, as the data needs to be continuously updated when transactions occur. OLAP databases, by contrast, generally use unnormalized schemas (the "star schema" is the paradigmatic OLAP schema) and are accessed without transactions, because each table row is written exactly once and then never deleted or updated. Often, new data is added to OLAP databases in an overnight batch, with only queries occurring during normal business hours [Lurie M., IBM, 2009] [Microsoft SQL Server guide] [Wiseth, Oracle].

Software developers and practitioners mention that database design principles besides normalization include the building of indices on the data and the denormalization of some tables for performance. Performance tuning methods such as indices and clustering the data of multiple tables exist, but these methods tend to optimize a subset of queries at the expense of the others. Indices consume extra storage and are effective only when they work on a single attribute or an entire key value. Evaluation plans sometimes skip the secondary indexes created by users if these indices are nonclustering [Khaldtiance, 2008].

Materialized Views can also be used as a technique for improving performance [Vincent et al, 1997], but these consume a vast amount of storage and their maintenance results in additional runtime overheads. Blind application of Materialized Views can actually result in worse query evaluation plans, so they should be used carefully [Chaudhuri et al, 1995]. View update techniques have been researched, and a relatively new method of updating using additional views has been proposed [Ross et al, 1996].

In the real world, denormalization is sometimes necessary. There have been two major trends in the approach to denormalization. The first approach uses a "non-normalized ERD" where the entities in the ERD are collapsed to decrease the joins. In the second approach, denormalization is done at the physical level by consolidating relations, adding synthetic attributes and creating materialized views to improve performance. The disadvantage of this approach is the overhead required in view consistency maintenance. Denormalization is not necessarily a bad decision if implemented wisely [Mullins, 2009].

Some denormalization techniques have been researched and implemented in many strategic applications to improve query response times. These strategies are followed in the creation of data warehouses and data marts [Shin and Sanders, 2006] [Barquin and Edelstein] and are not directly applicable to an OLTP system. Restructuring a monolithic Web application, composed of Web pages that address queries to a single database, into a group of independent Web services querying each other also requires denormalization for improved performance [Wei Z et al, 2008].

Several researchers have developed lists of normalization and denormalization types, and have subsequently mentioned that denormalization should be carefully deployed according to how the data will be used [Hauns, 1994] [Rodgers, 1989]. The primary methods that have been identified are: combining tables, introducing redundant data, storing derivable data, allowing repeating groups, partitioning tables, creating report tables, and mirroring tables. These "denormalization patterns" have been classified as Collapsing Relations, Partitioning Relations, Adding Redundant Attributes and Adding Derived Attributes [Sanders and Shin, 2001].

III. A CASE FOR DENORMALIZATION

Four main arguments that have guided experienced practitioners in database design are listed here [26]:

The Convenience Argument
The presence of calculated values in tables aids the evaluation of ad hoc queries and report generation. Programmers do not need to know anything about the API to do the calculation.

The Stability Argument
As systems evolve, new functionality must be provided to the users while retaining the original. Historical data may still need to be retained in the database.

The Simple Queries Argument
Queries that involve join jungles are difficult to debug and dangerous to change. Eliminating joins makes queries simpler to write, debug and change.

The Performance Argument
Denormalized databases require fewer joins than normalized relations. Computing joins is expensive and time consuming; fewer joins translate directly into improved performance.

Denormalization of databases, i.e., the systematic creation of a database structure whose goal is performance improvement, is thus needed for today's business processing requirements. This should be an intermediate step in the DataBase Design Life Cycle, integrated between the Logical DataBase Design Phase and the Physical DataBase Design Phase.
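The Performance Argument can be sketched concretely (SQLite driven from Python; this toy schema is invented for illustration): a prejoined copy of a frequent two-table join answers the same question with a single-table scan and no join at query time.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders   (order_no INTEGER PRIMARY KEY,
                           customer_no INTEGER, order_date TEXT);
""")
con.execute("INSERT INTO customer VALUES (7, 'Tanay')")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 7, '2009-06-01'), (2, 7, '2009-06-02')])

# Compute the join once, ahead of time; the frequent "order with customer"
# query then reads a single table.
con.execute("""
    CREATE TABLE prejoined_order AS
        SELECT o.order_no, o.order_date, c.customer_no, c.name
        FROM orders AS o JOIN customer AS c USING (customer_no)
""")
rows = con.execute("SELECT order_no, name FROM prejoined_order "
                   "ORDER BY order_no").fetchall()
print(rows)  # [(1, 'Tanay'), (2, 'Tanay')]
```

The price, as the Stability and Convenience arguments imply, is that the copy must be refreshed whenever the base tables change.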

Retrieval performance needs dictate very quick retrieval capability for data stored in relational databases, especially since more and more database accesses are being made through the Internet. Users are more concerned with prompt responses than with an optimum database design. To create a Denormalization Schema, the functional usage of the operational data must be analyzed for optimal Information Retrieval.

Some of the benefits of denormalization can be listed:

(a) Performance improvement by:
• Precomputing derived data
• Minimizing joins
• Reducing Foreign Keys
• Reducing indices and saving storage
• Smaller search sets of data for partitioned tables
• Caching the Denormalized structures at the Client for ease of access, thereby reducing query/data shipping cost.

(b) Since the Denormalized structures are primarily designed keeping in mind the functional usage of the application, users can directly access these structures, rather than the base tables, for report generation. This also reduces bottlenecks at the server.

A framework for denormalization needs to address the following issues:
(i) Identify the stage in the DataBase Design Life Cycle where Denormalization structures need to be created.
(ii) Identify situations and the corresponding candidate base tables that cause performance degradation.
(iii) Provide strategies for boosting query response times.
(iv) Provide a method for performing the cost-benefit analysis.
(v) Identify and strategize security and authorization constraints on the denormalized structures.

Although (iv) and (v) above are important issues in denormalization, they will not be considered in this paper and will be researched later.

IV. A DENORMALIZATION FRAMEWORK

The framework presented in this paper differs from the papers surveyed above in the following respects:
It does not create denormalized tables with all contributing attributes from the relevant entities, but instead creates a set of Denormalized Structures over a set of Normalized tables. This is an important and pertinent criterion, as these structures can be built over existing applications with no "side effects of denormalization" on the existing data.
The entire sets of attributes from the contributing entities are not stored in the Denormalized Structure. This greatly reduces the storage requirements and redundancies.
The Insert, Update and Delete operations (IUDs) are not done on the denormalized structures directly and thus do not violate data integrity. The IUDs on data are done on the Base Tables, and the denormalized structures are kept in sync by triggers on the base tables.
Since the denormalized structures are used for information retrieval, they need to consider the authorization access that users have over the base tables.
The construction of the "Denormalization View" is not simply an intermediate step between the Logical and the Physical Design phases; it needs to be consolidated by considering all 3 views of the ANSI/SPARC architectural specifications.

Most existing Conceptual Level RDBMS data models provide a set of constructs that describes the structure of the database [Elmasri and Navathe]. This higher level of conceptual modeling only informs the end user "what data is used" and does not capture "how the data is being used". The question of "how data is used" gets embedded in the implementation level details. As a result, every application built on the existing database extracts the same or similar data in different ways. If the functional use of the data is also captured, common query evaluation techniques can be formulated and optimized at the design phase, without affecting the normalized database structure constructed at the Conceptual Design phase. Business rules are either descriptive (integrity constraints) or functional (derivative or active) and ensure a well functioning system. Common models used during the modeling process of information systems do not allow the high level specification of business rules, except for a subset of ICs taken into account by the data model [Amghar and Mezaine, 1997].

The ANSI 3-level architecture stipulates 3 levels: the External Level and the Conceptual Level, which capture data at rest, and the Physical Level, which describes how the data is stored and depends on the DBMS used. External Schemas, or subschemas, relate to the user views. The Conceptual Schema describes all the types of data that appear in the database and the relationships between data items; integrity constraints are also specified in the conceptual schema. The Internal Schema provides definitions for stored records, methods of representation, data fields, indexes, and hashing schemes. Although this architecture provides the application development environment with logical and physical data independence, it does not provide an optimal query evaluation platform. The DBA has to balance conflicting user requirements before creating indices and consolidating the Physical schema.

The reason denormalization is at all possible in relational databases is that the relational model creates lossless decompositions of the original relation, so no information is lost in the process. The Denormalized structure can be reengineered and populated from the existing Normalized database, and vice-versa.
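The lossless-decomposition property this relies on can be checked directly (SQLite driven from Python; the relation and its split are invented for illustration): projecting a relation onto two overlapping schemas and joining the projections back together recovers exactly the original rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Illustrative relation with a transitive dependency:
# order_no -> customer_no -> city.
con.execute("CREATE TABLE order_full (order_no INTEGER PRIMARY KEY, "
            "customer_no INTEGER, city TEXT)")
rows = [(1, 7, "Panaji"), (2, 7, "Panaji"), (3, 9, "Margao")]
con.executemany("INSERT INTO order_full VALUES (?, ?, ?)", rows)

# Lossless 3NF decomposition on the shared key customer_no.
con.executescript("""
    CREATE TABLE orders   AS SELECT order_no, customer_no FROM order_full;
    CREATE TABLE customer AS SELECT DISTINCT customer_no, city FROM order_full;
""")

# Rejoining the projections reconstructs the original relation exactly,
# which is why a denormalized structure can be rebuilt from the base
# tables and vice-versa.
rejoined = con.execute("""
    SELECT o.order_no, o.customer_no, c.city
    FROM orders AS o JOIN customer AS c USING (customer_no)
    ORDER BY o.order_no
""").fetchall()
print(rejoined == rows)  # True
```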

In a distributed application development environment, the Denormalization Views can be cached on the client, resulting in a major performance boost by saving run time shipping costs. It would require only the Denormalization View Manager to be installed on the Client.

A High Level Architecture that this framework considers is defined as follows:

To realize the potential of the Denormalization View, efficient solutions to three encompassing issues are required:

Denormalization View design: determining what data is stored in the Denormalization Schema, and how it is stored and accessed.
Denormalization View maintenance: methods to efficiently update the data in the Denormalized schema when base tables are updated.
Denormalization View exploitation: making efficient use of denormalization views to speed up query processing (either entire queries or subqueries).

Extensive research has been done on subquery evaluation on materialized views [Afrati et al, 2001] [Chirkova et al, 2006] [Halevy, 2001].

The inputs that are required for the construction of the Denormalized schema can be identified as:
• the logical and external views schema design,
• the physical storage and access methods provided by the DBMS,
• the authorization the users have on the manipulation and access of the data within the database,
• the interaction (inter and intra) between the entities,
• the number of entities the queries involve,
• the usage of the data (i.e., the kind of attributes and their frequency of extraction within queries and reports),
• the volume of data being analyzed and extracted in queries (cardinality and degree of relations, number and frequency of tuples, blocking factor of tuples, clustering of data, estimated size of a relation),
• the frequency of occurrence and the priority of the query,
• the time taken by the queries to execute (with and without denormalization).

The problem can now be stated as: "Given a logical schema with its corresponding database statistics and a set of queries with their frequencies, arrive at a set of denormalized structures that enhances query performance."

A few definitions are required.

Defn 1: A Relational Data Information Retrieval System (RDIRS) has as its core components (i) a set of Normalized Relations {R}, (ii) a set of Integrity Constraints {ICs}, (iii) a set of data access methods {A}, (iv) a set of Denormalization Structures {DS}, and (v) a set of queries and subqueries that can be defined and evaluated on these relations.
Each component of the RDIRS, by definition, can have dynamic elements, resulting in a flexible and evolvable system.

Defn 2: A "Denormalized Structure" (DSM) is a relvar [Date, Kannan, Swamynathan] comprising the Denormalized Schema Design and the Denormalized Structure Manager.

A system cannot enforce truth, only consistency. Internal Predicates (IPs) are what the data means to the system, and External Predicates (EPs) are what the data means to a user. The EPs result in criteria for the acceptability of IUD operations on the data, which is an unachievable goal [Date, Kannan, Swamynathan], especially when Materialized Views are created.
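The problem statement above (schema plus statistics plus query frequencies in, denormalized structures out) implies a ranking step over the workload. The sketch below is a purely hypothetical scoring heuristic, not an algorithm from this paper: it orders join patterns by frequency times estimated join cost, so the most expensive frequent joins surface first as denormalization candidates.

```python
# Hypothetical workload statistics: (tables joined, executions per day,
# estimated join cost). All names and numbers are invented for illustration.
workload = [
    ({"orders", "contactinfo"}, 5000, 4.0),
    ({"orders", "paymentinfo"}, 300, 2.0),
    ({"item"}, 2000, 9.0),  # recursive self-join on the item hierarchy
]

def benefit(freq: int, cost: float) -> float:
    """Crude benefit of removing a join: how often it runs times what it costs."""
    return freq * cost

# Highest-benefit joins first: these are the denormalization candidates.
ranked = sorted(workload, key=lambda w: benefit(w[1], w[2]), reverse=True)
for tables, freq, cost in ranked:
    print(sorted(tables), benefit(freq, cost))
# ['contactinfo', 'orders'] 20000.0
# ['item'] 18000.0
# ['orders', 'paymentinfo'] 600.0
```

A real cost-benefit analysis (issue (iv) above, deferred by the paper) would also charge for the extra storage and synchronization overhead of each candidate structure.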

In the framework presented in this paper, IUDs on the Denormalized Structures are never rejected, as these are automatically propagated to the base relations, where the Domain and Table level ICs are enforced. Once the base relations are updated, the Denormalized Schema Relation triggers are invoked atomically to synchronize the data, ensuring simultaneous consistency of the Base and Denormalized tables. Further, the primary reason for the Denormalization Structures is faster information retrieval, not data manipulation; hence no updates need be made to the Denormalization Schema directly.

Every Normalized Relation requires a Primary Key which satisfies the Key Integrity Constraint. This PK maintains uniqueness of tuples in the database and is not necessarily the search key value for users. For the RDIRS we define:

Defn 3: An Information Retrieval Key (IRK) is a (set of) attributes that the users most frequently extract from an entity. The IRK is selected from amongst the mandatory attribute values which give the end user meaningful information about the entity.
For example, an employee table may have an EmpId as its PK, but the IRK could be EmpName and ContactNo.

Defn 4: An Information Retrieval Tree (IRT) is a Query Evaluation Tree which has as its components the operators required to extract the information from the database and the relvars that contribute to an optimized Data Extraction Plan. The IRT consists of relational algebra operations along the intermediate nodes and the relvars in the leaf nodes (base relations, views, materialized views or denormalization structures), and is a requisite for cost-benefit analysis and query rewrites.

Researchers and practitioners [Inmon, 1987] [Shin and Sanders, 2006] [Mullins, 2009] create the denormalized tables by creating a schema with all the attributes from the participating entities. This (i) requires additional storage and introduces redundancy, (ii) slows down the system on updates to data, and (iii) creates a scenario for data anomalies.

Defn 5: The Denormalization Schema (DS) in the RDIR Model is a relation that has as its attributes only the PKs, the IRKs and the URowIds (Universal Row Ids) of the participating or contributing Base Relations.

The storage of only the PKs, IRKs and URowIds is justifiable as, most often, end users are interested in only the significant attributes of an entity. If required, the remaining attributes can be obtained from the base table using the RowId field stored in the Denormalized Scheme. The URowIds are chosen as they can even support row-ids on remote foreign tables.
It is interesting to note that even when a "select *" clause is used in an ad hoc query, it is either because the user is unaware of the attributes of the entity or is uninterested in the attributes per se, but is actually looking for other information.

The Denormalization Schema Design is an input to the Query Optimizer for collapsing access paths, resulting in the IRT, which is then submitted to the Query Evaluation Engine.

Although the metadata tables are queryable at the server, the Denormalized Structure Manager can have its own metadata stored locally (at the node where the DSs are stored):
DS_Metadata_Scheme (DS_Name, DS_Trigger_Name, DS_Procedure_Name, DS_BT1_Name, Creator, DS_BT1_Trigger_Name, DS_BT2_Trigger_Name, DS_BT1_Authorization, DS_BT2_Authorization)

V. DENORMALIZATION TECHNIQUES

Denormalization looks at normalized databases which hold operational data but whose performance degrades during query evaluation. There are several indicators which help to identify systems and tables that are potential denormalization candidates. The techniques that can be used are summarized below:

a. Prejoined Tables
Application: When two or more tables need to be joined on a regular basis and the cost of the joins is prohibitive. This happens when Foreign Keys become part of a relation or when transitive dependencies are removed.
Denormalization Technique: Collapse the relations.

b. Report Tables
Application: When the application requires the creation of specialized reports that need a lot of formatting and data manipulation.
Denormalization Technique: The report table must contain the mandatory columns required for the report.

c. Fragmenting Tables
Application: If separate pieces of a normalized table are accessed by different and distinct groups of users or applications, then the original relation can be split into two (or more) denormalized tables, one for each distinct processing group. The relation can be fragmented horizontally or vertically while preserving losslessness.
Denormalization Technique: When horizontal fragmentation is done, the predicate must be chosen such that rows are not duplicated. When vertical fragmentation is done, the primary key must be included in the fragmented tables. Associations between the attributes of the relation must be considered. Projections that eliminate rows in the fragmented tables must be avoided.
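The framework's synchronization rule stated above (IUDs go only to the base tables; triggers keep the Denormalized Structure consistent) can be sketched as follows. This is a minimal SQLite illustration with invented table names, not the paper's actual DDL, and it stores only the PK plus an IRK, as Defn 5 prescribes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders   (order_no INTEGER PRIMARY KEY, customer_no INTEGER);
    -- Denormalized structure: only the PKs plus the IRK (the customer name).
    CREATE TABLE dn_order (order_no INTEGER PRIMARY KEY,
                           customer_no INTEGER, customer_name TEXT);
    -- Triggers on the BASE tables keep dn_order in sync; applications never
    -- write to dn_order directly.
    CREATE TRIGGER trg_order_ins AFTER INSERT ON orders BEGIN
        INSERT INTO dn_order
        SELECT NEW.order_no, NEW.customer_no, name
        FROM customer WHERE customer_no = NEW.customer_no;
    END;
    CREATE TRIGGER trg_cust_upd AFTER UPDATE OF name ON customer BEGIN
        UPDATE dn_order SET customer_name = NEW.name
        WHERE customer_no = NEW.customer_no;
    END;
""")
con.execute("INSERT INTO customer VALUES (1, 'Tanay')")
con.execute("INSERT INTO orders VALUES (10, 1)")
con.execute("UPDATE customer SET name = 'T. Behera' WHERE customer_no = 1")
synced = con.execute("SELECT customer_name FROM dn_order "
                     "WHERE order_no = 10").fetchone()[0]
print(synced)  # T. Behera
```

Because each trigger fires inside the same transaction as the base-table IUD, the base and denormalized tables stay simultaneously consistent, as the framework requires.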

d. Redundant Data
Application: Sometimes one or more columns from one table are accessed whenever data from another table is accessed. If this happens frequently, those columns can be stored as redundant data in the tables.
Denormalization Technique: The columns that are duplicated in the relation to avoid a lookup (join) should be used by a large number of users but should not be frequently updated.

e. Repeating Groups
Application: When repeating groups are normalized they are implemented as distinct rows instead of distinct columns, resulting in less efficient retrieval. These repeating groups can be stored as a nested table within the original parent table. Before deciding to implement repeating groups, it is important to consider whether the data will be aggregated or compared within the row, or whether the data will be accessed collectively; otherwise SQL may slow down query evaluation.
Denormalization Technique: Repeating groups can be stored as "setoff(values)" (SQL Extensions) within the table, removing the restriction on the number of values that can repeat.

f. Derivable Data
Application: If the cost of deriving data using complicated formulae is prohibitive, then the derived data can be stored in a column. It is imperative that the stored derived value be changed when the underlying values that comprise the calculated value change.
Denormalization Technique: Frequently used aggregates can be precomputed and materialized in an appropriate relation.

g. Hierarchical Speed Tables
Application: A hierarchy, or a recursive relation, can easily be supported in a normalized relational table, but it is difficult to retrieve information from it efficiently. Denormalized "Speed Tables" are often used for faster data retrieval.
Denormalization Technique: Not only the immediate parent of a node is stored, but all of the child nodes at every level.

5.1: An illustration of the above techniques

Consider the following Normalized database (3NF) relations (Primary Keys are in red, Foreign Keys are in blue):

Customer (CustomerNo, CustomerName, ContactId)
Order (OrderNo, CustomerNo, OrderDate, ShipRecdDate, VATax, Local_Tax, ShipToContactId, BillToContactId)
ContactInfo (ContactId, Name, Street, City, State, Country, Zip)
ContactPhone (ContactId, PhoneNo)
Item (ItemNo, ItemName, ItemPrice, ItemPart, SubItemNo)
OrderItem (OrderNo, ItemSerialNo, ItemNo, Quantity)
PaymentInfo (OrderNo, PaymentNo, PaymentType, PaymentDate)
PaymentType (PaymentType, Description)

Some of the major reports identified that need to be generated from this database:
• What are the current outstanding orders, along with their shipping and billing details?
• For a given order, find all the parts that are ordered, along with the subparts of each part.
• Prepare a voucher for a given order.
• For orders that were paid for on the same date that the shipment was received, give a 10% discount if the amount exceeds a value 'x' and a 20% discount if the amount exceeds a value 'y'.
• Retrieve all sub-items that item number 100 contains.
• Find all subparts that have no subpart.

The Denormalized Schema thus constructed over the Normalized Tables to improve performance, using the techniques described above:

DN_Oust_Order (OrderNo, CustomerNo, OrderDate, ShipToContactInfo_Name, ShipToContactPhone_PhNo, BillToContactInfo_Name, BillToContactPhone_PhNo, ShipToContactInfo_URowId, BillToContactInfo_URowId)

DN_Aggregate (OrderNo, OrderDate, TotalAmt, Discount)

DN_Voucher (OrderNo, OrderDate, ItemName, ItemPrice, Quantity, DN_Aggregate_RowId)

DN_Item_Hierarchy (Main_ItemId, Sub_ItemId, Child_Level, Is_Leaf, Item_URowId)
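A structure like DN_Aggregate stores derivable data (technique f): the order total is precomputed once instead of being re-derived for every report. A minimal sketch (SQLite driven from Python; the column subset and the values are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE item       (item_no INTEGER PRIMARY KEY, item_price REAL);
    CREATE TABLE order_item (order_no INTEGER, item_no INTEGER,
                             quantity INTEGER);
""")
con.executemany("INSERT INTO item VALUES (?, ?)", [(100, 10.0), (101, 2.5)])
con.executemany("INSERT INTO order_item VALUES (?, ?, ?)",
                [(1, 100, 3), (1, 101, 4), (2, 100, 1)])

# Materialize the derivable per-order total, as DN_Aggregate stores TotalAmt;
# the aggregate must be refreshed whenever order_item or item changes.
con.execute("""
    CREATE TABLE dn_aggregate AS
        SELECT oi.order_no, SUM(oi.quantity * i.item_price) AS total_amt
        FROM order_item AS oi JOIN item AS i USING (item_no)
        GROUP BY oi.order_no
""")
totals = con.execute("SELECT order_no, total_amt FROM dn_aggregate "
                     "ORDER BY order_no").fetchall()
print(totals)  # [(1, 40.0), (2, 10.0)]
```

The discount rules in the report list above could then be evaluated against the stored TotalAmt rather than against a fresh join and aggregation.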

create materialized view


build immediate 2.
refresh fast on commit
enable query rewrite clauses provided by the DBMSs. The URowIds of the Base Table rows can also be selected and inserted into the Denormalized Schema Extensions. The DN_Aggregate tables need to be created using the WITH SCHEMABINDING clause. The Denormalized Hierarchy tables can be created using the CONNECT BY PRIOR, START WITH and LEVEL clauses. The CONNECT BY PRIOR clause can automatically handle insertions.

5.2: A Performance Study on Hierarchical Queries

The Hierarchical Technique for Denormalization is illustrated further here. Consider the Normalized Item Data consisting of the data shown below (partial view of the database):

100
├── 101
│   ├── 108
│   │   ├── 109
│   │   ├── 110
│   │   ├── 111
│   │   └── 112
│   ├── 200
│   ├── 203
│   └── 204
│       └── 209
└── 105

Figure 3: Partial Hierarchical Item Data

The Normalized Relation for the Hierarchical Item Table would be stored as:

ItemNo   ParentItemNo   OtherItemDetails
100                     …
101      100            …
105      100            …
108      101            …
200      101            …
203      101            …
204      101            …
109      108            …
110      108            …
111      108            …
112      108            …
209      204            …

Consider a query "Find all items that are contained in item 100" that is required to be run on the above table. This involves finding the child nodes at every level of the hierarchy.

A solution to the above query:

Select ItemNo from item where ParentItemNo = '100'
Union
Select ItemNo from item where ParentItemNo in
    (Select ItemNo from item where ParentItemNo = '100')
Union
Select ItemNo from item where ParentItemNo in
    (Select ItemNo from item where ParentItemNo in
        (Select ItemNo from item where ParentItemNo = '100'))

Besides being extremely inefficient, this retrieval query requires one to know the maximum depth of the hierarchy in advance.

The Denormalized Schema for the Item Information in the RDIRS:

DN_Item_Hierarchy (ParentItemNo, ChildItemNo, ItemName, ChildLevel, IsLeaf, Item_URowId)

ChildLevel ascertains the level in the hierarchy at which the child node is; IsLeaf specifies whether that node has further child nodes, and makes queries like "Find all items that have no subparts" efficiently solvable.

The (part) extension of the DN_Item_Hierarchy Schema:

ParentItemNo   ChildItemNo   ItemName    ChildLevel   IsLeaf   ItemRowId
100            101           SubPart1    1            N        …
100            105           SubPart2    1            N        …
100            108           SubPart3    2            N        …
100            200           SubPart4    2            Y        …
100            203           SubPart5    2            Y        …
100            204           SubPart6    2            N        …
100            109           SubPart7    3            Y        …
100            110           SubPart8    3            Y        …
100            111           SubPart9    3            Y        …
100            112           SubPart10   3            Y        …
100            209           SubPart11   3            Y        …
101            108           SubPart3    2            N        …
101            200           SubPart4    2            N        …
101            203           SubPart5    2            N        …
101            204           SubPart6    2            N        …
108            109           SubPart7    3            Y        …
108            110           SubPart8    3            Y        …
108            111           SubPart9    3            Y        …
108            112           SubPart10   3            Y        …
204            209           SubPart11   3            Y        …
…              …             …           …            …        …

A solution to the query "Find all items that are contained in item 100" can now be written as:

Select ChildItemNo from DN_Item_Hierarchy where ParentItemNo = 100;

To study the performance improvement obtained through denormalization, the normalized item table was created with 100 tuples, 70 of which had item 100 as their root. The maximum child level was 4. The results are as shown:

1. Nested subqueries on the normalized table:

Set timing on;
select itemno, itemname, parentitem from item1 where itemno in
   (select itemno from item1 where parentitem = 100
    union
    select itemno from item1 where parentitem in
       (select itemno from item1 where parentitem = 100)
    union
    select itemno from item1 where parentitem in
       (select itemno from item1 where parentitem in
          (select itemno from item1 where parentitem = 100)));

69 rows selected.
Elapsed: 00:00:00.31

2. A hierarchical query on the normalized table:

select itemno, itemname, parentitem from item
start with parentitem = 100
connect by prior itemno = parentitem;

69 rows selected.
Elapsed: 00:00:00.17

3. A single-table lookup on the denormalized table:

select parentitem, childitemno, itemname from dn_item_hier
where parentitem = 100;

69 rows selected.
Elapsed: 00:00:00.15

With an increased set of tuples, and a greater depth in the hierarchy, the improvement will be substantial.

VI. CONCLUSIONS AND FUTURE WORK

Although each new RDBMS release usually brings enhanced performance and improved access options that may reduce the need for denormalization, there will be many occasions where even these popular RDBMSs will require denormalized data structures. Denormalization will continue to remain an integral part of database design. A detailed authorization and access matrix stored along with the denormalization view will further enhance performance. This, together with a detailed strategy for cost-benefit analysis, will be the next stage of my research.

REFERENCES

[1] Afrati F., Chen Li and Ullman J.D., "Generating efficient plans using views", In SIGMOD, pp. 319–330, 2001.
[2] Amghar Y. and Mezaine M., "Active database design", COMAD 97, Chennai, India.
[3] Chaudhuri S., Krishnamurthy R., Potamianos S. and Shim K., "Optimizing queries using materialized views", Proceedings of the 11th International Conference on Data Engineering (Taipei, Taiwan), 1995, pp. 190–200.
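The DN_Item_Hierarchy extension (one row per ancestor–descendant pair, with ChildLevel and IsLeaf) can be generated automatically from the normalized table. The following is a hedged sketch in sqlite3, not the paper's implementation: the schema is simplified (ItemName and Item_URowId omitted), and ChildLevel is taken as the child's absolute depth below the root item, which is how the sample extension reads:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (itemno INTEGER PRIMARY KEY, parentitemno INTEGER);
INSERT INTO item VALUES (100, NULL), (101, 100), (105, 100),
                        (108, 101), (200, 101), (203, 101), (204, 101),
                        (109, 108), (110, 108), (111, 108), (112, 108),
                        (209, 204);

CREATE TABLE dn_item_hierarchy (
    parentitemno INTEGER, childitemno INTEGER,
    childlevel INTEGER, isleaf TEXT);
""")

# Build the closure: every (ancestor, descendant) pair, the descendant's
# absolute depth below the root, and a leaf flag.
conn.execute("""
WITH RECURSIVE
closure(parentitemno, childitemno) AS (
    SELECT parentitemno, itemno FROM item WHERE parentitemno IS NOT NULL
    UNION ALL
    SELECT c.parentitemno, i.itemno
    FROM closure c JOIN item i ON i.parentitemno = c.childitemno
),
depth(itemno, lvl) AS (
    SELECT itemno, 0 FROM item WHERE parentitemno IS NULL
    UNION ALL
    SELECT i.itemno, d.lvl + 1 FROM item i JOIN depth d ON i.parentitemno = d.itemno
)
INSERT INTO dn_item_hierarchy
SELECT c.parentitemno, c.childitemno, d.lvl,
       CASE WHEN EXISTS (SELECT 1 FROM item x
                         WHERE x.parentitemno = c.childitemno)
            THEN 'N' ELSE 'Y' END
FROM closure c JOIN depth d ON d.itemno = c.childitemno
""")

# "Find all items contained in item 100" is now a single-table lookup:
contained = [r[0] for r in conn.execute(
    "SELECT childitemno FROM dn_item_hierarchy WHERE parentitemno = 100")]

# "Find all items (under 100) that have no subparts" uses the IsLeaf flag:
leaves = [r[0] for r in conn.execute(
    "SELECT childitemno FROM dn_item_hierarchy "
    "WHERE parentitemno = 100 AND isleaf = 'Y'")]
```

Because the leaf flag is precomputed, the "no subparts" query needs no NOT EXISTS subquery at retrieval time, which is exactly the trade the denormalized extension makes: more maintenance work on insert, less work on read.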
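The measurement setup can be reproduced outside Oracle as well. The sketch below builds a synthetic 100-tuple hierarchy of depth at most 4, mirroring the study's setup (the names item1 and dn_item_hier follow the queries used in the study, but the data generator and schema here are my own), and times a recursive traversal against the single denormalized lookup. Absolute timings will of course differ from those reported in the paper:

```python
import random
import sqlite3
import time

random.seed(1)  # reproducible synthetic hierarchy
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item1 (itemno INTEGER PRIMARY KEY, parentitem INTEGER);
CREATE TABLE dn_item_hier (parentitem INTEGER, childitemno INTEGER);
CREATE INDEX idx_dn ON dn_item_hier(parentitem);
""")

# 100 tuples rooted at item 100, maximum child level 4.
depths = {100: 0}
rows = [(100, None)]
for itemno in range(101, 200):
    parent = random.choice([i for i, d in depths.items() if d < 4])
    depths[itemno] = depths[parent] + 1
    rows.append((itemno, parent))
conn.executemany("INSERT INTO item1 VALUES (?, ?)", rows)

# Denormalize: one row per (ancestor, descendant) pair.
conn.execute("""
WITH RECURSIVE c(a, d) AS (
    SELECT parentitem, itemno FROM item1 WHERE parentitem IS NOT NULL
    UNION ALL
    SELECT c.a, i.itemno FROM c JOIN item1 i ON i.parentitem = c.d
)
INSERT INTO dn_item_hier SELECT a, d FROM c
""")

def timed(sql):
    """Run a query, returning (row count, elapsed seconds)."""
    t0 = time.perf_counter()
    result = conn.execute(sql).fetchall()
    return len(result), time.perf_counter() - t0

# Recursive traversal of the normalized table.
n_rec, t_rec = timed("""
WITH RECURSIVE s(itemno) AS (
    SELECT itemno FROM item1 WHERE parentitem = 100
    UNION ALL
    SELECT i.itemno FROM item1 i JOIN s ON i.parentitem = s.itemno)
SELECT itemno FROM s""")

# Single-table lookup on the denormalized table.
n_dn, t_dn = timed(
    "SELECT childitemno FROM dn_item_hier WHERE parentitem = 100")

assert n_rec == n_dn  # both enumerate the same subtree
print(f"recursive: {n_rec} rows in {t_rec:.6f}s; "
      f"denormalized: {n_dn} rows in {t_dn:.6f}s")
```

At this toy scale the difference is small; the gap grows with tuple count and hierarchy depth, which is the paper's point about the improvement becoming substantial on larger data.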
[4] Chirkova R., Chen Li and Li J., "Answering queries using materialized views with minimum size", VLDB Journal, 15(3), pp. 191–210, 2006.
[5] Date C.J., "The Normal is so … interesting", Database Programming and Design, Nov 1997, pp. 23–25.
[6] Halevy A., "Answering queries using views: A survey", In VLDB, 2001.
[7] Hauns M., "To normalize or denormalize, that is the question", Proceedings of the 19th Int. Conf. for Management and Performance Evaluation of Enterprise Computing Systems, San Diego, CA, 1994, pp. 416–423.
[8] Inmon W.H., "Denormalization for Efficiency", ComputerWorld, Vol. 21, 1987, pp. 19–21.
[9] Ross K., Srivastava D. and Sudarshan S., "Materialized view maintenance and integrity constraint checking: trading space for time", ACM SIGMOD Conference, 1996, pp. 447–458.
[10] Rodgers U., "Denormalization: why, what and how?", Database Programming and Design, 1989(12), pp. 46–53.
[11] Sanders G. and Shin S.K., "Denormalization Effects on Performance of RDBMS", Proceedings of the 34th International Conference on Systems Sciences, 2001.
[12] Schkolnick M. and Sorenson P., "Denormalization: A performance-oriented database design technique", Proceedings of the AICA 1980 Congress, Italy.
[13] Shin S.K. and Sanders G.L., "Denormalization strategies for data retrieval from data warehouses", Decision Support Systems, Vol. 42, No. 1, pp. 267–282, 2006.
[14] Vincent M., Mohania M. and Kambayashi Y., "A self-maintainable view maintenance technique for data warehouses", 8th Int. Conf. on Management of Data, Chennai, India.
[15] Wei Z., Dejun J., Pierre G., Chi C.H. and Steen M., "Service-Oriented Data Denormalization for Scalable Web Applications", Proceedings of the 17th International WWW Conference, Beijing, China, 2008.
[16] Barquin R. and Edelstein H., "Planning and Designing the Data Warehouse", Prentice Hall.
[17] Date C.J., Kannan A. and Swamynathan S., "An Introduction to Database Systems", 8th Ed., Pearson Education.
[18] Elmasri R. and Navathe S., "Fundamentals of Database Systems", 3rd Ed., Addison-Wesley.
[19] Davidson L., "Ten common design mistakes", software engineers blog, Feb 2007.
[20] Downs K., "The argument for Denormalization", The Database Programmer, Oct 2008.
[21] Khaldtiance S., "Evaluate Index Usage in Databases", SQL Server Magazine, October 2008.
[22] Lurie M. (IBM), "Winning Database Configurations".
[23] Mullins C., "Denormalization Guidelines", Platinum Technology Inc., Data Administration Newsletter, accessed June 2009.
[24] Microsoft, SQL Server 7.0 Resource Guide, "Chapter 12 – Data Warehousing Framework".
[25] Roten-Gal-Oz A., "Cirrus Minor", in "Making IT Work – Musings of a Holistic Architect", accessed June 2009.
[26] Van Couver D., blog "Van Couvering is not a verb", accessed June 2009.
[27] Wiseth K., Editor-in-Chief of Oracle Technology News, "Find Meaning", accessed June 2009.
[28] Thought Clusters on software, development and programming, website, March 2009.
[29] http://www.sqlteam.com/Forums/, accessed July 2009.