I.M. 1 3 Normalization

- Data are raw facts that must be processed to reveal meaning and inform decision making. Information is the output of processing data.
- Databases evolved from file systems to address limitations like redundant data, complex administration, and data integrity issues.
- A database management system (DBMS) implements and manages database contents. It presents data as a single repository to promote sharing and eliminate data islands.
- A data model abstractly represents a real-world environment using entities, attributes, and relationships. It facilitates interaction between designers, programmers, and users who have different data views. Data modeling is iterative.
- Basic components of data modeling include entities (anything about which data is collected), attributes (entity characteristics), and relationships

01: Database Systems

Why Databases?
- solve many of the problems encountered in data management
- used in almost all modern settings involving data management:
• Business
• Research
• Administration
- important to understand how databases work and interact with other applications

Data
- are raw facts
- building blocks of information
- foundation of information, which is the bedrock of knowledge
- Raw data must be formatted for storage, processing, and presentation

Information
- is the result of processing raw data to reveal meaning
- requires context to reveal meaning
- produced by processing data
- used to reveal meaning in data
• Accurate, relevant, timely information is the key to good decision making.
• Good decision making is the key to organizational survival.

Database
- shared, integrated computer structure that stores a collection of:
• End-user data: raw facts of interest to end user
• Metadata: data about data
– Provides description of data characteristics and relationships in data
– Complements and expands value of data

Database management system (DBMS):
- collection of programs
- Manages structure and controls access to data

Role and Advantages of the DBMS

Role of a DBMS
• DBMS is the intermediary between the user and the database
– Database structure stored as file collection
– Can only access files through the DBMS
• DBMS enables data to be shared
• DBMS integrates many users’ views of the data

Advantages of a DBMS:
• Improved data sharing
• Improved data security
• Better data integration
• Minimized data inconsistency
• Improved data access
• Improved decision making
• Increased end-user productivity

Classification of Databases
Databases can be classified according to:
• Number of users
• Database location(s)
• Expected type and extent of use

1. Single-user database
- supports only one user at a time
- Desktop database: single-user; runs on PC
2. Multiuser database
- supports multiple users at the same time
- Workgroup and enterprise databases

Why Database Design Is Important

Database design
- focuses on design of database structure used for end-user data
- designer must identify database’s expected use
Well-designed database:
- Facilitates data management
- Generates accurate and valuable information
Poorly designed database:
- Causes difficult-to-trace errors

Summary
• Data are raw facts
• Information is the result of processing data to reveal its meaning
• Accurate, relevant, and timely information is the key to good decision making
• Data are usually stored in a database
• DBMS implements a database and manages its contents.
• Metadata is data about data
• Database design defines the database structure
• Well-designed database facilitates data management and generates valuable information
• Poorly designed database leads to bad decision making and organizational failure
• Databases evolved from manual and computerized file systems
• In a file system, data stored in independent files
• Each requires its own management program
• Some limitations of file system data management:
• Requires extensive programming
• System administration is complex and difficult
• Changing existing structures is difficult
• Security features are likely inadequate
• Independent files tend to contain redundant data
• Structural and data dependency problems
• Database management systems were developed to address file system’s inherent weaknesses
• DBMS present database to end user as single repository
• Promotes data sharing
• Eliminates islands of information
• DBMS enforces data integrity, eliminates redundancy, and promotes security
• A data model is an abstraction of a complex real-world data environment
• Basic data modeling components:
– Entities
– Attributes
– Relationships
– Constraints
• Business rules identify and define basic modeling components
• Hierarchical model
– Set of one-to-many (1:M) relationships between a parent and its children segments
• Network data model
– Uses sets to represent 1:M relationships between record types
• Relational model
– Current database implementation standard
– ER model is a tool for data modeling
• Complements relational model
• Object-oriented data model: object is basic modeling structure
• Relational model adopted object-oriented extensions: extended relational data model (ERDM)
• OO data models depicted using UML
• Data-modeling requirements are a function of different data views and abstraction levels
– Three abstraction levels: external, conceptual, internal

02: Data Models

• Data models
– Relatively simple representations of complex real-world data structures
• Often graphical
• Model: an abstraction of a real-world object or event
– Useful in understanding complexities of the real-world environment
• Data modeling is iterative and progressive

The Importance of Data Models
• Facilitate interaction among the designer, the applications programmer, and the end user
• End users have different views and needs for data
• Data model organizes data for various users
• Data model is an abstraction
– Cannot draw required data out of the data model
Data Model Basic Building Blocks
• Entity: anything about which data are to be collected and stored
• Attribute: a characteristic of an entity
• Relationship: describes an association among entities
– One-to-many (1:M) relationship
– Many-to-many (M:N or M:M) relationship
– One-to-one (1:1) relationship
• Constraint: a restriction placed on the data

Business Rules
• Descriptions of policies, procedures, or principles within a specific organization
– Apply to any organization that stores and uses data to generate information
• Description of operations to create/enforce actions within an organization’s environment
– Must be in writing and kept up to date
– Must be easy to understand and widely disseminated
• Describe characteristics of data as viewed by the company

Discovering Business Rules
• Sources of business rules:
– Company managers
– Policy makers
– Department managers
– Written documentation
• Procedures
• Standards
• Operations manuals
– Direct interviews with end users
• Standardize company’s view of data
• Communications tool between users and designers
• Allow designer to understand the nature, role, and scope of data
• Allow designer to understand business processes
• Allow designer to develop appropriate relationship participation rules and constraints

Translating Business Rules into Data Model Components
• Generally, nouns translate into entities
• Verbs translate into relationships among entities
• Relationships are bidirectional
• Two questions to identify the relationship type:
– How many instances of B are related to one instance of A?
– How many instances of A are related to one instance of B?

Naming Conventions
• Naming occurs during translation of business rules to data model components
• Names should make the object unique and distinguishable from other objects
• Names should also be descriptive of objects in the environment and be familiar to users
• Proper naming:
– Facilitates communication between parties
– Promotes self-documentation

The Hierarchical Model
• The hierarchical model was developed in the 1960s to manage large amounts of data for manufacturing projects
• Basic logical structure is represented by an upside-down “tree”
• Hierarchical structure contains levels or segments
– Segment analogous to a record type
– Set of one-to-many relationships between segments

The Network Model
• The network model was created to represent complex data relationships more effectively than the hierarchical model
– Improves database performance
– Imposes a database standard
• Resembles hierarchical model
– However, record may have more than one parent
• Collection of records in 1:M relationships
• Set composed of two record types:
• Owner
• Equivalent to the hierarchical model’s parent
• Member
• Equivalent to the hierarchical model’s child
• Concepts still used today:
• Schema
• Conceptual organization of entire database as viewed by the database administrator
• Subschema
• Database portion “seen” by the application programs
• Data management language (DML)
• Defines the environment in which data can be managed
• Data definition language (DDL)
• Enables the administrator to define the schema components
• Disadvantages of the network model:
• Cumbersome
• Lack of ad hoc query capability placed burden on programmers to generate code for reports
• Structural change in the database could produce havoc in all application programs

The Relational Model
• Developed by E.F. Codd (IBM) in 1970
• Table (relations)
– Matrix consisting of row/column intersections
– Each row in a relation is called a tuple
• Relational models were considered impractical in 1970
• Model was conceptually simple at expense of computer overhead
• Relational database management system (RDBMS)
• Performs same functions provided by hierarchical model
• Hides complexity from the user
• Relational diagram
• Representation of entities, attributes, and relationships
• Relational table stores collection of related entities
• SQL-based relational database application involves three parts:
– User interface
• Allows end user to interact with the data
– Set of tables stored in the database
• Each table is independent from another
• Rows in different tables are related based on common values in common attributes
– SQL “engine”
• Executes all queries

The Entity Relationship Model
• Widely accepted standard for data modeling
• Introduced by Chen in 1976
• Graphical representation of entities and their relationships in a database structure
• Entity relationship diagram (ERD)
– Uses graphic representations to model database components
– Entity is mapped to a relational table
• Entity instance (or occurrence) is row in table
• Entity set is collection of like entities
• Connectivity labels types of relationships
• Relationships are expressed using Chen notation
• Relationships are represented by a diamond
• Relationship name is written inside the diamond
• Crow’s Foot notation used as design standard in this book

The Object-Oriented (OO) Model
• Data and relationships are contained in a single structure known as an object
• OODM (object-oriented data model) is the basis for OODBMS
– Semantic data model
• An object:
– Contains operations
– Is self-contained: a basic building-block for autonomous structures
– Is an abstraction of a real-world entity
• Attributes describe the properties of an object
• Objects that share similar characteristics are grouped in classes
• Classes are organized in a class hierarchy
• Inheritance: object inherits methods and attributes of parent class
• UML based on OO concepts that describe diagrams and symbols
• Used to graphically model a system
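The object-oriented ideas in these notes (attributes, operations, classes, a class hierarchy, inheritance) can be sketched in a few lines of Python. The class names `Person` and `Student` and their fields are hypothetical, chosen only for illustration; this is a sketch of the concepts, not of how an OODBMS stores objects.

```python
class Person:
    """A class groups objects that share the same attributes and operations."""

    def __init__(self, name: str) -> None:
        self.name = name  # attribute: a property of the object

    def describe(self) -> str:
        # operation (method) carried by the object itself
        return f"Person: {self.name}"


class Student(Person):
    """Child class in the hierarchy: inherits Person's attributes and methods."""

    def __init__(self, name: str, major: str) -> None:
        super().__init__(name)
        self.major = major  # attribute added by the subclass


s = Student("Ana", "Biology")
description = s.describe()  # method inherited from Person
```

Here `Student` defines no `describe` method of its own, so the call resolves to the one inherited from `Person`.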

Newer Data Models: Object/Relational and XML

• Extended relational data model (ERDM)
– Semantic data model developed in response to increasing complexity of applications
– Includes many of OO model’s best features
– Often described as an object/relational database management system (O/RDBMS)
– Primarily geared to business applications
• The Internet revolution created the potential to exchange critical business information
• In this environment, Extensible Markup Language (XML) emerged as the de facto standard
• Current databases support XML
• XML: the standard protocol for data exchange among systems and Internet services

The Future of Data Models

• Hybrid DBMSs
– Retain advantages of relational model
– Provide object-oriented view of the underlying data
• SQL data services
– Store data remotely without incurring expensive hardware, software, and personnel costs
– Companies operate on a “pay-as-you-go” system

Data Models: A Summary
• Common characteristics:
– Conceptual simplicity with semantic completeness
– Represent the real world as closely as possible
– Real-world transformations must comply with consistency and integrity characteristics
• Each new data model capitalized on the shortcomings of previous models
• Some models better suited for some tasks
• Database designer starts with abstracted view, then adds details
• ANSI Standards Planning and Requirements Committee (SPARC)
– Defined a framework for data modeling based on degrees of data abstraction (1970s):
• External
• Conceptual
• Internal

The External Model
• End users’ view of the data environment
• ER diagrams represent external views
• External schema: specific representation of an external view
– Entities
– Relationships
– Processes
– Constraints
• Easy to identify specific data required to support each business unit’s operations
• Facilitates designer’s job by providing feedback about the model’s adequacy
• Ensures security constraints in database design
• Simplifies application program development

The Conceptual Model
• Represents global view of the entire database
• All external views integrated into single global view: conceptual schema
• ER model most widely used
• ERD graphically represents the conceptual schema
• Provides a relatively easily understood macro level view of data environment
• Independent of both software and hardware
– Does not depend on the DBMS software used to implement the model
– Does not depend on the hardware used in the implementation of the model
– Changes in hardware or software do not affect database design at the conceptual level

The Internal Model

• Representation of the database as “seen” by the DBMS
– Maps the conceptual model to the DBMS
• Internal schema depicts a specific representation of an internal model
• Depends on specific database software
– Change in DBMS software requires internal model be changed
• Logical independence: change internal model without affecting conceptual model

The Physical Model
• Operates at lowest level of abstraction
– Describes the way data are saved on storage media such as disks or tapes
• Requires the definition of physical storage and data access methods
• Relational model aimed at logical level
– Does not require physical-level details
• Physical independence: changes in physical model do not affect internal model

Summary
• A data model is an abstraction of a complex real-world data environment
• Basic data modeling components:
– Entities
– Attributes
– Relationships
– Constraints
• Business rules identify and define basic modeling components
• Hierarchical model
• Set of one-to-many (1:M) relationships between a parent and its children segments
• Network data model
• Uses sets to represent 1:M relationships between record types
• Relational model
• Current database implementation standard
• ER model is a tool for data modeling
• Complements relational model
• Object-oriented data model: object is basic modeling structure
• Relational model adopted object-oriented extensions: extended relational data model (ERDM)
• OO data models depicted using UML
• Data-modeling requirements are a function of different data views and abstraction levels
• Three abstraction levels: external, conceptual, internal

03: The Relational Database Model

A Logical View of Data
• Relational model
– View data logically rather than physically
• Table
– Structural and data independence
– Resembles a file conceptually
• Relational database model is easier to understand than hierarchical and network models
Tables and Their Characteristics
• Logical view of relational database is based on relation
– Relation thought of as a table
• Table: two-dimensional structure composed of rows and columns
– Persistent representation of logical relation
– Contains group of related entities (entity set).
Keys
• Each row in a table must be uniquely identifiable
• Key is one or more attributes that determine other attributes
• Key’s role is based on determination
– If you know the value of attribute A, you can determine the value of attribute B
• Functional dependence
– Attribute B is functionally dependent on A if all rows in table that agree in value for A also agree in value for B
• Composite key
– Composed of more than one attribute
• Key attribute
– Any attribute that is part of a key
• Superkey
– Any key that uniquely identifies each row
• Candidate key
– A superkey without unnecessary attributes
• Nulls
– No data entry
– Not permitted in primary key
– Should be avoided in other attributes
– Can represent:
• An unknown attribute value
• A known, but missing, attribute value
• A “not applicable” condition
– Can create problems when functions such as COUNT, AVERAGE, and SUM are used
– Can create logical problems when relational tables are linked
• Controlled redundancy
– Makes the relational database work
– Tables within the database share common attributes
• Enables tables to be linked together
– Multiple occurrences of values not redundant when required to make the relationship work
– Redundancy exists only when there is unnecessary duplication of attribute values
• Foreign key (FK)
– An attribute whose values match primary key values in the related table
• Referential integrity
– FK contains a value that refers to an existing valid tuple (row) in another relation
• Secondary key
– Key used strictly for data retrieval purposes

Integrity Rules
• Many RDBMSs enforce integrity rules automatically
• Safer to ensure that application design conforms to entity and referential integrity rules
• Designers use flags to avoid nulls
– Flags indicate absence of some value
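
As a rough illustration of the integrity rules above, the following sketch uses Python's built-in `sqlite3` module. The `customer` and `invoice` tables and their columns are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, so "automatic" enforcement can depend on configuration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

conn.execute("CREATE TABLE customer (cus_id INTEGER PRIMARY KEY, cus_name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE invoice ("
    " inv_id INTEGER PRIMARY KEY,"
    " cus_id INTEGER NOT NULL REFERENCES customer(cus_id))"  # foreign key
)
conn.execute("INSERT INTO customer VALUES (1, 'Ana')")
conn.execute("INSERT INTO invoice VALUES (10, 1)")  # FK matches an existing customer


def try_insert(sql: str) -> bool:
    """Return True if the DBMS accepts the row, False if an integrity rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Referential integrity: customer 99 does not exist, so the invoice is rejected.
orphan_ok = try_insert("INSERT INTO invoice VALUES (11, 99)")
# Entity integrity: a second row with primary key 1 is rejected.
duplicate_ok = try_insert("INSERT INTO customer VALUES (1, 'Ben')")
invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
```

Both rejected inserts leave the tables unchanged, which is the point of letting the DBMS enforce the rules rather than relying on application code alone.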

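The earlier note that nulls can cause problems with COUNT, AVERAGE, and SUM can be seen in a small `sqlite3` sketch. The `employee` table and its `bonus` column are hypothetical; in SQL the averaging function is spelled `AVG`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, bonus REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, 100.0), (2, None), (3, 200.0)],  # employee 2 has a null bonus
)

count_rows, count_bonus, avg_bonus, sum_bonus = conn.execute(
    "SELECT COUNT(*), COUNT(bonus), AVG(bonus), SUM(bonus) FROM employee"
).fetchone()
# COUNT(*) counts all three rows, while COUNT(bonus), AVG and SUM skip the null,
# so the average is taken over two employees rather than three.
```

Whether the average "should" be 150 or 100 depends on what the null means (unknown, missing, or not applicable), which is why the notes advise avoiding nulls where possible.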
Relational Set Operators


• Relational algebra
– Defines theoretical way of manipulating table contents using
relational operators
– Use of relational algebra operators on existing relations produces
new relations:
• Natural Join
– Links tables by selecting rows with common values in common attribute(s)
• Equijoin
– Links tables on the basis of an equality condition that compares
specified columns
• Theta join
– Any other comparison operator is used
• Outer join
– Matched pairs are retained, and any unmatched values in other
table are left null
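
The join types listed above can be compared side by side with Python's `sqlite3` module. The `customer` and `agent` tables and their rows are made up for illustration, and a left outer join is used as one possible form of outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cus_code INTEGER PRIMARY KEY, cus_name TEXT, agent_code INTEGER);
CREATE TABLE agent (agent_code INTEGER PRIMARY KEY, agent_phone TEXT);
INSERT INTO customer VALUES (1, 'Walker', 231), (2, 'Adares', 125), (3, 'Rakowski', 421);
INSERT INTO agent VALUES (125, '555-0101'), (231, '555-0102'), (333, '555-0103');
""")

# Natural join: matches rows on the column both tables share (agent_code).
natural = conn.execute(
    "SELECT cus_name, agent_phone FROM customer NATURAL JOIN agent ORDER BY cus_code"
).fetchall()

# Equijoin: the equality condition on the compared columns is written out.
equi = conn.execute(
    "SELECT c.cus_name, a.agent_phone FROM customer c "
    "JOIN agent a ON c.agent_code = a.agent_code ORDER BY c.cus_code"
).fetchall()

# Theta join: any comparison operator other than equality (here, <).
theta_count = conn.execute(
    "SELECT COUNT(*) FROM customer c JOIN agent a ON c.agent_code < a.agent_code"
).fetchone()[0]

# Left outer join: unmatched customers are kept, with nulls for the agent columns.
outer = conn.execute(
    "SELECT c.cus_name, a.agent_phone FROM customer c "
    "LEFT JOIN agent a ON c.agent_code = a.agent_code ORDER BY c.cus_code"
).fetchall()
```

Customer 'Rakowski' has no matching agent, so it is missing from the natural join and equijoin results but appears in the outer join with `None` as the phone.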

The Data Dictionary and System Catalog


• Data dictionary
– Provides detailed accounting of all tables found within the
user/designer-created database
– Contains (at least) all the attribute names and characteristics for
each table in the system
– Contains metadata: data about data
• System catalog
– Contains metadata
– Detailed system data dictionary that describes all objects within
the database
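
As one concrete example of a system catalog, SQLite exposes its metadata through the `sqlite_master` table and the `PRAGMA table_info` command. The sketch below, using a hypothetical `product` table, reads back the table names and column characteristics that a data dictionary would record. Other DBMSs expose this information differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prod_code TEXT PRIMARY KEY, prod_price REAL NOT NULL)")

# The catalog lists every object in the database; here only tables are selected.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

# table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [
    (name, col_type, bool(notnull), bool(pk))
    for _cid, name, col_type, notnull, _default, pk
    in conn.execute("PRAGMA table_info(product)")
]
```

The result is metadata in the sense used above: it describes the attributes (name, type, null rule, key role) rather than any end-user data.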

Relationships within the Relational Database


• 1:M relationship
– Relational modeling ideal
– Should be the norm in any relational database design
• 1:1 relationship
– Should be rare in any relational database design
• M:N relationships
– Cannot be implemented as such in the relational model
– M:N relationships can be changed into 1:M relationships

The 1:M Relationship
• Relational database norm
• Found in any database environment

The 1:1 Relationship
• One entity related to only one other entity, and vice versa
• Sometimes means that entity components were not defined properly
• Could indicate that two entities actually belong in the same table
• Certain conditions absolutely require their use

The M:N Relationship
• Implemented by breaking it up to produce a set of 1:M relationships
• Avoid problems inherent to M:N relationship by creating a composite entity
– Includes as foreign keys the primary keys of tables to be linked

Data Redundancy Revisited
• Data redundancy leads to data anomalies
– Can destroy the effectiveness of the database
• Foreign keys
– Control data redundancies by using common attributes shared by tables
– Crucial to exercising data redundancy control
• Sometimes, data redundancy is necessary

Indexes
• Orderly arrangement to logically access rows in a table
• Index key
– Index’s reference point
– Points to data location identified by the key
• Unique index
– Index in which the index key can have only one pointer value (row) associated with it
• Each index is associated with only one table

Codd’s Relational Database Rules
• In 1985, Codd published a list of 12 rules to define a relational database system
– Prompted by products marketed as “relational” that did not meet minimum relational standards
• Even dominant database vendors do not fully support all 12 rules

Summary
• Tables are basic building blocks of a relational database
• Keys are central to the use of relational tables
• Keys define functional dependencies
– Superkey
– Candidate key
– Primary key
– Secondary key
– Foreign key
• Each table row must have a primary key that uniquely identifies all attributes
• Tables are linked by common attributes
• The relational model supports relational algebra functions
• SELECT, PROJECT, JOIN, INTERSECT, UNION, DIFFERENCE, PRODUCT, DIVIDE
• Good design begins by identifying entities, attributes, and relationships
• 1:1, 1:M, M:N

04: Normalization of Database Tables
• Normalization
– Process for evaluating and correcting table structures to minimize data redundancies
• Reduces data anomalies
– Series of stages called normal forms:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
– 2NF is better than 1NF; 3NF is better than 2NF
– For most business database design purposes, 3NF is as high as needed in normalization
– Highest level of normalization is not always most desirable
• Denormalization produces a lower normal form
– Increased performance but greater data redundancy

Conversion to First Normal Form
• Step 1: Eliminate the Repeating Groups
– Eliminate nulls: each repeating group attribute contains an appropriate data value
• Step 2: Identify the Primary Key
– Must uniquely identify attribute value
– New key must be composed
• Step 3: Identify All Dependencies
– Dependencies are depicted with a diagram

Conversion to Second Normal Form

• Step 1: Make New Tables to Eliminate Partial Dependencies
– Write each key component on separate line, then write original (composite) key on last line
– Each component will become key in new table
• Step 2: Assign Corresponding Dependent Attributes
– Determine attributes that are dependent on other attributes
– At this point, most anomalies have been eliminated
• Table is in second normal form (2NF) when:
– It is in 1NF and
– It includes no partial dependencies:
• No attribute is dependent on only a portion of primary key
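
To make partial dependencies concrete, here is a minimal Python sketch that tests functional dependence on sample rows, using the "rows that agree on A also agree on B" definition from the Keys section. The table contents, attribute names, and the helper `determines` are hypothetical. Sample data can only suggest a dependency, because real dependencies come from the business rules.

```python
def determines(rows, lhs, rhs):
    """True if rows that agree on the lhs attributes also agree on the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical 1NF table with composite primary key (proj_num, emp_num).
rows = [
    {"proj_num": 15, "emp_num": 103, "proj_name": "Evergreen", "emp_name": "Arbough", "hours": 23.8},
    {"proj_num": 15, "emp_num": 101, "proj_name": "Evergreen", "emp_name": "News", "hours": 19.4},
    {"proj_num": 18, "emp_num": 103, "proj_name": "Amber Wave", "emp_name": "Arbough", "hours": 12.6},
    {"proj_num": 18, "emp_num": 104, "proj_name": "Amber Wave", "emp_name": "Ramoras", "hours": 45.3},
]
primary_key = ("proj_num", "emp_num")
non_key = ("proj_name", "emp_name", "hours")

# For each non-key attribute, list the single key components that determine it.
# A non-empty list signals a partial dependency, i.e. a 2NF violation.
partial = {
    attr: [part for part in primary_key if determines(rows, (part,), (attr,))]
    for attr in non_key
}
```

In this sample, `proj_name` depends only on `proj_num` and `emp_name` only on `emp_num`, so each would move to a new table keyed by that component, while `hours` stays with the composite key.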

Conversion to Third Normal Form


• Step 1: Make New Tables to Eliminate Transitive Dependencies
– For every transitive dependency, write its determinant as PK for new table
– Determinant: any attribute whose value determines other values within a
row
• Step 2: Reassign Corresponding Dependent Attributes
– Identify attributes dependent on each determinant identified in Step 1
• Identify dependency
– Name table to reflect its contents and function
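
The 3NF steps above can be sketched in Python as follows. It assumes a hypothetical `employees` table in which `job_class` determines `chg_hour` (a transitive dependency, since `job_class` is not the key). The helper `split_transitive` is an illustrative name, not a standard function.

```python
def split_transitive(rows, determinant, dependents):
    """Move the dependents of a non-key determinant into their own table.

    Returns (remaining_rows, new_table): the original rows without the dependent
    attributes, and one row per distinct determinant value.
    """
    lookup = {}
    remaining = []
    for row in rows:
        lookup[row[determinant]] = {d: row[d] for d in dependents}
        remaining.append({k: v for k, v in row.items() if k not in dependents})
    new_table = [{determinant: key, **values} for key, values in lookup.items()]
    return remaining, new_table


# Hypothetical 2NF table: emp_num is the primary key, job_class -> chg_hour.
employees = [
    {"emp_num": 101, "emp_name": "News", "job_class": "Analyst", "chg_hour": 96.75},
    {"emp_num": 103, "emp_name": "Arbough", "job_class": "Engineer", "chg_hour": 84.50},
    {"emp_num": 105, "emp_name": "Johnson", "job_class": "Analyst", "chg_hour": 96.75},
]

# Step 1: the determinant (job_class) becomes the PK of a new table.
# Step 2: its dependent attribute (chg_hour) is reassigned to that table.
employee_3nf, job = split_transitive(employees, "job_class", ["chg_hour"])
```

The determinant `job_class` stays in the employee table as a foreign key, so the original rows can be rebuilt by joining the two tables, and the hourly charge for "Analyst" is now stored once instead of twice.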
