DIT 0201-HCT0104-Database Notes
DIT 0201: DATABASE SYSTEMS/ HCT0104: DATABASE MANAGEMENT SYSTEMS
DEFINITIONS:
1. Data: raw facts about an entity.
2. Information: processed data.
3. Data definition: the creation, modification and removal of the definitions that describe the
organization of the data.
4. Data modelling: a process used to define and analyze the data requirements needed to
support the business processes within the scope of corresponding information systems
in organizations.
5. Database: an organized collection of logically related data, arranged so that the data can
be easily accessed, managed, and updated.
6. Database management system (DBMS): software that allows users to define, create and
manipulate databases and that manages access to them, e.g. MySQL, Oracle etc.
7. Database schema: a descriptive detail of the database, which can be depicted by
means of schema diagrams. It defines the entities and the relationships among them.
FUNCTIONS OF A DBMS
i. Perform any kind of operation on data in a database.
ii. Provide protection and security for the database.
iii. Concurrency control: maintains data consistency in case of multiple users.
iv. Provide utility services.
v. Provide data independence.
COMPONENTS OF A DATABASE SYSTEM
i. Hardware: the actual computer system used for keeping and accessing the database.
ii. Software: the actual DBMS. It is a mediator between the database and the users.
iii. Data: raw facts about an entity.
iv. Procedures: instructions and rules that govern the design and use of the database.
v. Users: these may be of various types, including the database administrator, system developers
and end users.
TYPES OF USERS
i. Naive Users: users who need not be aware of the presence of the database system
or any other system supporting their usage. They are end users of the database who
work through a menu-driven application program, where the type and range of
response is always indicated to the user.
ii. Online Users: users who may communicate with the database directly via an online
terminal or indirectly via a user interface and application program. These users are
aware of the presence of the database system and may have acquired a certain amount
of expertise within the limited interaction permitted with a database.
iii. Sophisticated Users: users who interact with the system without writing programs.
Instead, they form their requests in a database query language. Each such query is
submitted to a query processor, whose function is to break down DML statements into
instructions that the storage manager understands.
iv. Specialized Users: users who write specialized database applications
that do not fit into the traditional data-processing framework. For example: computer
aided design systems, knowledge base and expert systems, and systems that store data with
complex data types.
v. Application Programmers: professional programmers who are responsible
for developing application programs or user interfaces. The application programs could
be written using a general purpose programming language or the commands available to
manipulate a database.
vi. Database Administrator: The database administrator (DBA) is the person or group in
charge of implementing the database system within an organization. The DBA has
all the system privileges allowed by the DBMS and can assign (grant) and remove
(revoke) levels of access (privileges) to and from other users. The DBA is also responsible
for the evaluation, selection and implementation of the DBMS package.
ADVANTAGES OF DBMS
1. Improved data sharing: The DBMS helps create an environment in which end users
have better access to more and better-managed data.
2. Improved data security: The more the users access the data, the greater the risks of
data security breaches. This is the reason DBMS provides a framework for better
enforcement of data privacy and security policies.
3. Better data integration: It is much easier to see how actions in one segment of the
company affect other segments.
4. Minimized data inconsistency: Data inconsistency exists when different versions of
the same data appear in different places. The probability of data inconsistency is
greatly reduced in a properly designed database.
5. Improved data access: The DBMS makes it possible to produce quick answers to ad
hoc queries. From a database perspective, a query is a specific request issued to the
DBMS for data manipulation.
6. Improved decision making: Better-managed data and improved data access make it
possible to generate better-quality information, on which better decisions are based.
Data quality is a comprehensive approach to promoting the accuracy, validity, and
timeliness of the data. While the DBMS does not guarantee data quality, it provides a
framework to facilitate data quality initiatives.
7. Increased end-user productivity: The availability of data, combined with the tools
that transform data into usable information, empowers end users to make quick,
informed decisions that can make the difference between success and failure in the
global economy.
DISADVANTAGES OF DBMS
1. Complexity: The provision of the functionality that is expected of a good DBMS
makes the DBMS an extremely complex piece of software. Failure to understand the
system can lead to bad design decisions, which can have serious consequences for an
organization.
2. Size: The complexity and breadth of functionality make the DBMS an extremely large
piece of software, occupying many megabytes of disk space and requiring substantial
amounts of memory to run efficiently.
3. Performance: Unlike a file-based system, which is typically written for one specific
application, the DBMS is written to be more general, to cater for
many applications rather than just one. The effect is that some applications may not
run as fast as they used to.
4. Higher impact of a failure: The centralization of resources increases the vulnerability
of the system. Since all users and applications rely on the availability of the DBMS,
the failure of any component can bring operations to a halt.
5. Cost of DBMS: The cost of a DBMS varies significantly, depending on the environment
and functionality provided. There is also the recurrent annual maintenance cost.
6. Additional hardware costs: To achieve the required performance it may be necessary to
purchase a larger machine, perhaps even a machine dedicated to running the DBMS. The
procurement of additional hardware results in further expenditure.
7. Cost of conversion: In some situations, the cost of the DBMS and extra hardware may be
insignificant compared with the cost of converting existing applications to run on the new
DBMS and hardware. This cost is one of the main reasons why some organizations feel
tied to their current systems and cannot switch to modern database technology.
The software responsible for the management of data in computers, i.e. the DBMS (like Oracle,
FoxPro, SQL Server etc.), should meet the following requirements:
i. Provide data definition facilities: It should support a Data Definition Language (DDL)
and provide a user-accessible catalogue known as the data dictionary.
ii. Provide facilities for storing, retrieving and updating data: It should support a Data
Manipulation Language (DML), so that required data can be inserted, updated,
deleted and retrieved.
iii. Support multiple views of data: The end user should have the facility of a flexible query
language so that required information can be accessed easily.
iv. Provide facilities for specifying integrity constraints: It should support
constraints such as primary key and foreign key during the creation of tables, so that only
valid information is stored in the database. As soon as we try to insert any incorrect
information, it should display an error message.
v. Provide security of data: It should have facilities for controlling access to data and
preventing unauthorized access and update.
vi. Provide a concurrency control mechanism: It should allow simultaneous access and
update of data by multiple users.
vii. Support transactions: A transaction is a sequence of operations performed as a whole;
in other words, either all of its operations are performed or none are. The DBMS should
support all the properties of a transaction, known as the ACID properties (atomicity,
consistency, isolation, durability).
viii. Provide facilities for database recovery: It should bring the database back to a consistent
state after a failure such as a disk failure, faulty program etc.
ix. Provide facilities for database maintenance: It should support maintenance operations
like unload, reload, mass insertion, deletion and validation of data.
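Several of the requirements above (DDL, DML, integrity constraints and all-or-nothing transactions) can be seen in a small sketch. It uses Python's built-in sqlite3 module purely for illustration; the account table, its columns and the transfer function are hypothetical names that do not come from these notes.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# DDL: define the structure of the data, including integrity constraints.
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)

# DML: insert data, then make it permanent.
conn.execute("INSERT INTO account VALUES (1, 'Amina', 500)")
conn.execute("INSERT INTO account VALUES (2, 'Brian', 100)")
conn.commit()

# Integrity constraint: a duplicate primary key is rejected with an error.
try:
    conn.execute("INSERT INTO account VALUES (1, 'Carol', 50)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True


def transfer(conn, src, dst, amount):
    """Move money between two accounts as one transaction: all or nothing."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Account 2 only holds 100, so the debit violates the CHECK constraint and
# the credit that was already applied to account 1 is rolled back as well.
ok = transfer(conn, 2, 1, 300)
balances = dict(conn.execute("SELECT acc_no, balance FROM account"))
```

After the failed transfer both balances are unchanged, which is the "all operations are performed or none" behaviour described in item vii.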
DATABASE MODELS
A database model is a type of data model that determines the logical structure of a database and
fundamentally determines the manner in which data can be stored, organized, and manipulated.
Database models define the logical design of data: how data items are
connected to each other and how they are processed and stored inside the system.
2. Hierarchical Model
This database model organises data into a tree-like structure, with a single root, to which all
the other data is linked. The hierarchy starts from the root data and expands like a tree,
adding child nodes to the parent nodes.
In this model, a child node will only have a single parent node.
This model efficiently describes many real-world relationships, such as the index of a book, recipes
etc.
In the hierarchical model, data is organised into a tree-like structure with a one-to-many
relationship between two different types of data. For example, one department can have many
courses, many professors and many students.
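As a rough illustration of the single-parent rule, the following Python sketch models hierarchical records as tree nodes. The Node class and the record names are assumptions made for this example only; a real hierarchical DBMS stores and navigates records quite differently.

```python
class Node:
    """A record in a hierarchy: exactly one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # a single parent (None only for the root)
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return the route from the root down to this record."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


# One department (root) with several kinds of child records.
root = Node("Department")
courses = Node("Courses", root)
professors = Node("Professors", root)
db_course = Node("Database Systems", courses)

course_path = db_course.path()
root_children = [child.name for child in root.children]
```

Every record other than the root is reached through its parent, which is why navigation in this model always starts from the top of the tree.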
Advantages of the hierarchical model
i. It is easy to design.
ii. It is cheap to maintain.
iii. It is easy to use.
iv. Data sharing and security can be enforced since all data are stored in a common database.
v. A certain degree of data independence can be maintained.
Disadvantages of the hierarchical model
i. It is inflexible when information requirements change.
ii. Relationships are difficult to implement.
iii. Extensive programming activities are required. Navigation inside the tree is complicated.
iv. It is difficult to solve the problem of a single child with multiple parents.
v. It is difficult to navigate through: with the exception of the root record, all records have to be
accessed through the parent.
vi. Alteration of data is difficult due to the rules governing the relationship of records.
3. Network Model
The network model is a bit like an extended hierarchical model, but instead of talking about
parents and children we talk about owners and members.
It is, however, much more flexible, in that members can have many owners.
Pointers link records in the set. It is possible to navigate in all directions.
Some data can be modelled with more than one parent per child. The network model permits
the modelling of many-to-many relationships in data. Since the data is more interconnected,
accessing the data is easier and faster.
The data model is a simple network, and link and intersection record types may exist, as well
as sets between them.
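The owner and member idea can be sketched in a few lines of Python. The two dictionaries below stand in for the pointers that link records, and the course and student names are made up for the example; this is an illustration of the concept, not of how a network DBMS is implemented.

```python
from collections import defaultdict

# Links are kept in both directions so that navigation can start from
# either an owner or a member.
members_of = defaultdict(set)   # owner  -> members
owners_of = defaultdict(set)    # member -> owners


def link(owner, member):
    """Record that `member` belongs to a set owned by `owner`."""
    members_of[owner].add(member)
    owners_of[member].add(owner)


link("Course: Databases", "Student: Amina")
link("Course: Networks", "Student: Amina")    # the same member, a second owner
link("Course: Databases", "Student: Brian")

amina_owners = sorted(owners_of["Student: Amina"])
databases_members = sorted(members_of["Course: Databases"])
```

Here one member has two owners and one owner has two members, which is the many-to-many situation the hierarchical model cannot express directly.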
4. Relational Model
In this model, data is organised into a collection of two-dimensional tables and the
relationship is maintained by storing a common field.
A table is a collection of records (rows) and each record in a table contains the same fields
(columns).
A relational database allows the definition of data structures, storage and retrieval operations
and integrity constraints.
In a relational database system, any data file can be a component of more than one of the
database's tables. The database is organized and accessed according to
the relationships between data items, without the need for any consideration of physical
orientation and relationship. Relationships between data items are expressed by means of
tables.
A database maintains a set of separate, related files (tables), but combines data elements from
the files for queries and reports when required.
Terminologies
i. Entity instance: represents an object of interest. For example, a student record is an
entity instance that represents an individual student.
ii. Entity class: a collection of entity instances with a common structure. For example, a
whole collection of student records forms an entity class.
iii. Attribute: a piece of information of interest about the instances of an entity class. For
example, 'name' is a fact about students that is represented as an attribute of all the
instances of the class Student.
In a relational data model, the identity of an entity instance is defined by the values of
its attributes; no two entity instances can have the same attribute values.
iv. Relationship: an association between entities. Entities are often identified by nouns in a
requirements specification, and relationships by verbs. For example, 'takes' indicates a
relationship between Student and Module. Relationships can be between entity
classes, or between entity instances.
Fig4: Relational model, with tables Publisher (PubID, Publisher, PubAdd), Author (AuthorID,
AuthorName, AuthorBday) and Book (ISBN, AuthorID, PubID, Date, Title)
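To make the idea of linking tables through a common field concrete, the sketch below declares the Publisher, Author and Book tables named in Fig4 using Python's built-in sqlite3 module and joins them. The column types and the sample rows are assumptions added for illustration; only the table and column names come from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publisher (PubID INTEGER PRIMARY KEY, Publisher TEXT, PubAdd TEXT);
CREATE TABLE Author (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT, AuthorBday TEXT);
CREATE TABLE Book (
    ISBN     TEXT PRIMARY KEY,
    AuthorID INTEGER REFERENCES Author(AuthorID),   -- common field with Author
    PubID    INTEGER REFERENCES Publisher(PubID),   -- common field with Publisher
    Date     TEXT,
    Title    TEXT
);
-- Made-up sample rows.
INSERT INTO Publisher VALUES (1, 'Example Press', 'Sample Address');
INSERT INTO Author VALUES (10, 'A. Writer', '1970-01-01');
INSERT INTO Book VALUES ('000-0', 10, 1, '2020', 'Sample Title');
""")

# The join follows the common fields to combine data from the three tables.
rows = conn.execute("""
    SELECT Book.Title, Author.AuthorName, Publisher.Publisher
    FROM Book
    JOIN Author ON Book.AuthorID = Author.AuthorID
    JOIN Publisher ON Book.PubID = Publisher.PubID
""").fetchall()
```

The Book table stores only AuthorID and PubID; the author and publisher details are looked up in their own tables when a query needs them.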
i. Better security: By splitting data into tables, certain tables can be made confidential.
ii. Cater for future requirements: By having data held in separate tables, it is simple to add
records that are not yet needed but may be in the future.
iii. Ease of use: The representation of information as tables consisting of rows and columns is
much easier to understand.
iv. Flexibility: Different tables from which information has to be linked and extracted can
be easily manipulated by operators such as project and join to give information in the
form in which it is desired.
iv. Developer Expertise: As the complexity of a relational database increases, the skill set
required by the RDBMS administrator, various users and report developers also
increases.
v. Hardware Performance: Complex queries require sophisticated processing power. A
database with external data sources or very complex data structures may require more
powerful servers to return results within an acceptable response time.
An object oriented data model (OODM) is a logical data model that captures the meaning of
objects of the kind used in object oriented programming. Information is represented in the
form of objects.
This model defines a database as a collection of objects, or reusable software elements, with
associated features and methods. There are several kinds of object-oriented databases:
i. Multimedia database: incorporates media, such as images, that are difficult to store in a
traditional relational database.
ii. Hypertext database: allows any object to link to any other object. It is useful for
organizing lots of disparate data, but it is not ideal for numerical analysis.
An object-oriented data model is one of the more developed data models and can hold
video, graphical files, and audio. Each object consists of a piece of data and the methods,
in the form of database management system instructions, that operate on it.
Object DBMSs add database functionality to object programming languages (such as Delphi,
Ruby, Python, JavaScript, Perl, Java, C#, Visual Basic .NET, C++). A major benefit of this
approach is the unification of the application and database development into a seamless data
model and language environment.
As a result, applications require less code, use more natural data modelling, and code bases
are easier to maintain. Object developers can write complete database applications with a
modest amount of additional effort.
This makes object DBMSs better suited to support applications such as financial portfolio
risk analysis systems, telecommunications service applications, World Wide Web document
structures, design and manufacturing systems, and hospital patient record systems, which
have complex relationships between data.
iii. Class: objects with similar characteristics are grouped into classes. Therefore, a class is a
collection of similar objects with shared attributes and behaviours (methods). It is a
template from which objects are created.
iv. Behaviours/methods: represent real-world actions that the object can perform.
2. Inheritance
It is the ability of a class (the child class) to acquire some or all of the characteristics of
the class above it (the base class). This means that attributes that are shared by different
kinds of entity can be represented once, using superclasses or generalization. Subclasses,
or specializations of a superclass, can inherit these shared attributes. There are three
major types of inheritance: single inheritance, multiple inheritance and hierarchical
inheritance.
3. Polymorphism
It is the ability of an object to take many forms. It is a mechanism in which different
objects react differently to the same message.
4. Abstraction
It is the principle that helps the programmer focus on the essential inherent aspects of an
entity and ignore its accidental properties.
5. Data/information hiding
It refers to the restriction of external access to information or attributes.
Example:
Person
name: string
bday: integer
age(): integer
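The Person example and the concepts listed above (class, inheritance, polymorphism and information hiding) can be sketched in Python as follows. This is only one possible reading: bday is assumed to hold a birth year, age takes the current year as a parameter so that the result is repeatable, and the Student subclass and describe method are hypothetical additions.

```python
class Person:
    """Class: a template from which Person objects are created."""

    def __init__(self, name: str, bday: int):
        self.name = name
        # Assumption: bday is a birth year. The leading underscore marks the
        # attribute as internal, which is how Python signals information hiding.
        self._bday = bday

    def age(self, current_year: int) -> int:
        """Behaviour/method: compute the age from the hidden attribute."""
        return current_year - self._bday

    def describe(self) -> str:
        return f"{self.name} is a person"


class Student(Person):
    """Inheritance: Student acquires name, _bday and age() from Person."""

    def describe(self) -> str:
        # Polymorphism: the same message gets a different response.
        return f"{self.name} is a student"


people = [Person("Amina", 1990), Student("Brian", 2000)]
descriptions = [p.describe() for p in people]
brian_age = people[1].age(2024)
```

Both objects receive the same describe message but answer differently, and Student reuses age without redefining it.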
Advantages of the object oriented model
∙ Cost reduction: Because of inheritance, attributes and
functionality can be re-used. This reduces the cost of maintenance.
∙ Security: Encapsulation of information provides security, so there is no fear of data
being misused by other objects.
∙ Scalability: If we need a new feature, we can easily add a new class that inherits from a
parent class and adds the new feature.
∙ Code reusability: Code is re-used because of inheritance.
∙ It is more understandable: Since each class binds its attributes and its functionality, it is
the same as representing the real-world object. We can see each object as a real entity.
Disadvantages of the object oriented model
∙ It is not yet developed and complete enough for wide use in database systems. Hence it is not
widely accepted by users.
∙ It is an approach to solving requirements rather than a technology. Hence it is hard to
implement directly in database management systems.
1. Conceptual Design
It describes how different entities (objects, items) are related to each other. It also describes
what attributes (features) each entity has.
The result of this phase is an Entity-Relationship (ER) diagram or UML class diagram.
Characteristics:
i. Includes the important entities and the relationships among them.
ii. No attribute is specified.
iii. No primary key is specified.
iv. Contains relationships between entities but may not include cardinality or nullability.
v. Entities will have definitions.
vi. Designed and developed to be independent of DBMS, data storage locations or
technologies.
2. Logical Design
It is a process of constructing a model of information, which can then be mapped into storage
objects supported by the DBMS. It describes the data in as much detail as possible, without
regard to how they will be physically implemented in the database.
It is a data model that is independent of DBMS, technology, data storage or organizational
constraints. It describes data requirements. It defines a database in a data model of a specific
DBMS.
The result of the logical design phase (or data model mapping phase) is a set of relation
schemas. The Entity Relationship diagram or class diagram is the basis for these relation
schemas. A relation schema (or relation scheme) is a set of attributes. A relation schema is
also known as table schema (or table scheme).
Characteristics:
i. Contains relationships between entities that address cardinality and nullability of the
relationships.
ii. It is designed and developed to be independent of DBMS, data storage locations and
technologies.
iii. Data attributes will typically have datatypes with precisions and lengths assigned
iv. Entities and attributes will have definitions.
v. Normalization is done.
vi. Primary keys and foreign keys are specified.
vii. All attributes for each entity are specified.
3. Physical Design
It specifies the physical configuration of the database on storage media. It refers to exactly
how the database will be implemented.
It shows all the table structures, including column names, column data types, column
constraints, primary keys, foreign keys, relationships between tables, security etc., as well as the
hardware and software specifications of the system.
The goal of this phase of database design is to implement the database. At this phase one
must know which database management system (DBMS) is used, because different
DBMSs support different datatypes and use different names for them. The database is
created, and then tables, indexes, integrity constraints (rules) and the users' access rights
are defined. Finally, the data to test the database is added.
In parallel with these activities, application programs are designed. The implementation of
the programs can start when the database is created and data has been added in.
Characteristics:
i. It contains relationships between tables that address cardinality and nullability of the
relationships.
ii. Columns will have datatypes with precisions and lengths assigned.
iii. Tables and columns will have definitions.
iv. It includes other physical objects such as views, primary key constraints, foreign key
constraints, indexes, security roles, stored procedures etc.
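A physical design eventually becomes concrete statements for one chosen DBMS. The sketch below uses SQLite through Python's built-in sqlite3 module as that DBMS, only because it needs no installation; the department and student tables, the index and the view are hypothetical. SQLite has no security roles or stored procedures, so those physical objects are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,                  -- primary key constraint
    name    VARCHAR(50) NOT NULL UNIQUE           -- datatype with a length
);
CREATE TABLE student (
    reg_no  VARCHAR(20) PRIMARY KEY,
    name    VARCHAR(100) NOT NULL,
    dept_id INTEGER NOT NULL
            REFERENCES department(dept_id)        -- foreign key constraint
);
CREATE INDEX idx_student_dept ON student(dept_id);              -- index
CREATE VIEW student_names AS SELECT reg_no, name FROM student;  -- view
""")

# List the physical objects just created. Indexes that SQLite creates
# internally have names starting with "sqlite" and are filtered out.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite%' ORDER BY type, name"
).fetchall()
```

The same design written for another DBMS would keep the tables and keys but might use different datatype names, which is why the target DBMS must be known at this phase.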
DATABASE SCHEMA
A database schema is the skeleton structure of the database and represents the logical view of the
entire database. It describes how the data is organized and how the relations among the data are
associated. It formulates all the constraints that are to be applied to the data in the relations
that reside in the database. It is designed before the database exists, and it is very hard to
make changes to it once the database is operational. It does not contain any data or information.
A database schema defines its entities and the relationships among them. It is a descriptive
detail of the database, which can be depicted by means of schema diagrams. A database
schema indicates which tables or relations make up the database, as well as the fields
included in each table. Thus, the terms schema diagram and entity-relationship diagram are
often used interchangeably.
DATABASE INSTANCE
A database instance is the state of an operational database, with its data, at a given time. It is a
snapshot of the database. Database instances tend to change with time. The DBMS ensures that
every instance (state) is a valid state by enforcing all the validations, constraints and
conditions that the database designers have imposed or that are expected of the DBMS itself.
The records in the database objects change continuously: the number of records may increase
or decrease, and existing data may be modified. At any particular point in time, however, each
object holds one particular set of records that satisfies all the conditions of the database.
The state of the database, with the data values in its objects, at that particular point in time
is called a database instance. It changes from time to
time.
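The difference between a schema and an instance can be observed directly. In this sketch, which uses Python's built-in sqlite3 module and a hypothetical student table, the schema is read before and after some rows are inserted: the schema stays the same while the instance changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (reg_no TEXT PRIMARY KEY, name TEXT NOT NULL)")


def schema(conn):
    """Column names and types. PRAGMA table_info returns one row per column:
    (cid, name, type, notnull, dflt_value, pk)."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]


def instance(conn):
    """The rows held by the table right now."""
    return conn.execute("SELECT reg_no, name FROM student ORDER BY reg_no").fetchall()


schema_before = schema(conn)
instance_before = instance(conn)

conn.execute("INSERT INTO student VALUES ('S001', 'Amina')")
conn.execute("INSERT INTO student VALUES ('S002', 'Brian')")

schema_after = schema(conn)
instance_after = instance(conn)
```

Each call to instance returns a snapshot; two snapshots taken at different times may differ even though schema returns the same description both times.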
DATA INDEPENDENCE
A database system normally contains a lot of data in addition to users' data. For example, it
stores data about data, known as metadata, to locate and retrieve data easily. It is rather
difficult to modify or update a set of metadata once it is stored in the database. But as a
DBMS expands, it needs to change over time to satisfy the requirements of the users. If all
the data were dependent on each other, making such changes would become a tedious and highly
complex job.
Metadata is therefore divided into a layered architecture, so that a change to the data at one
layer does not affect the data at another layer. The layers are independent of each other but are
mapped onto each other.
Logical data is data about the database, that is, information about how data is managed
inside it. For example, a table (relation) stored in the database and all the constraints that are
applied to that relation.
Logical data independence is a mechanism that separates the logical data from the actual data
stored on the disk. If we make changes to the table format, the data
residing on disk should not change.
All schemas are logical, and the actual data is stored in bit format on the disk. Physical data
independence is the ability to change the physical data without impacting the schema or
logical data.
For example, changing or upgrading the storage system itself, such as using
SSDs instead of hard disks, should not have any impact on the logical data or schemas.
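A small-scale way to observe physical data independence is to change how the data is accessed on disk without touching the logical schema. In this sketch, which uses Python's built-in sqlite3 module with a made-up student table, adding an index is the physical change; the query text and its result stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (reg_no TEXT, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S001", "Amina", 20), ("S002", "Brian", 22), ("S003", "Carol", 20)],
)

query = "SELECT name FROM student WHERE age = 20 ORDER BY name"
before = conn.execute(query).fetchall()

# Physical change: add an index. The table and its columns (the logical
# schema) are untouched, so the query does not need to be rewritten.
conn.execute("CREATE INDEX idx_student_age ON student(age)")
after = conn.execute(query).fetchall()
```

An index is a much smaller change than replacing the storage hardware, but the principle is the same: applications written against the logical schema keep working.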
An entity-relationship (ER) diagram is a graphical representation of entities and their
relationships to each other. There are 3
main components of an ER diagram: entities, attributes and relationships. The process is
modelled as components (entities) that are linked with each other by relationships that
express the dependencies and requirements between them. Every object, such as an entity, the
attributes of an entity, a relationship set, and the attributes of a relationship set, can be
represented by the tools of an ER diagram.
1. Entity
An entity is a real-world thing, either animate or inanimate, that can be easily identified and
distinguished. For example, in a school database, students, teachers, classes and courses offered
can be considered as entities. All entities have some attributes or properties that give them
their identity.
An entity set is a collection of similar types of entities. An entity set may contain entities whose
attributes share similar values. For example, a Students set may contain all the students of a
school; likewise a Teachers set may contain all the teachers of the school from all faculties.
Entity sets need not be disjoint.
Entities are represented by means of rectangles. Rectangles are named with the entity set they
represent.
Types of Entities
i. Strong entities exist independently from other entity types. They always possess one or
more attributes that uniquely distinguish each occurrence of the entity.
ii. Weak entities depend on some other entity type. They don't possess a primary key and
have no meaning in the diagram without depending on another entity. This other entity
is known as the owner.
iii. Associative entities are entities that associate the instances of one or more entity types.
They also contain attributes that are unique to the relationship between those entity
instances.
2. Attributes
Entities are represented by means of their properties, called attributes. Attributes are
represented by means of ellipses. Every ellipse represents one attribute and is directly
connected to its entity (rectangle). All attributes have values. For example, a student entity
may have name, class and age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
Types of attributes:
i. Simple attribute: an atomic value, which cannot be divided further.
For example, a student's phone number is an atomic value of 10 digits.
ii. Composite attribute: made of more than one simple attribute.
For example, a student's complete name may have first_name and last_name.
iii. Derived attribute: an attribute that does not exist physically in the database; its
value is derived from other attributes present in the database. For example,
average_salary in a department should not be saved in the database; instead it can be derived.
As another example, age can be derived from date_of_birth.
iv. Single-valued attribute: contains only a single value.
For example: Social_Security_Number.
v. Multi-valued attribute: may contain more than one value.
For example, a person can have more than one phone number, email address etc.
If the attributes are composite, they are further divided in a tree-like structure. Every node is
then connected to its attribute. That is, composite attributes are represented by ellipses that
are connected to an ellipse. Multivalued attributes are depicted by a double ellipse. Derived
attributes are depicted by a dashed ellipse.
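The attribute types above can be sketched with Python dataclasses. The Student and Name classes, their field names and the sample values are assumptions chosen for illustration; the point is only to show where each kind of attribute would live.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Name:
    """Composite attribute: made of two simple attributes."""
    first_name: str
    last_name: str


@dataclass
class Student:
    reg_no: str                  # simple, single-valued attribute
    name: Name                   # composite attribute
    date_of_birth: date          # stored attribute
    phone_numbers: list = field(default_factory=list)   # multi-valued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        birthday_passed = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if birthday_passed else 1)


s = Student(
    "S001",
    Name("Amina", "Otieno"),
    date(2000, 6, 15),
    ["0700000001", "0700000002"],
)
age_day_before_birthday = s.age(date(2024, 6, 14))
age_on_birthday = s.age(date(2024, 6, 15))
```

Because age is computed on request, it can never become inconsistent with date_of_birth, which is the reason derived attributes are not stored.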
Keys are the attributes of an entity that uniquely identify a record of the entity.
Certain fields may be designated as keys.
For example, STUDENT_ID identifies individual students.
i. Super key: a set of one or more attributes that collectively and uniquely identifies an
entity in an entity set (a record in the database).
ii. Candidate key: any field, or combination of fields, that uniquely identifies a record.
A candidate key is a minimal super key: no attribute can be removed from it without losing
uniqueness. The field(s) of a candidate key must contain unique values and cannot contain a
null value.
iii. Primary key: the candidate key chosen to uniquely identify the records in a
particular table. For example, a person can be identified by ID number or by passport
number; either one can be chosen as the primary key, and the other remains a
candidate key.
iv. Foreign key: a field in one table that is the primary key of another table. It can also be
described as an attribute of an entity that is the primary key of a related
entity. Foreign keys help to establish the mapping between two or more entities.
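The key types can be tried out with Python's built-in sqlite3 module. The person table follows the ID number and passport number example in the text; the vehicle table, the column names and the sample rows are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE person (
    id_no       TEXT PRIMARY KEY,        -- candidate key chosen as primary key
    passport_no TEXT NOT NULL UNIQUE,    -- the other candidate key
    name        TEXT NOT NULL
);
CREATE TABLE vehicle (
    plate    TEXT PRIMARY KEY,
    owner_id TEXT NOT NULL REFERENCES person(id_no)   -- foreign key
);
INSERT INTO person VALUES ('ID1', 'P1', 'Amina');
INSERT INTO vehicle VALUES ('KAA 001A', 'ID1');
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


dup_primary = rejected("INSERT INTO person VALUES ('ID1', 'P2', 'Brian')")
dup_candidate = rejected("INSERT INTO person VALUES ('ID2', 'P1', 'Brian')")
bad_foreign = rejected("INSERT INTO vehicle VALUES ('KBB 002B', 'ID9')")
```

All three inserts are refused: the first repeats a primary key value, the second repeats a candidate key value, and the third refers to a person who does not exist.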
3. Relationship
The association among entities is called a relationship. For example, an employee entity has the
relationship works_at with a department. Another example is a student who enrolls in some
course. Here, works_at and enrolls are called relationships.
Relationships are represented by a diamond-shaped box. The name of the relationship is written in
the diamond box. All entities (rectangles) participating in a relationship are connected to it by
a line. Weak relationships, or identifying relationships, are connections that exist between a
weak entity type and its owner.
Relationship Set:
A set of relationships of a similar type is called a relationship set. Like entities, a relationship too can
have attributes. These attributes are called descriptive attributes.
Degrees of Relationships
Two or more entities can participate in a relationship. The number of entities that
are part of a particular relationship is called the degree of the relationship.
Cardinality of Relationships
Cardinality defines the number of entities in one entity set that can be associated with the
number of entities of the other set via a relationship set. It is the number of instances of one
entity that can be associated with each instance of the other entity.
i. One-to-one (1:1): one entity from entity set A can be associated with at most one entity
of entity set B and vice versa.
For example; the HOD of a department. There is only one HOD in one department,
so there is a 1:1 relationship between the entities HOD and Department.
ii. One-to-many (1:M): one entity from entity set A can be associated with more than one
entity of entity set B, but one entity from entity set B can be associated with at most
one entity from entity set A.
For example; one manager manages multiple employees in his department. Here
Manager and Employee are entities, and the relationship is one-to-many. Similarly,
one teacher teaching multiple classes is also a 1:M relationship.
iii. Many-to-many (M:N): one entity from entity set A can be associated with more than
one entity from entity set B and vice versa.
For example; multiple students enrolling in multiple classes/courses makes this
relationship M:N.
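The cardinalities above could be represented in tables roughly as follows. This is a sketch only, assuming SQLite through Python's sqlite3 module; the table names, column names and sample rows are hypothetical. A 1:M relationship puts a foreign key on the "many" side, while an M:N relationship needs a separate table that stores the pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1:M  one teacher teaches many classes: the "many" side holds the foreign key
    CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class (
        class_id   INTEGER PRIMARY KEY,
        title      TEXT,
        teacher_id INTEGER REFERENCES teacher(teacher_id)
    );
    -- M:N  students enrol in many classes: a separate table holds the pairs
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student(student_id),
        class_id   INTEGER REFERENCES class(class_id),
        PRIMARY KEY (student_id, class_id)
    );
    INSERT INTO teacher VALUES (1, 'Teacher A');
    INSERT INTO class VALUES (10, 'Databases', 1), (11, 'Networks', 1);
    INSERT INTO student VALUES (100, 'Amit'), (101, 'Divya');
    INSERT INTO enrolment VALUES (100, 10), (100, 11), (101, 10);
""")

def count(sql):
    # Helper: run a COUNT(*) query and return the single number
    return conn.execute(sql).fetchone()[0]

classes_per_teacher = count("SELECT COUNT(*) FROM class WHERE teacher_id = 1")
classes_for_amit = count("SELECT COUNT(*) FROM enrolment WHERE student_id = 100")
students_in_databases = count("SELECT COUNT(*) FROM enrolment WHERE class_id = 10")
print(classes_per_teacher, classes_for_amit, students_in_databases)
```

A 1:1 relationship can be handled like 1:M with a UNIQUE constraint added to the foreign key column, so that each parent row is referenced at most once.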
Associations
[Figure: bottom-up approach (generalization): Saving and Current are each linked to Account by an "Is A" relationship]
Specialization: a top-down approach in which one higher-level entity is broken down into two
or more lower-level entities. In specialization, some higher-level entities may not have
lower-level entity sets at all.
[Figure: top-down approach (specialization): Student at the top, linked to its lower-level entities by an "Is A" relationship]
Aggregation: a relationship where the child can exist independently of the parent.
Example: if you have a class (parent) and a student (child), when you delete the class, the
students still exist.
Composition: a relationship where the child cannot exist independently of the parent.
Example: if you have a house (parent) and a room (child), rooms cannot exist separately from a
house.
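One way to see the difference in a relational database is through the delete rule on a foreign key. The sketch below is an illustration under assumptions, not the only mapping: it uses SQLite through Python's sqlite3 module, models aggregation with ON DELETE SET NULL and composition with ON DELETE CASCADE, and all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for the delete rules to fire in SQLite
conn.executescript("""
    -- Aggregation: students survive when their class is deleted
    CREATE TABLE class (class_id INTEGER PRIMARY KEY);
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        class_id   INTEGER REFERENCES class(class_id) ON DELETE SET NULL
    );
    -- Composition: rooms are deleted together with their house
    CREATE TABLE house (house_id INTEGER PRIMARY KEY);
    CREATE TABLE room (
        room_id  INTEGER PRIMARY KEY,
        house_id INTEGER NOT NULL REFERENCES house(house_id) ON DELETE CASCADE
    );
    INSERT INTO class VALUES (1);
    INSERT INTO student VALUES (100, 1), (101, 1);
    INSERT INTO house VALUES (1);
    INSERT INTO room VALUES (10, 1), (11, 1);

    DELETE FROM class WHERE class_id = 1;
    DELETE FROM house WHERE house_id = 1;
""")

students_left = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
rooms_left = conn.execute("SELECT COUNT(*) FROM room").fetchone()[0]
print(students_left, rooms_left)
```

After both parents are deleted, the two students remain (with no class assigned) while both rooms are gone.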
Example:
A school has a library. The library contains books. Each book has a publisher, although
several books may be published by the same publisher. The library has staff who manage the
books. Each book can be borrowed by a student. A student can borrow more than one book.
A book is identified by a book ID, staff by staff ID, publisher by publisher ID and a student
by student registration number.
Draw an E-R Diagram
Solution:
Step 1 : Identify the Entities
The entities are book, publisher, staff, student
[E-R diagram: Publisher (Pub_ID) Publishes Book (Book_ID); Student (Stu_Reg_No) Borrows Book; Staff (Staff_ID) is connected to Book]
Because an ER diagram captures the requirements and the mapping between entities, it can
easily be converted into tables and columns. That is, from an ER diagram one can easily create
the relational data model, which is nothing but the logical view of the database.
The basic rules for converting an ER diagram into tables are:
i. Convert all the entities in the diagram to tables.
ii. Every single-valued attribute of an entity becomes a column of its table. All the
attributes that hold only one value at any instant of time are treated as columns of
that table.
iii. The key attribute in the ER diagram becomes the primary key of the table.
iv. Declare the foreign key column, if applicable.
v. Any multi-valued attribute is converted into a new table.
For example; a hobby attribute in the STUDENT table would be multi-valued, since a
student can have any number of hobbies. Multiple values cannot be represented in a
single column of the STUDENT table. They need to be stored separately, so that any
number of hobbies can be stored and adding or removing hobbies does not create
redundancy or anomalies in the system. Hence we create a separate table,
STUD_HOBBY.
vi. Represent the relationships/cardinalities (1:1, 1:M, M:N).
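The conversion rules can be sketched on the library example. This is one possible reading, assuming SQLite through Python's sqlite3 module; only the key attributes (book ID, staff ID, publisher ID, student registration number) and the STUD_HOBBY table come from the notes, and every other column, the foreign key placement and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Rules i to iii: one table per entity, key attribute becomes the primary key
    CREATE TABLE publisher (pub_id INTEGER PRIMARY KEY, pub_name TEXT);
    CREATE TABLE staff     (staff_id INTEGER PRIMARY KEY, staff_name TEXT);
    CREATE TABLE student   (stu_reg_no TEXT PRIMARY KEY, stu_name TEXT);

    -- Rules iv and vi: each 1:M relationship becomes a foreign key on the "many" side
    CREATE TABLE book (
        book_id    INTEGER PRIMARY KEY,
        title      TEXT,
        pub_id     INTEGER NOT NULL REFERENCES publisher(pub_id),
        staff_id   INTEGER REFERENCES staff(staff_id),
        stu_reg_no TEXT REFERENCES student(stu_reg_no)  -- NULL while not on loan
    );

    -- Rule v: the multi-valued attribute hobby gets its own table
    CREATE TABLE stud_hobby (
        stu_reg_no TEXT REFERENCES student(stu_reg_no),
        hobby      TEXT,
        PRIMARY KEY (stu_reg_no, hobby)
    );

    INSERT INTO publisher VALUES (1, 'Publisher A');
    INSERT INTO staff VALUES (7, 'Librarian A');
    INSERT INTO student VALUES ('DIT/001', 'Amit');
    INSERT INTO book VALUES (500, 'Database Systems', 1, 7, 'DIT/001');
    INSERT INTO book VALUES (501, 'Networking Basics', 1, 7, NULL);
    INSERT INTO stud_hobby VALUES ('DIT/001', 'chess'), ('DIT/001', 'football');
""")

hobbies = [row[0] for row in conn.execute(
    "SELECT hobby FROM stud_hobby WHERE stu_reg_no = 'DIT/001' ORDER BY hobby")]
borrowed = [row[0] for row in conn.execute(
    "SELECT title FROM book WHERE stu_reg_no = 'DIT/001' ORDER BY title")]
print(hobbies, borrowed)
```

Because hobbies live in their own table, a student can have any number of them without adding columns to the student table.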
NORMALIZATION
Normalization is the last part of the logical design. It is a set of rules/guidelines used
while designing a database: a technique by which one modifies the relation schema to reduce
redundancy. Each normalization phase typically adds more relations (tables) to the database.
There are two goals of the normalization process:
i. To eliminate redundancy and potential update anomalies.
Redundancy means that the same data is saved more than once in a database. An update
anomaly is a consequence of redundancy: if a piece of data is saved in more than one
place, the same data must be updated in more than one place.
ii. To ensure that data dependencies make sense (only storing related data in a table).
Both of these are worthy goals, as they reduce the amount of space a database consumes and
ensure that data is logically stored.
Advantages of Normalization
1. Reduces data duplication: databases can hold millions, perhaps billions, of records.
Normalizing a database helps reduce its size and prevents data duplication by ensuring
that each piece of data is stored only once.
2. Increased storage efficiency.
3. Security: it gives a better handle on database security.
4. Smaller database: by eliminating duplicate data, you reduce the overall size of the
database.
5. Better performance: narrow, fine-tuned tables have fewer columns, so more records fit
on each data page. Fewer indexes per table also mean faster maintenance tasks such as
index rebuilds.
Disadvantages of Normalization
1. Can slow database performance: spreading data across many tables increases the need
to join tables.
2. Requires detailed analysis and design: normalizing a database is a complex and
difficult task. Large databases require careful analysis and design before they are
normalized.
3. Normalizing relations of higher degree is a very time-consuming and difficult
process.
4. Cost: it is expensive to set up, i.e. it requires more CPU, memory and I/O to process,
thus increasing expenses.
5. Tables contain codes instead of real data: repeated data is stored as codes rather than
meaningful data. Therefore, there is always a need to go to the lookup table for the value.
6. The data model is difficult to query against: it is optimized for applications, not
for ad hoc querying.
Without normalization, it becomes difficult to handle and update the database without facing
data loss. Insertion, update and deletion anomalies are also frequent.
Anomalies Possible:
i. Update anomaly: if a record appears twice or more, we have to update every copy,
otherwise the data becomes inconsistent.
ii. Insertion anomaly: suppose we are admitting a new student who does not have marks
for the unit yet; we will have to insert NULL, leading to an insertion anomaly.
iii. Deletion anomaly: if a student drops a unit, we have to delete the entire row. If it
is the student's only row, the entire student record is deleted along with it.
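The three anomalies can be reproduced on a single unnormalized table. The sketch below assumes SQLite through Python's sqlite3 module; the student_unit table, its columns and its rows are hypothetical and exist only to trigger each anomaly in turn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One wide table: student details are repeated on every unit row
    CREATE TABLE student_unit (reg_no TEXT, name TEXT, phone TEXT, unit TEXT, marks INTEGER);
    INSERT INTO student_unit VALUES
        ('S1', 'Amit',  '0700000001', 'Databases', 70),
        ('S1', 'Amit',  '0700000001', 'Networks',  65),
        ('S2', 'Divya', '0700000002', 'Databases', 80);
""")

# Update anomaly: changing the phone on only one of Amit's rows leaves two versions
conn.execute(
    "UPDATE student_unit SET phone = '0711111111' WHERE reg_no = 'S1' AND unit = 'Databases'")
phones_for_s1 = conn.execute(
    "SELECT COUNT(DISTINCT phone) FROM student_unit WHERE reg_no = 'S1'").fetchone()[0]

# Insertion anomaly: a new student with no marks yet can only be stored with NULLs
conn.execute("INSERT INTO student_unit VALUES ('S3', 'Rama', '0700000003', NULL, NULL)")
rama_marks = conn.execute(
    "SELECT marks FROM student_unit WHERE reg_no = 'S3'").fetchone()[0]

# Deletion anomaly: Divya drops her only unit and her details vanish with the row
conn.execute("DELETE FROM student_unit WHERE reg_no = 'S2' AND unit = 'Databases'")
divya_rows = conn.execute(
    "SELECT COUNT(*) FROM student_unit WHERE reg_no = 'S2'").fetchone()[0]

print(phones_for_s1, rama_marks, divya_rows)
```

Splitting student details and unit marks into separate tables, as normalization prescribes, removes all three problems.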
RULES OF NORMALIZATION
The rules of normalization are expressed as normal forms:
i. First Normal Form (1NF)
ii. Second Normal Form (2NF)
iii. Third Normal Form (3NF)
iv. Boyce-Codd Normal Form (BCNF)
v. Fourth Normal Form (4NF)
vi. Fifth Normal Form (5NF)
Although a Sixth Normal Form has also been proposed, in most practical applications
normalization up to Third Normal Form is sufficient.
Normal Form   Description
2NF           A relation is in 2NF if it is in 1NF and all non-key attributes are fully
              functionally dependent on the primary key.
The first normal form (1NF) dictates that we must not duplicate data within the same row of a
table. Do not use multiple fields in a single table to store similar data. This concept is
referred to as the atomicity of a table. Tables that comply with this rule are said to be atomic.
1NF states that an attribute of a table cannot hold multiple values; it must hold only a single
value. 1NF disallows multi-valued attributes, composite attributes, and their
combinations.
EMPLOYEE table:
EMP_ID   EMP_NAME   EMP_PHONE                 EMP_STATE
14       John       7272826385, 9064738238    UP
The decomposition of the EMPLOYEE table into 1NF is shown below:
EMP_ID   EMP_NAME   EMP_PHONE     EMP_STATE
14       John       7272826385    UP
14       John       9064738238    UP
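The same 1NF decomposition can be expressed as a short transformation. This is a sketch in plain Python using the EMPLOYEE row above; the variable names are chosen for illustration.

```python
# Unnormalized row: EMP_PHONE holds two values in one field
unnormalized = [(14, "John", "7272826385, 9064738238", "UP")]

# 1NF: emit one row per phone number so that every field is atomic
first_nf = [
    (emp_id, name, phone.strip(), state)
    for emp_id, name, phones, state in unnormalized
    for phone in phones.split(",")
]

for row in first_nf:
    print(row)
```

Each resulting row carries exactly one phone number, matching the decomposed table.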
Second normal form (2NF) can be summarized in a simple statement: it attempts to reduce the
amount of redundant data in a table by extracting it, placing it in new table(s) and creating
relationships between those tables.
To reach 2NF, remove partial dependencies. Each column in the table that is not part of the
primary key must depend upon the entire key for its existence. If any column depends on only
one part of the key, the table fails second normal form.
NOTE: A partial dependency exists when, for a composite primary key, an attribute in the
table depends on only a part of the primary key and not on the complete primary key. To
remove a partial dependency, we divide the table: the attribute causing the partial
dependency is removed and moved to another table where it fits well.
Example:
Let us consider following table which is in first normal form:
Employee No   Department No   Employee Name   Department
1             101             Amit            OBIEE
2             102             Divya           COGNOS
3             101             Rama            OBIEE
In the above example the composite key is {Employee No, Department No}. Employee Name
depends only on Employee No, and Department depends only on Department No. We can split the
table into 2 different tables:
Employee No   Department No   Employee Name
1             101             Amit
2             102             Divya
3             101             Rama
Department No   Department
101             OBIEE
102             COGNOS
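The split can be checked by joining the two tables back together. The sketch below assumes SQLite through Python's sqlite3 module; the rows are taken from the two split tables in the example, while the table and column names (emp_dept, employee, department) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Original table with the composite key {employee_no, department_no}
    CREATE TABLE emp_dept (
        employee_no INTEGER, department_no INTEGER,
        employee_name TEXT, department TEXT,
        PRIMARY KEY (employee_no, department_no)
    );
    INSERT INTO emp_dept VALUES
        (1, 101, 'Amit', 'OBIEE'), (2, 102, 'Divya', 'COGNOS'), (3, 101, 'Rama', 'OBIEE');

    -- Split: the department name moves to its own table and is stored once per department
    CREATE TABLE employee AS
        SELECT DISTINCT employee_no, department_no, employee_name FROM emp_dept;
    CREATE TABLE department AS
        SELECT DISTINCT department_no, department FROM emp_dept;
""")

original = conn.execute("SELECT * FROM emp_dept ORDER BY employee_no").fetchall()
rejoined = conn.execute("""
    SELECT e.employee_no, e.department_no, e.employee_name, d.department
    FROM employee e JOIN department d ON d.department_no = e.department_no
    ORDER BY e.employee_no
""").fetchall()
department_rows = conn.execute("SELECT COUNT(*) FROM department").fetchone()[0]
print(rejoined == original, department_rows)
```

The join reproduces the original rows, so no information is lost, while OBIEE is now stored once instead of twice.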
Third normal form (3NF) requires that every non-prime attribute (an attribute that does not
occur in any candidate key) of a table depends directly on the primary key. Any transitive
functional dependency should be removed from the table. The table must already be in second
normal form.
A functional dependency occurs when one attribute in a relation uniquely identifies/determines
another attribute.
A transitive functional dependency occurs when an indirect relationship causes a functional
dependency. E.g. if A determines B and B determines C, then A determines C through B.
Example:
EMPLOYEE_DETAIL table:
EMP_ID   EMP_NAME   EMP_ZIP   EMP_STATE   EMP_CITY
Here EMP_STATE and EMP_CITY depend on EMP_ZIP, which in turn depends on EMP_ID, so the table
is split into two:
EMPLOYEE table:
EMP_ID   EMP_NAME   EMP_ZIP
EMPLOYEE_ZIP table:
EMP_ZIP   EMP_STATE   EMP_CITY
201010    UP          Noida
02228     US          Boston
60007     US          Chicago
06389     UK          Norwich
462007    MP          Bhopal
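Functional and transitive dependencies can be tested directly on sample rows. The sketch below is plain Python; the determines helper is a hypothetical name, the ZIP, state and city values come from the EMPLOYEE_ZIP table, and the employee IDs are made up for illustration.

```python
def determines(rows, lhs, rhs):
    """Return True if every value of column lhs maps to exactly one value of column rhs."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

# Hypothetical EMPLOYEE_DETAIL rows; two employees share ZIP 201010
rows = [
    {"emp_id": 1, "emp_zip": "201010", "emp_state": "UP", "emp_city": "Noida"},
    {"emp_id": 2, "emp_zip": "02228", "emp_state": "US", "emp_city": "Boston"},
    {"emp_id": 3, "emp_zip": "60007", "emp_state": "US", "emp_city": "Chicago"},
    {"emp_id": 4, "emp_zip": "201010", "emp_state": "UP", "emp_city": "Noida"},
]

id_to_zip = determines(rows, "emp_id", "emp_zip")          # EMP_ID determines EMP_ZIP
zip_to_city = determines(rows, "emp_zip", "emp_city")      # EMP_ZIP determines EMP_CITY
state_to_city = determines(rows, "emp_state", "emp_city")  # US maps to two cities

# 3NF split: the ZIP details are stored once per ZIP code
employee = sorted({(r["emp_id"], r["emp_zip"]) for r in rows})
employee_zip = sorted({(r["emp_zip"], r["emp_state"], r["emp_city"]) for r in rows})
print(id_to_zip, zip_to_city, state_to_city, len(employee_zip))
```

Since EMP_ID determines EMP_ZIP and EMP_ZIP determines EMP_CITY, the city depends on the key only transitively, which is why it moves to the EMPLOYEE_ZIP table.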
SQL STATEMENTS
SQL, Structured Query Language, is a programming language designed to manage data stored
in relational databases. SQL operates through simple, declarative statements. This keeps data
accurate and secure, and helps maintain the integrity of databases, regardless of size.
DDL (Data Definition Language). These SQL statements define the structure of a database,
including columns, tables, indexes, and database specifics such as file locations.
DML (Data Manipulation Language). These SQL statements are used to retrieve and
manipulate data. This category encompasses commands such as DELETE, INSERT,
SELECT, and UPDATE.
DCL (Data Control Language). These SQL statements control the security and permissions
of the objects or parts of the database(s).
DDL commands include the following:
i. CREATE: This command builds a new database object. The CREATE statement can be used to
create databases and tables.
The syntax for a table is: CREATE TABLE [table name] ([column definitions]) [table
parameters].
For example,
a. The command:
ii. USE: The USE command allows you to specify the database you wish to work with
within your DBMS.
For example, if we are currently working in the sales database and want to issue some
commands that will affect the employees database, we first switch to it:
USE employees
iii. ALTER: An alter command modifies an existing database table. This command can add
columns, drop existing columns and even change the data type of columns in a database
table. The alter command syntax is ALTER [object type] [object name] [parameters].
For example:
iv. DROP: A drop command deletes a table, index or view. The drop statement syntax is DROP
[object type] [object name].
For example:
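The CREATE, ALTER and DROP commands can be tried end to end as follows. This is a sketch assuming SQLite through Python's sqlite3 module, so the USE command (which SQLite does not have) is left out. The personal_info columns follow the attribute order given later in these notes, while the column types, the added department column and the column_names helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: build the table
conn.execute("""
    CREATE TABLE personal_info (
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL,
        employee_id INTEGER PRIMARY KEY,
        salary      INTEGER
    )
""")

def column_names(table):
    # PRAGMA table_info returns one row per column; the name is the second field
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

columns_after_create = column_names("personal_info")

# ALTER: add a column to the existing table
conn.execute("ALTER TABLE personal_info ADD COLUMN department TEXT")
columns_after_alter = column_names("personal_info")

# DROP: remove the table together with its data
conn.execute("DROP TABLE personal_info")
tables_after_drop = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(columns_after_create, columns_after_alter, tables_after_drop)
```

The exact ALTER options differ between database systems, so the statement may need adjusting for MySQL or Oracle.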
DML commands include the following:
a. SELECT: This command is used to retrieve rows from a table. The syntax is SELECT
[column name(s)] FROM [table name] WHERE [conditions].
For example:
i. SELECT *
FROM personal_info
The command shown above retrieves all of the information contained within
the personal_info table. The asterisk is used as a wildcard in SQL. It literally
means "Select everything from the personal_info table."
ii. SELECT last_name
FROM personal_info
WHERE salary > 50000
The above command limits the attributes retrieved from the database. It retrieves a
list of the last names of all employees in the company from the table
personal_info whose salary is above Ksh 50,000. The WHERE clause is used
to limit the records that are retrieved to those that meet the specified criteria.
b. UPDATE: This command modifies the data of one or more records. The update command
syntax is UPDATE [table name] SET [column name = value] WHERE [condition].
For example:
UPDATE personal_info
SET salary = salary + 5000
WHERE employee_id = 12345
The above command updates the information of employee no. 12345. It
increases that employee's salary by Ksh 5,000.
c. INSERT: This command adds one or more records to a database table. The insert
command syntax is INSERT INTO [table name] ([column(s)]) VALUES ([value(s)]).
For example:
The above command adds a new employee to the personal_info table. The four values
correspond to the table attributes in the order they were defined: first_name,
last_name, employee_id, and salary.
d. DELETE: This command removes one or more records from a table according to
specified conditions. The delete command syntax is DELETE FROM [table name] WHERE
[condition].
For example:
DELETE FROM personal_info
WHERE employee_id = 12345
The above command deletes from our personal_info table, the employee with ID No.
12345.
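The four DML commands can be run together on a small personal_info table. This is a sketch assuming SQLite through Python's sqlite3 module; the column order follows the notes (first_name, last_name, employee_id, salary), but the column types and the two sample employees are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE personal_info (
        first_name TEXT, last_name TEXT, employee_id INTEGER PRIMARY KEY, salary INTEGER
    )
""")

# INSERT: the values follow the column order first_name, last_name, employee_id, salary
conn.executemany(
    "INSERT INTO personal_info VALUES (?, ?, ?, ?)",
    [("Amit", "Shah", 12345, 48000), ("Divya", "Rao", 12346, 60000)],
)

# SELECT with WHERE: last names of employees earning above 50,000
high_earners = conn.execute(
    "SELECT last_name FROM personal_info WHERE salary > 50000").fetchall()

# UPDATE: raise the salary of employee 12345 by 5,000
conn.execute("UPDATE personal_info SET salary = salary + 5000 WHERE employee_id = 12345")
new_salary = conn.execute(
    "SELECT salary FROM personal_info WHERE employee_id = 12345").fetchone()[0]

# DELETE: remove employee 12345
conn.execute("DELETE FROM personal_info WHERE employee_id = 12345")
remaining = conn.execute("SELECT employee_id FROM personal_info").fetchall()

print(high_earners, new_salary, remaining)
```

The ? placeholders pass the values separately from the SQL text, which is the usual way to avoid quoting mistakes when inserting data from a program.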
DATABASE MAINTENANCE
A number of different systems can be used to build and maintain databases, with one popular
example being MySQL. A database that is not maintained can become sluggish, and people
may start to experience problems when trying to access records.
Database maintenance tasks include:
i. Backing up the data so that if anything happens there will be another copy available.
ii. Checking for signs of corruption in the database.
iii. Looking for problem areas.
iv. Rebuilding indexes.
v. Removing duplicate records.
vi. Checking for any abnormalities in the database.
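Some of these maintenance tasks can be sketched with SQLite through Python's sqlite3 module. This is an illustration under that assumption only: MySQL and other systems provide their own backup and repair tools, and the member table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (name TEXT, phone TEXT);
    CREATE INDEX idx_member_name ON member(name);
    INSERT INTO member VALUES
        ('Amit', '0700000001'), ('Amit', '0700000001'), ('Divya', '0700000002');
""")

# i. Back up: copy the whole database into a second connection
backup_conn = sqlite3.connect(":memory:")
conn.backup(backup_conn)
backup_rows = backup_conn.execute("SELECT COUNT(*) FROM member").fetchone()[0]

# ii. Check for signs of corruption
integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]

# iv. Rebuild indexes
conn.execute("REINDEX")

# v. Remove duplicate records, keeping the first copy of each
conn.execute("""
    DELETE FROM member
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM member GROUP BY name, phone)
""")
conn.commit()
rows_after_cleanup = conn.execute("SELECT COUNT(*) FROM member").fetchone()[0]

print(backup_rows, integrity, rows_after_cleanup)
```

Taking the backup before the clean-up means the original three rows can still be recovered if the duplicate removal turns out to be wrong.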
DATABASE SECURITY
It concerns the use of a broad range of information security controls to protect databases
against compromises of their confidentiality, integrity and availability. It is designed to
protect the data, the database applications or stored functions, the database systems, the
database servers and the associated network links.
Database security requirements arise from the need to protect data from:
∙ Accidental loss and corruption.