modelling first. Then we will discuss data modelling with Enhanced ER diagrams in detail. Finally, we will revise how to draw ER diagrams to depict a problem solution.
Session 1 - Database Management Systems & Data Modelling
Content:
1.1. Evolution of database applications
Introduction:
Database management has evolved from a specialized computer application to a central
component of a modern computing environment. As a result, knowledge about database systems
has become an essential part of education in software engineering. In this session, we study
the evolution of database applications, database architecture and data modelling.
1980s:
● Research relational prototypes evolve into commercial systems
● Parallel and distributed database systems
● Object-oriented database systems
1990s:
● Large decision support and data-mining applications
● Large multi-terabyte data warehouses
● Emergence of Web commerce
2000s:
● XML and XQuery standards
● Automated database administration
Activity 1.1
1. Define the following terms related to database systems.
data, database, database management system, data independence, user view.
2. What are the main characteristics of a database approach?
3. List the important milestones in the evolution of database systems.
A database schema is the structure or format of a database, described in a formal language supported by the database management system. Schemas are generally stored in a data dictionary. Although a schema is defined in a text-based database language, the term is often used to refer to a graphical depiction of the database structure.
A database model is a specification describing how a database is structured and used. Following
are the most common database models.
● Hierarchical model
● Network model
● Relational model
● Object-relational model
2. Conceptual level
Describes the structure of the whole database for users. The conceptual schema hides the details of physical storage structures and describes entities, data types, etc.
3. External level
Describes the part of the database that a particular user group is interested in and hides the
rest of the database from that user group.
Classifications of DBMS
Four criteria to classify DBMS are:
● data model
● number of users
○ single user system - support for only one user at a time
○ multi user system - support for multiple users concurrently.
● number of sites
○ A centralized DBMS stores data on a single computer.
○ A distributed DBMS has the database and DBMS software distributed over many computers in a network.
● type of access path
1.3. Data modeling using the Entity Relationship (ER) model
The ER model is important primarily for its role in database design. It provides useful concepts that allow us to move from an informal description of what users want from their database to a more detailed and precise description that can be implemented in a DBMS.
Overview of Design and Methodology
There are two major methodologies used to create a data model: Entity-Relationship (ER)
approach and the Object Model. This session covers the Entity-Relationship approach. The ER
data model allows us to describe the data involved in a real-world enterprise in terms of objects
and their relationships and is widely used to develop an initial database design.
Attributes describe the entity with which they are associated. A particular instance of an attribute is a value.
Relationships are classified in terms of degree, connectivity, cardinality, and existence. Not all
modelling methodologies use all these classifications.
The degree of a relationship is the number of entities associated with the relationship. The n-ary
relationship is the general form for degree n.
The connectivity of a relationship describes the mapping of associated entity instances in the
relationship. The values of connectivity are "one" or "many". The cardinality of a relationship is
the actual number of related occurrences for each of the two entities. The basic types of
connectivity for relationships are: one-to-one, one-to-many, and many-to-many.
The direction of a relationship indicates the originating entity of a binary relationship. The entity
from which a relationship originates is the parent entity; the entity where the relationship
terminates is the child entity.
An identifying relationship is one in which one of the child entities is also a dependent entity. A non-identifying relationship is one in which both entities are independent.
Existence denotes whether the existence of an entity instance is dependent upon the existence of another, related entity instance. The existence of an entity in a relationship is defined as either mandatory or optional.
The relational model was formally introduced by Dr. E. F. Codd in 1970 and has evolved since
then, through a series of writings. The model provides a simple, yet rigorously defined, concept
of how users perceive data. The relational model represents data in the form of two-dimensional tables. Each table represents some real-world person, place, thing, or event about which
information is collected.
Here is an example of how these two concepts might be combined in an ER data model: Prof. Silva (entity) teaches (relationship) the Database Systems course (entity).
An entity is an object in the real world with an independent existence that can be differentiated from other objects. An entity might be:
● An object with physical existence (e.g., a lecturer, a student, a car)
● An object with conceptual existence (e.g., a course, a job, a position)
Entities can be classified based on their strength. An entity is considered weak if it is existence dependent, that is, it cannot exist without a related owner entity. An entity set is a collection of entities of an entity type at a particular point of time. In an entity relationship diagram (ERD), an entity type is represented by a name in a box. The following section describes the different kinds of entities.
Kinds of Entities
You should also be familiar with different kinds of entities including independent entities,
dependent entities and characteristic entities. These are described below.
Independent entities
Independent entities, also referred to as kernels, are the backbone of the database. They are what
other tables are based on. Kernels have the following characteristics:
Dependent entities, also referred to as derived entities, depend on other tables for their
meaning. These entities have the following characteristics:
Create a new simple primary key
Characteristic entities
Characteristic entities provide more information about another table. These entities have the
following characteristics:
● They represent multivalued attributes.
● They describe other entities.
● They typically have a one-to-many relationship.
● The foreign key is used to further identify the characterized table.
Attributes
Each entity is described by a set of attributes (e.g., Employee = (Name, Address, Birthdate (Age), Salary)).
Each attribute has a name, and is associated with an entity and a domain of legal values.
However, the information about attribute domain is not presented on the ERD.
In the entity relationship diagram, each attribute is represented by an oval with a name inside.
Types of Attributes
There are a few types of attributes you need to be familiar with. Some of these are to be left as is,
but some need to be adjusted to facilitate representation in the relational model. This first section
will discuss the types of attributes. Later on we will discuss fixing the attributes to fit correctly
into the relational model.
Simple attributes
Simple attributes are those drawn from the atomic value domains; they are also called single-
valued attributes. In the COMPANY database, an example of this would be: Name = {Chandra}; Age = {30}.
Composite attributes
Composite attributes are those that consist of a hierarchy of attributes. Using our database example, Address may consist of Number, Street and Suburb, so this would be written as: Address = {59 + ‘Park Street’ + ‘Borella’}
Multivalued attributes
Multivalued attributes are attributes that have a set of values for each entity. An example of a multivalued attribute from the COMPANY database is the list of educational qualifications earned by an employee: BSc, MBA, PhD.
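As an illustration (not taken from the prescribed reading), the short Python sketch below uses SQLite to show one common way a composite attribute such as Address and a multivalued attribute such as qualifications might be flattened for the relational model; the table and column names are assumptions made for this example.

# Illustrative sketch: a composite attribute is split into its component
# columns, and a multivalued attribute becomes a separate table with a
# foreign key back to the owning entity.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    EID            INTEGER PRIMARY KEY,
    Name           TEXT NOT NULL,
    AddressNumber  TEXT,      -- composite attribute Address split into parts
    AddressStreet  TEXT,
    AddressSuburb  TEXT
);

-- The multivalued attribute (qualifications) is stored one value per row.
CREATE TABLE EmployeeQualification (
    EID           INTEGER NOT NULL REFERENCES Employee(EID),
    Qualification TEXT NOT NULL,
    PRIMARY KEY (EID, Qualification)
);
""")

conn.execute("INSERT INTO Employee VALUES (1, 'Chandra', '59', 'Park Street', 'Borella')")
conn.executemany(
    "INSERT INTO EmployeeQualification VALUES (?, ?)",
    [(1, 'BSc'), (1, 'MBA'), (1, 'PhD')],
)
print(conn.execute("SELECT Qualification FROM EmployeeQualification WHERE EID = 1").fetchall())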
Keys
An important constraint on an entity is the key. The key is an attribute or a group of attributes
whose values can be used to uniquely identify an individual entity in an entity set.
Types of Keys
Candidate key
A candidate key is a simple or a composite key which is unique and minimal. It is unique
because no two rows in a table may have the same value at any time. It is minimal because every
column is necessary in order to attain uniqueness.
From our COMPANY database example, if the entity is Employee(EID, First Name, Last
Name, SIN, Address, Phone, BirthDate, Salary, DepartmentID), possible candidate keys are:
EID, SIN
First Name and Last Name – assuming there is no one else in the company with the same name
Last Name and DepartmentID – assuming two people with the same last name don’t work in the
same department
Composite key
Using the example from the candidate key section, possible composite keys are:
First Name and Last Name – assuming there is no one else in the company with the same name
Last Name and Department ID – assuming two people with the same last name don’t work in the
same department
Primary key
The primary key is a candidate key that is selected by the database designer to be used as an identifying mechanism for the whole entity set. It must uniquely identify tuples in a table and must not be null. The primary key is indicated in the ER model by underlining the attribute.
Employee(EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary, DepartmentID)
Foreign key
A foreign key (FK) is an attribute in a table that references the primary key in another table, or it can be null. Both the foreign key and the primary key it references must be of the same data type.
Employee(EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary,
DepartmentID)
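The following Python/SQLite sketch is a minimal illustration of the keys described above, assuming a simple Department table for the DepartmentID foreign key to reference; the Department columns and the sample values are invented for the example.

# Minimal sketch of primary and foreign keys for the Employee example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Department (
    DepartmentID INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL
);

CREATE TABLE Employee (
    EID          INTEGER PRIMARY KEY,            -- primary key: unique, not null
    FirstName    TEXT,
    LastName     TEXT,
    SIN          TEXT UNIQUE,                    -- another candidate key
    Address      TEXT,
    Phone        TEXT,
    BirthDate    TEXT,
    Salary       REAL,
    DepartmentID INTEGER REFERENCES Department(DepartmentID)  -- foreign key, may be NULL
);
""")

conn.execute("INSERT INTO Department VALUES (10, 'Computer Science')")
conn.execute("INSERT INTO Employee VALUES (1, 'Nimal', 'Silva', '123456789', 'Colombo', '011', '1990-01-01', 50000, 10)")

# Violating the foreign key (there is no Department 99) raises an error.
try:
    conn.execute("INSERT INTO Employee VALUES (2, 'Kamal', 'Perera', '987654321', 'Kandy', '081', '1992-05-05', 45000, 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)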
Activity 1.2
1. What are the two concepts that ER modelling is based on?
2. Design an ER diagram for a University database specifying the attributes, entities and
structural contents.
Summary
In this session we discussed the meaning of the term ‘data model’ which is a collection of concepts
that can be used to describe the structure of a database. Moreover, we discussed the ER model and its role in database design. The ER model is well suited to data modelling for use with databases because it is fairly abstract and easy to discuss and explain.
Learning outcomes:
After following this session, you will be able to,
● describe how database management systems have evolved to date
● explain the importance of database systems
● describe the database system architectures
● identify the structures in the different models of DBMS
● design an Entity Relationship (ER) model for a given problem
Essential Reading:
Chapters 1 and 9 of Elmasri, R. & Navathe, S. (2016) Fundamentals of Database Systems, 7th edn. Pearson.
Session 2 - Enhanced Entity - Relationship (EER) model
Content:
2.1 Specialization and Generalization
2.2 Constraints and characteristics of Specialization and Generalization
2.3 Modelling of UNION types
2.4 Data abstraction, knowledge representation
Introduction:
In this session, we discuss how the ER model can be enhanced with the specialization and generalization concepts and inheritance to obtain the EER model. The enhanced entity–relationship (EER) model introduces several concepts that are not in ER modelling but are closely related to object-oriented design, such as is-a relationships.
2.1. Specialization and Generalization
One entity type might be a subtype of another, very similar to subclasses in OO programming:
● Lion is a subtype of Mammal
● A relationship exists between a Lion entity and the corresponding Mammal entity
Properties of IsA
Advantage: It is used to create a more concise and readable ER diagram. It maps well to object-oriented approaches, whether to databases or to related applications.
Specialization
Specialization is needed when an entity set has subsets that have additional attributes or when it participates in special, separate relationships. It is the process of breaking up a class into subclasses.
Example: Faculty contains visiting lecturers (VisitingLecturer) and full-time lecturers (FulltimeLecturer).
● All lecturers have the attributes staffID, lastName, firstName and designation.
● A VisitingLecturer also has an hourlyRate.
● A FulltimeLecturer has a monthlySalary.
Specialization can be total (every member of the superclass must be in some subclass) or partial.
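One common way of storing such a specialization is a superclass table plus one table per subclass that shares the superclass key. The Python/SQLite sketch below illustrates this for the lecturer example; the exact schema is an assumption made for illustration, not a prescribed design.

# Sketch: superclass table plus one table per subclass, sharing the key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Lecturer (                 -- superclass: shared attributes
    staffID     INTEGER PRIMARY KEY,
    lastName    TEXT,
    firstName   TEXT,
    designation TEXT
);

CREATE TABLE VisitingLecturer (         -- subclass: adds hourlyRate
    staffID    INTEGER PRIMARY KEY REFERENCES Lecturer(staffID),
    hourlyRate REAL
);

CREATE TABLE FulltimeLecturer (         -- subclass: adds monthlySalary
    staffID       INTEGER PRIMARY KEY REFERENCES Lecturer(staffID),
    monthlySalary REAL
);
""")

conn.execute("INSERT INTO Lecturer VALUES (1, 'Silva', 'Nimal', 'Senior Lecturer')")
conn.execute("INSERT INTO FulltimeLecturer VALUES (1, 150000)")

# A full-time lecturer is reassembled by joining the subclass table to the superclass.
row = conn.execute("""
    SELECT l.firstName, l.lastName, f.monthlySalary
    FROM Lecturer l JOIN FulltimeLecturer f ON l.staffID = f.staffID
""").fetchone()
print(row)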
Generalization
Generalization is the process of combining entities based on common attributes or characteristics
of the entity. It is basically the inverse of the specialization.
For example, Student and Lecturer are both humans, so they can be generalized into a common superclass. Generalization is a bottom-up process, in contrast to the top-down process of specialization.
Specialization Hierarchy
In a specialization hierarchy, every subclass participates as a subclass in only one class/subclass relationship, which results in a tree structure or strict hierarchy.
A specialization lattice is where a subclass can be a subclass in more than one class/subclass relationship.
Multiple inheritance occurs when a subclass has more than one superclass; an attribute or relationship originating in the same superclass and inherited more than once through different paths in the lattice is included only once in the shared subclass.
Single inheritance: some models and languages are limited to single inheritance.
Top-down conceptual refinement: this involves starting with entity types and then defining subclasses of them by successive specialization.
Bottom-up conceptual synthesis: this involves a generalization process rather than a specialization process.
Activity 2.1
1. Explain with examples the differences between Specialization and Generalization.
There are many design choices for specialization and generalization, and each should be made so that the conceptual model is accurate. For example, if a subclass has few specific attributes and no specific relationships, it can be merged into the superclass. Likewise, if all the subclasses of a specialization/generalization have few specific attributes and no specific relationships, they can all be merged into the superclass and replaced with one or more type attributes that specify the subclass (or subclasses) to which each entity belongs. Although UNION types (categories) help to make the conceptual schema accurate, they should generally be avoided unless needed, and the choice of disjoint/overlapping and total/partial constraints must be made for each specialization or generalization. These choices can often be derived from the rules of the application being modelled.
Activity 2.2
1. State the purpose of using UNION with an appropriate example.
The following are formal definitions for the EER model concepts.
Class: a set or collection of entities.
Subclass: a class whose entities must always be a subset of the entities in another class (its superclass).
Specialization: a set of subclasses that have the same superclass.
Generalization: the generalized entity type (superclass) of a set of subclasses.
Predicate-defined subclass: a predicate on the attributes of the superclass is used to specify which entities are members of the subclass.
User-defined subclass: a subclass that is not defined by a predicate; membership is specified individually for each entity.
2.4. Data abstraction, knowledge representation
The goal of knowledge representation techniques is to accurately model some domain of knowledge by creating an ontology that describes the concepts of the domain and how these concepts are interrelated. These goals are similar to those of semantic data models, although there are important similarities and differences between the two. Knowledge representation involves the following abstraction concepts:
Classification: systematically assigning similar objects/entities to object classes and entity types.
Instantiation: the inverse of classification; the generation and specific examination of distinct objects of a class.
Exception objects: objects that differ in some respects from other objects of their class. Knowledge representation allows such exceptions; in addition, one class can be an instance of another class, which cannot be represented directly in the EER model.
Identification: the abstraction process by which classes and objects are made uniquely identifiable by means of some identifier.
Summary
In this session we discussed the enhanced entity–relationship (EER) model, which introduces several concepts not in ER modelling that are closely related to object-oriented design, such as the is-a relationship. Furthermore, we focused on the UNION type, or category, which represents a single superclass/subclass relationship with more than one superclass.
Learning outcomes:
After following this session, you will be able to,
● define basic building blocks of EER modelling
● explain the constraints in the EER modelling
● model the schemas using UNION
Essential Reading:
Chapters 3 and 5 of Elmasri, R. & Navathe, S. (2016) Fundamentals of Database Systems, 7th edn. Pearson.
Session 3 - Relational database design using EER-to-relational mapping
Content:
3.1 Relational database design using EER to relational mapping
3.2 Mapping EER model to relations
Introduction:
In this session, we discuss how a conceptual design produced with the ER model is mapped to a relational database schema, and how the additional constructs of the EER model are then mapped to relations. In a relational schema, relationship types are represented by having two attributes A and B, of which one is a primary key and the other is a foreign key.
Multiple relations using subclass relations only: this option is used when the specialization is total and has the disjointness constraint.
Single relation with one type attribute: a type (discriminating) attribute indicates the subclass of each tuple; the subclasses must be disjoint. This option has the potential to generate NULL values for attributes that do not apply to a tuple's subclass.
Single relation with multiple type attributes: the subclasses may be overlapping, although this option also works for a disjoint specialization.
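As an illustration of the second option above, the following Python/SQLite sketch stores all lecturers in a single relation with a discriminating type attribute; the column names are assumptions made for the example, and the NULLs show the values this option can generate.

# Sketch of the "single relation with one type attribute" mapping option.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Lecturer (
    staffID       INTEGER PRIMARY KEY,
    lastName      TEXT,
    firstName     TEXT,
    lecturerType  TEXT CHECK (lecturerType IN ('VISITING', 'FULLTIME')),
    hourlyRate    REAL,    -- NULL unless lecturerType = 'VISITING'
    monthlySalary REAL     -- NULL unless lecturerType = 'FULLTIME'
);
""")

conn.executemany("INSERT INTO Lecturer VALUES (?,?,?,?,?,?)", [
    (1, 'Silva',  'Nimal',  'FULLTIME', None,   150000),
    (2, 'Perera', 'Kamala', 'VISITING', 2500.0, None),
])

# The discriminating attribute selects one subclass; the unused columns show
# the NULL values this mapping option can generate.
for row in conn.execute("SELECT * FROM Lecturer WHERE lecturerType = 'VISITING'"):
    print(row)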
Activity 3.1
Using an example of your choice, convert an EER design into a relational database design using EER-to-relational mapping.
Summary
In this session we discussed how a conceptual design in the ER model is mapped to a relational database schema and the process used for ER-to-relational mapping. Next we discussed the additional steps in the process that map the constructs of the EER model to the relational model.
Learning outcomes:
After following this session, you will be able to,
● demonstrate the steps involved in converting an ER design into a relational database design using relational mapping
● explain the process of mapping EER model constructs to relations
Essential Reading:
Chapter 3 of Elmasri, R. & Navathe, S. (2016) Fundamentals of Database Systems, 7th edn. Pearson.
Session 4 - Database Design tool - UML
Content:
4.1 Introduction to data modeling tool - UML
4.2 Specialization and Generalization in UML
Introduction:
The Unified Modeling Language (UML) is a specification language for object modeling. It is used to create an abstract model of a complex system (a UML model) in forms that are easier to visualize for the human developers and users of that system.
In this session we will briefly discuss UML as a data modeling tool and then focus on how to use it for Specialization and Generalization.
Specialization and generalization are relationships that are both reciprocal and hierarchical. They are reciprocal because specialization is simply the other side of generalization: Lion and Tiger specialize Mammal, and Mammal is a generalization of Lion and Tiger.
These relationships are hierarchical because they create a relationship tree in which the specialized types branch from the more generalized types. Moving up the hierarchy gives greater generalization; in this example, moving up leads to Mammal, and moving down leads to Tiger and Lion.
Limitation of UML
UML does not replace other types of design documents. For example, a use case diagram relates actors to use cases, but it does not replace the textual use cases themselves. Moreover, many of the diagram types are not widely used in practice. Furthermore, UML diagramming tools are often incompatible with each other.
Activity 4.1
Depict in a UML diagram the procedure of specifying the Specialization and Generalization
techniques.
Summary
In this session we discussed the use of the UML tool and the techniques followed in order to use it. Next we focused on how generalization and specialization are modelled using the UML tool.
Learning outcomes:
After following this session, you will be able to,
● identify the modelling techniques used in Specialization and Generalization
● model database design using Unified Modelling Language (UML)
Essential Reading:
Chapter 2 of Elmasri, R. & Navathe, S. (2016) Fundamentals of Database Systems, 7th edn. Pearson.
Session 5 - Higher forms of Normalization
Content:
5.1 Functional dependencies
5.2 Second and third normal forms
5.3 Boyce-Codd normal form
5.4 Multivalued dependency and fourth normal form
5.5 Join dependencies and fifth normal form
Introduction:
Normalization should be part of the database design process. However, it is difficult to separate the normalization process from the ER modelling process, so the two techniques are often used concurrently.
It is convenient to use an entity relation diagram (ERD) to provide the big picture, or macro
view, of an organization’s data requirements and operations. This is created through an iterative
process that involves identifying relevant entities, their attributes and their relationships.
Normalization procedure focuses on characteristics of specific entities and represents the micro
view of entities within the ERD.
In the Level 3 course EEI3266 we discussed normalization up to third normal form. In this session we will briefly revise the first three normal forms and then elaborate on higher forms of normalization such as Boyce-Codd normal form, multivalued dependency and fourth normal form. Finally, we will discuss join dependencies and fifth normal form.
Normalization is done based on functional dependencies between entities. Therefore, we will start
revising functional dependencies first in the next section before getting on to normalization.
5.1 Functional dependencies
X → Y
The left side of the above FD diagram is called the determinant, and the right side is
the dependent. Here are a few examples.
Rules of Functional Dependencies
Consider the relation R(A, B, C, D, E) with the following sample data.

A B C D E
a1 b1 c1 d1 e1
a2 b1 c2 d2 e1
a3 b2 c1 d1 e1
a4 b2 c2 d2 e1
a5 b2 c3 d1 e1

Table 5.1: Functional dependency example, by A. Watt.
As you look at this table, ask yourself: What kind of dependencies can we observe among the
attributes in Table R? Since the values of A are unique (a1, a2, a3, etc.), it follows from the FD
definition that:
A → B, A → C, A → D, A → E
Since the values of E are always the same (all e1), it follows that:
A → E, B → E, C → E, D → E
Note that these four dependencies cannot simply be summarized as ABCD → E: although ABCD → E also holds, it is a weaker statement and does not by itself imply that A → E, B → E, and so on, hold individually.
Other observations:
a) Whenever two rows agree on C they also agree on D; therefore, C → D.
b) However, D values do not determine C values (d1 appears with both c1 and c3).
c) So C determines D, but D does not determine C.
Looking at actual data can help clarify which attributes are dependent and which are
determinants.
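The following short Python sketch (the function is our own, not from the reading) checks whether a functional dependency X → Y holds in the sample data of Table 5.1: whenever two rows agree on X they must also agree on Y.

# Check a functional dependency against the sample rows of Table 5.1.
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1", "E": "e1"},
    {"A": "a2", "B": "b1", "C": "c2", "D": "d2", "E": "e1"},
    {"A": "a3", "B": "b2", "C": "c1", "D": "d1", "E": "e1"},
    {"A": "a4", "B": "b2", "C": "c2", "D": "d2", "E": "e1"},
    {"A": "a5", "B": "b2", "C": "c3", "D": "d1", "E": "e1"},
]

def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in the given rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False            # same determinant, different dependent
        seen[x] = y
    return True

print(fd_holds(rows, ["A"], ["B"]))   # True:  A -> B
print(fd_holds(rows, ["C"], ["D"]))   # True:  C -> D
print(fd_holds(rows, ["D"], ["C"]))   # False: D does not determine C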
5.2 Normalization
Normalization is the branch of relational theory that provides design insights. It is the process of determining how much redundancy exists in a table. The goals of normalization are to:
● be able to characterize the level of redundancy in a relational schema
● provide mechanisms for transforming schemas in order to remove redundancy
Normalization theory draws heavily on the theory of functional dependencies. Normalization theory defines six normal forms (NF). Each normal form involves a set of dependency properties that a schema must satisfy, and each normal form gives guarantees about the presence and/or absence of update anomalies. This means that higher normal forms have less redundancy and, as a result, fewer update problems.
Normal Forms
All the tables in any database can be in one of the normal forms we will discuss next. Ideally, we only want minimal redundancy for PK to FK. Everything else should be derived from other tables. There are six normal forms, but we will only look at the first four, which are first normal form (1NF), second normal form (2NF), third normal form (3NF) and Boyce-Codd normal form (BCNF).
First normal form (1NF): to normalize a relation that contains a repeating group, remove the repeating group and form two new relations. The primary key (PK) of the new relation is a combination of the PK of the original relation plus an attribute from the newly created relation for unique identification.
Second normal form (2NF): if the relation has a composite PK, then each non-key attribute must be fully dependent on the entire PK and not on a subset of the PK (i.e., there must be no partial dependency).
Third normal form (3NF): in order to be in third normal form, the relation must be in second normal form. Furthermore, all transitive dependencies must be removed; a non-key attribute may not be functionally dependent on another non-key attribute.
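As a small worked illustration (the Enrolment example is invented here, not taken from the reading), suppose Enrolment(StudentID, CourseID, StudentName, Grade) has the composite key (StudentID, CourseID) but StudentName depends on StudentID alone, a partial dependency. The Python/SQLite sketch below shows the second normal form decomposition.

# Sketch: removing a partial dependency by splitting the relation.
# Before: Enrolment(StudentID, CourseID, StudentName, Grade)
#         StudentName depends only on StudentID (partial dependency).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (                  -- holds the attribute that depended on part of the key
    StudentID   INTEGER PRIMARY KEY,
    StudentName TEXT
);

CREATE TABLE Enrolment (                -- every non-key attribute now depends on the whole key
    StudentID INTEGER REFERENCES Student(StudentID),
    CourseID  TEXT,
    Grade     TEXT,
    PRIMARY KEY (StudentID, CourseID)
);
""")

conn.execute("INSERT INTO Student VALUES (1, 'Chandra')")
conn.execute("INSERT INTO Enrolment VALUES (1, 'EEI4366', 'A')")
print(conn.execute("""
    SELECT s.StudentName, e.CourseID, e.Grade
    FROM Student s JOIN Enrolment e ON s.StudentID = e.StudentID
""").fetchall())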
5.3 Boyce-Codd Normal Form (BCNF)
Boyce-Codd normal form is a stricter form of 3NF. When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF: 3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys, i.e. composite candidate keys with at least one attribute in common. BCNF is based on the concept of a determinant. A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent. A relation is in BCNF if, and only if, every determinant is a candidate key.
Activity 5.1
1. Explain what BCNF is.
2. Take a scenario of your own choice and normalize it to BCNF
Example: ?
Once a relation has been decomposed to remove a multivalued dependency (MVD), anomalies may still remain; these are eliminated by join dependencies and fifth normal form (5NF). A join dependency (JD) exists on a relation R(A, B, C, D) if R is always equal to the join of its projections R1(A, B, C) and R2(C, D) over the common attribute C.
Example ?
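As a small illustration of the join-dependency idea (with sample data invented here), the Python sketch below projects a relation R(A, B, C, D) onto R1(A, B, C) and R2(C, D) and checks that joining the projections over C gives back exactly R.

# Check that decomposing R into its projections R1 and R2 is lossless.
R = {
    ("a1", "b1", "c1", "d1"),
    ("a2", "b2", "c1", "d1"),
    ("a1", "b1", "c2", "d2"),
}

R1 = {(a, b, c) for (a, b, c, d) in R}        # projection onto A, B, C
R2 = {(c, d) for (a, b, c, d) in R}           # projection onto C, D

# Natural join of R1 and R2 over the common attribute C.
joined = {(a, b, c, d) for (a, b, c) in R1 for (c2, d) in R2 if c == c2}

print(joined == R)   # True here, so the join dependency holds for this data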
Activity 5.2
1. Explain the scenario in which the multivalued dependency is used.
Essential Reading:
Chapter 8 of Elmasri, R. & Navathe, S. (2016) Fundamentals of Database Systems, 7th edn. Pearson.
Summary
In this session we discussed functional dependency and techniques such as normalization which are used to remove data redundancy. We also discussed higher forms of normalization such as BCNF, multivalued dependencies (MVDs) and join dependencies (JDs), which remove anomalies that may remain after the earlier normalization steps.
Learning outcomes:
After following this session, you will be able to,
● describe the concept of functional dependency.
● explain the concepts of data redundancy.
● compare different types of data anomalies.
● demonstrate multivalued dependencies using an appropriate scenario
● apply join dependencies to remove anomalies that remain after multivalued-dependency decomposition
UNIT 02 – DATABASE FILE ORGANIZATION
Content:
6.1 Secondary storage devices
6.2 Buffering of blocks
6.3 Placing file records on disk
6.4 Disk access using RAID technology
6.5 Modern storage architecture
Introduction:
Managing information means taking care of it so that it works for us and is useful for the tasks we perform. By using a DBMS, the information we collect and add to its database is no longer subject to accidental disorganization, but it becomes necessary to store that data on appropriate storage devices. In this session we discuss various ways in which data can be stored.
There are different kinds of secondary storage devices. The following section compares the
different types of secondary storage devices that are available.
Magnetic Media
Fixed hard drive (HDD)
Data are stored on the surface of metal discs which have a magnetisable coating. Dots on the disc
can store different magnetic field values (1 or 0). The dots are arranged in circles and sectors
along the disc and can be read by the head, while the disc spins at high speeds.
Application: Main backing storage for almost all computers, as they offer random access and
relatively high access speeds.
Portable hard drive
Same function as a fixed HDD, although they are smaller and come with electronics that allow the
drive to be accessed via USB or a similar connection.
Application: They allow the easy transportation of very large amounts of data from one
computer to another.
Typical Capacity: 500GB-2TB
Magnetic tape drive
Data are also stored on a magnetised film, but arranged along the length of a long plastic strip. Data are accessed serially, so accessing an individual file is slow.
Application: Tapes are used mainly for data back-ups where large amounts of data need to be stored, but quick access to individual files is not required. Tapes are also used in some batch-processing applications (e.g. to hold the list of data that will be processed).
Optical media
All of these work in a similar way: data are stored as a pattern of dots that can be read using light, usually a laser beam. Data are read by bouncing the laser beam off the surface of the medium; the different dot patterns reflect the laser beam differently and can therefore be distinguished. All of these media allow random access, but they differ in their capacity and in whether they can be written to.
CD-ROM/R/RW
ROM: read-only
R: recordable
RW: rewritable
Application: content distribution, e.g. music, software or e-books. Cheap writable storage.
Typical capacity: ~800 MB
DVD-ROM/R/RW/RAM
ROM: read-only
R: recordable
RW: rewritable
RAM: high-quality rewritable with high reliability
Blu-ray disc
Blu-ray discs work in the same way as DVD-ROMs, but due to a shorter laser wavelength they have a higher capacity.
HD-DVD
Meant as a competitor to the Blu-ray disc, but less established.
Application: Same as Blu-ray, but no longer in use.
Typical capacity: 15/30 GB
Flash memory is a non-volatile storage medium which can be electrically erased and reprogrammed and is based on EEPROM (electrically erasable programmable read-only memory). The most common type is NAND, where data can be read or written in blocks.
USB flash drive
This data storage device uses flash memory and can be connected to a computer via USB.
Application: Used to transport small amounts of data quickly between different computers, especially because they have universal compatibility and are easy to transport (small).
Memory card
Memory cards use the same flash memory technology, but in the form of a card.
Application: Used mostly in portable electronic devices such as digital cameras, mobile phones, laptops, tablets, etc.
Solid-state drive (SSD)
An SSD is based on the same flash memory technology, but its interface is the same as that of a hard drive (most commonly SATA).
Application: Used mainly as the main storage for desktops and laptops, sometimes in combination with HDDs where larger capacities are needed. SSDs are best used for data that require fast access times and read/write rates, such as booting the operating system or starting important programs.
6.2 Buffering of blocks
When several blocks need to be transferred from disk to main memory and the addresses of all the blocks are known, several buffers can be reserved in main memory to speed up the transfer.
While one buffer is being read or written, the CPU can process the data in the other buffer. This is possible because an independent disk I/O processor exists which, once started, can transfer a block of data between memory and disk independently of, and in parallel with, CPU processing.
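The following Python sketch is a toy illustration of this double-buffering idea, not of any particular DBMS: a bounded queue of two buffers stands in for the reserved main-memory buffers, an "I/O" thread fills one buffer while the main program processes the other, and a sleep simulates the disk transfer time.

# Toy double-buffering sketch: an I/O thread fills buffers while the CPU
# (the main loop) processes the previously filled one.
import queue
import threading
import time

BLOCKS = [f"block-{i}" for i in range(6)]
buffers = queue.Queue(maxsize=2)          # two buffers reserved in main memory

def io_processor():
    for block in BLOCKS:
        time.sleep(0.01)                  # simulate the disk transfer time
        buffers.put(block)                # fill a free buffer
    buffers.put(None)                     # signal end of file

threading.Thread(target=io_processor, daemon=True).start()

while True:
    block = buffers.get()                 # take a filled buffer
    if block is None:
        break
    print("processing", block)            # CPU works while the next block is read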
Activity 6.1
Explain how the buffering works and the mechanism involved in data storage.
6.3 Placing file records on disk
A collection of field names and their respective data types constitutes a record type or record format. The data type associated with each field specifies the type of values the field can take.
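As an illustration, the Python sketch below packs an assumed fixed-length Employee record (a 4-byte ID, a 20-byte name and a 4-byte salary) into bytes, the way fixed-length records of a record type are laid out within a block; the field layout is invented for the example.

# Sketch of a fixed-length record format using the struct module.
import struct

RECORD_FORMAT = "<i20sf"                      # the field types define the record type
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 28 bytes per record

record = struct.pack(RECORD_FORMAT, 1, b"Chandra", 55000.0)
print(RECORD_SIZE, record)

eid, name, salary = struct.unpack(RECORD_FORMAT, record)
print(eid, name.rstrip(b"\x00").decode(), salary)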
6.4 Disk access using RAID technology
RAID is the abbreviated term for a Redundant Array of Independent Drives. RAID is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple physical disk drives. The physical disks are said to be in a RAID array, which is addressed by the operating system as one single disk. The different schemes or architectures are named by the word RAID followed by a number (e.g., RAID 0, RAID 1). Each scheme provides a different balance between two key goals: increasing data reliability and increasing input/output performance.
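The toy Python sketch below (using plain lists as "disks") contrasts the two goals: RAID 0 stripes consecutive blocks across disks for performance, while RAID 1 mirrors every block on both disks for reliability.

# Toy sketch of block placement under RAID 0 (striping) and RAID 1 (mirroring).
blocks = [f"B{i}" for i in range(8)]

# RAID 0: block i goes to disk (i mod number_of_disks)
raid0 = [[], []]
for i, block in enumerate(blocks):
    raid0[i % 2].append(block)
print("RAID 0:", raid0)

# RAID 1: every block is written to both disks
raid1 = [list(blocks), list(blocks)]
print("RAID 1 copies identical:", raid1[0] == raid1[1])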
Activity 6.2
Describe in detail what RAID is and its purpose.
6.5 Modern storage architecture
To cope with ever-growing volumes of data, the databases we create and manage must be well matched to the data they hold. Data are produced at different rates and levels, which in turn creates different storage requirements. It is therefore necessary to keep up to date with current research on storage architectures.
Activity 6.3
Refer to one latest research paper and describe one modern storage architecture that is currently
being used or under development.
Summary
In this session we discussed the different types of storage used with a DBMS and how buffering works. We also discussed RAID, along with recent developments in modern storage architecture.
Learning outcomes:
After following this session, you will be able to,
● Explain storage device characteristics
● Describe the purpose of RAID in file organization
Essential Reading:
Chapter 9 of Elmasri, R. & Navathe, S. (2016) Fundamentals of Database Systems, 7th edn. Pearson.
Session 7 - File Organization for DBMS
Content:
7.1 Heap Files
7.2 Hashing techniques
7.3 Other primary file organizations
Introduction:
There are different types of file organization that help organize data into blocks. They may be basic, but they are crucial for organizing data in an efficient system. These include heap files, which are the basic type of organization and work with data blocks. There are also hashing techniques, which take a large range of key values and map them onto a smaller set of values. In this session we will discuss the heap and hash file organization methods in depth, and sequential file organization, B+ tree file organization and clustered file organization briefly.
7.1 Heap Files
The heap is the most basic type of organization and works with data blocks. In a heap file, records are inserted at the end of the file; no sorting or ordering of the records is required at insertion time. Although heap organization is simple, sometimes the primary key is instead used as the hashing field for a hash-based organization of the file into blocks; in that case extendible hashing or linear hashing is used rather than an in-memory hash table, because the records themselves are organized as entries in the blocks (buckets) of an on-disk hash table.
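The following Python sketch is a simplified illustration of heap-file behaviour: records are appended to the last block (a new block is started when it is full), and a lookup must scan the blocks from the beginning because there is no ordering.

# Simplified heap-file sketch: append at the end, linear scan to find.
BLOCK_SIZE = 3                      # records per block, kept tiny for the example
heap_file = [[]]                    # list of blocks, each block a list of records

def insert(record):
    if len(heap_file[-1]) == BLOCK_SIZE:
        heap_file.append([])        # start a new block at the end of the file
    heap_file[-1].append(record)

def find(key):
    for block_no, block in enumerate(heap_file):     # linear scan: no ordering
        for record in block:
            if record["id"] == key:
                return block_no, record
    return None

for i in range(7):
    insert({"id": i, "name": f"rec{i}"})
print(heap_file)
print(find(5))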
Activity 7.1
Explain in detail as to how the heap file functions.
7.2 Hashing techniques
A hash function produces hash values that can be used to speed up data retrieval and to check the validity of data.
When one record is to be retrieved from many in a file, a searching process is normally required, and the time it takes grows with the number of records. With hashing, a hash value is instead computed from the record's key and used as the address of the record, so the record can be accessed directly. This takes roughly the same time regardless of the number of records in the file.
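The Python sketch below is a minimal illustration of hash-based retrieval: the record key is hashed to a bucket number, so a lookup touches only one bucket regardless of how many records the file holds (collisions simply share a bucket here).

# Minimal hashing sketch: key -> bucket -> record.
NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_of(key):
    return hash(key) % NUM_BUCKETS          # the hash function maps a key to a bucket

def insert(key, record):
    buckets[bucket_of(key)].append((key, record))

def lookup(key):
    for k, record in buckets[bucket_of(key)]:    # search only one bucket
        if k == key:
            return record
    return None

insert("E101", {"name": "Chandra"})
insert("E202", {"name": "Nimal"})
print(lookup("E202"))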
Activity 7.2
Explain how hashing works and the purpose of having a hash function.
Summary
In this session we discussed how extensive volumes of data call for different types of file organization in order to provide an efficient access system. Heap files are the most basic file organization technique, while the commonly used hashing technique provides efficient retrieval. Three other file organization methods can be used depending on the application: sequential file organization, B+ tree file organization and clustered file organization.
Learning outcomes:
After following this session, you will be able to,
● Describe how heap file system works.
● Explain the techniques involved in hashing
● List the types of file organization available
Essential Reading:
Chapter 9 of Elmasri, R. & Navathe, S. (2016) Fundamentals of Database Systems, 7th edn. Pearson.
Session 8 - Indexing structures
Content:
8.1 Types of Single Level Ordered Indexes
8.2 Multilevel Indexes
8.3 Dynamic Multilevel Indexes using B+ Trees and B-Trees
Introduction: This session discusses ordered indexes, which are similar to the index of a textbook. An index for a file whose records contain several fields or attributes is usually defined on a single field of the file, called the index field or index attribute. We also study multilevel indexes, which use an index file that is itself a sorted file with a distinct value for each entry, and we briefly discuss dynamic multilevel indexes, which place maximum and minimum bounds on the number of keys in each node.
8.1 Types of Single Level Ordered Indexes
An index for a file whose records contain several fields or attributes is usually defined on a single field of the file, called the index field or index attribute. The index stores each value of the index field along with a list of pointers to all the disk blocks that contain records with that field value. The values in the index file are ordered, so a binary search can be done on the index.
Primary Index: A primary index is an ordered file whose records contain two fields: the first field is of the same data type as the ordering (primary) key field of the data file, and the second field is a pointer to a disk block (a block address). There is one index entry in the index file for each block in the data file; each entry has the value of the primary key field of the first record in the block and a pointer to that block as its two field values.
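The following Python sketch is a simplified illustration of this primary index idea: the data file is ordered on the key, the index holds one (block anchor key, block pointer) entry per block, and a binary search on the index locates the only block that can contain a given key.

# Simplified primary-index sketch: block anchors plus binary search.
import bisect

# Data file: blocks ordered on the primary key (3 records per block here).
data_blocks = [
    [(10, "r10"), (20, "r20"), (30, "r30")],
    [(40, "r40"), (50, "r50"), (60, "r60")],
    [(70, "r70"), (80, "r80"), (90, "r90")],
]

# Primary index: key of the first record in each block (the block anchor).
index_keys = [block[0][0] for block in data_blocks]     # [10, 40, 70]

def find(key):
    block_no = bisect.bisect_right(index_keys, key) - 1  # binary search on the index
    if block_no < 0:
        return None
    for k, record in data_blocks[block_no]:              # then search inside one block
        if k == key:
            return record
    return None

print(find(50))   # 'r50'
print(find(55))   # None, not in the file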
Clustering Index: If the records of a file are physically ordered on a non-key field, one that does not have a distinct value for each record, that field is called the clustering field and the data file is called a clustered file. A different type of index, called a clustering index, can be created to speed up retrieval of all the records that have the same value for the clustering field. This differs from the primary index, which requires the ordering field of the data file to have a distinct value for each record.
The clustering index is an ordered file with two fields: the first field is of the same type as the clustering field of the data file, and the second field is a disk block pointer. There is one entry in the clustering index for each distinct value of the clustering field, containing that value and a pointer to the first block in the data file that has a record with that value for its clustering field.
Secondary Index
A secondary index is an ordered file of records, of either fixed or variable length, with two fields. The first field is of the same data type as the non-ordering field of the data file on which the index is built (the indexing field); the second field is either a block pointer or a record pointer. A file may have more than one secondary index.
Activity 8.1
List and explain the types of single level ordered indexes using diagrams.
8.2 Multilevel Indexes
A multilevel index treats the index file, which is a sorted file with a distinct value for each K(i), as the first or base level of a multilevel structure. A primary index can then be built for the index file itself; because this second level is a primary index, block anchors can be used so that the second level has one entry for each block of first-level entries. By repeating the process, a third-level index can be created on top of the second level, and so on, until all the entries of the top index level fit in a single block.
The multilevel structure can be applied to any type of index: primary, clustering or secondary.
8.3 Dynamic multilevel indexes using B+ trees and B-trees
A dynamic multilevel index places maximum and minimum bounds on the number of keys in each node. A B+ tree is a tree data structure that is a variation of the B-tree. The difference between a B+ tree and a B-tree is that in a B+ tree all the data entries are stored in the leaves: the internal nodes contain only keys and tree pointers, and all the leaves are at the same (lowest) level. The leaf nodes are linked together, similar to a linked list, so that range queries are easy to answer.
The maximum number of keys in a node is called the order of the B+ tree, and the minimum number of keys per node is defined as half of the maximum. The number of keys that can be indexed using a B+ tree is a function of the order of the tree and its height.
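The hand-built Python sketch below is a much simplified illustration of a B+ tree of order 3: the internal (root) node holds only keys and child pointers, all data entries sit in the linked leaf level, and a search follows a single path from the root to a leaf.

# Hand-built, simplified B+ tree sketch (search only, no insertion/splitting).
leaf1 = {"keys": [5, 10], "records": ["r5", "r10"], "next": None}
leaf2 = {"keys": [20, 30], "records": ["r20", "r30"], "next": None}
leaf3 = {"keys": [40, 50], "records": ["r40", "r50"], "next": None}
leaf1["next"], leaf2["next"] = leaf2, leaf3            # leaves linked for range scans

root = {"keys": [20, 40], "children": [leaf1, leaf2, leaf3]}

def search(node, key):
    if "records" in node:                              # reached a leaf node
        if key in node["keys"]:
            return node["records"][node["keys"].index(key)]
        return None
    # internal node: pick the child whose key range covers the search key
    i = 0
    while i < len(node["keys"]) and key >= node["keys"][i]:
        i += 1
    return search(node["children"][i], key)

print(search(root, 30))   # 'r30'
print(search(root, 45))   # None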
Activity 8.2
Describe how dynamic multilevel indexes using B+ trees and B-trees function, with appropriate examples.
Summary
In this session we discussed single-level ordered indexes (primary, clustering and secondary indexes), multilevel indexes, and dynamic multilevel indexes based on B-trees and B+ trees.
Learning outcomes:
After following this session, you will be able to,
● Describe different types of indexes.
● Explain the purpose of the multilevel indexes
● Describe techniques of indexing through tree-structure
Essential Reading:
Chapter 9 of Elmasri, R. & Navathe, S. (2016) Fundamentals of Database Systems, 7th edn. Pearson.