CSC 414 - Data Management Lecture Notes
Data Management II
Course Lecturer: Assoc. Prof. L.P. Damuut
Assessment
• CA = 40%
• Exams (Theory) = 60%
Course Outline
Introduction to Data Management
Relational data model
Relational query languages
Theory and conceptual data design and modeling for relational databases
Hierarchical and network data models
File organization
Query processing
Concurrency control
Rollback and recovery
Data integrity and consistency
Data independence and minimal redundancy
Course Objectives
At the end of the course a student should be able to:
• Describe a relational data model
• Differentiate between primitive and abstract data types
• Define data management and explain the basic data management
practices
• Differentiate between hierarchical and network data models
• Solve basic problems using relational, hierarchical and network data
models
• Design a database app using relational data model
Introduction To Data Management
Definition: Data can be defined as unprocessed information, that is, raw
facts. Examples include date of birth, sex, CGPA, hostel number, height, age, phone
number, state of origin, blood group, etc.
Data types:
Primitive data types (e.g., integer, float, Boolean, long, short, char, etc.)
Abstract data types (e.g., record, tuple and object)
Primitive Data operations:
Add (e.g., 2+5 =7)
Subtract (e.g., 2.5-8.5 = -6)
Divide (e.g., 25/5 = 5)
AND (e.g., True AND False = False)
OR (e.g., 1 OR 0 = 1)
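The operations listed above can be tried directly in most programming languages. The short sketch below uses Python; the variable names are chosen only for illustration.

```python
# Arithmetic operations on integers and floats
total = 2 + 5            # addition of two integers
difference = 2.5 - 8.5   # subtraction of two floats
quotient = 25 / 5        # "/" always produces a float in Python

# Logical operations
conjunction = True and False   # logical AND on Booleans
disjunction = 1 | 0            # bitwise OR on the integers 1 and 0

print(total, difference, quotient, conjunction, disjunction)
```

Note that the result of the division is the float 5.0 rather than the integer 5, which shows that the type of a result depends on the operation as well as on the operands.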
What is Data Management?
Definition: The practice of collecting, organizing, analyzing, protecting
and storing data in order to meet clearly defined objectives. In an
organization, this practice can also involve sharing the data with
authorized personnel inside and outside the organization in order to
meet its objectives.
The role of data management in an organization cannot be
overemphasized. This is because data management increases the
reliability of the data, which serves as an important tool for making critical
decisions. For example, the management of PLASU can rely on the total
number of admitted students in any academic year when deciding whether to
invest in building more hostel accommodation.
Data Management Practices/Pillars
A. Ownership
Who owns the data depends on ….
• Funding source
Government
Private company
Philanthropic organization
Research institution
Individuals
B. Data Collection
Use appropriate and reliable methods
Pay attention to detail
Obtain authorizations
• Human / animal subjects, biohazards, copyrighted material, etc.
Keep permanent record regardless of format
• What was done / observed / achieved
Note the following basic rules of record keeping:
1. Enter data / evidence into a numbered, bound notebook
• Record the date, order of data collection and results achieved
• Do not use loose binders or files, and make no changes without a date / reason
2. Electronic data should be validated to ensure that it was recorded on a particular date and not
altered later
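One possible way to check that electronic data has not been altered is to store a cryptographic digest of the data when it is recorded and to recompute it later. The sketch below is only an illustration: the function names and the sample data are hypothetical, and the timestamp comes from the local clock, so proving the recording date to a third party would need an external trusted service.

```python
import hashlib
from datetime import datetime, timezone

def fingerprint(data: bytes) -> dict:
    """Return a SHA-256 digest of the data and the UTC time of recording."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

def is_unaltered(data: bytes, record: dict) -> bool:
    """Recompute the digest and compare it with the stored one."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]

# Hypothetical data set, made up for this example
original = b"sample_id,height_cm\n1,172\n2,165\n"
record = fingerprint(original)

# Changing a single value changes the digest
tampered = original.replace(b"165", b"185")
print(is_unaltered(original, record), is_unaltered(tampered, record))
```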
C. Data Protection
Data, as the currency of research, represents an investment, so it needs to be adequately
protected.
It may be needed later in order to:
• confirm findings
• be re-analyzed by other interested parties
Solution:
The degree = number of attributes = 4
The cardinality = number of tuples = 4
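Degree and cardinality can be computed mechanically once a relation is written down. The relation below is hypothetical (its attribute names and rows are made up and are not the table from the exercise); it simply has four attributes and four tuples.

```python
# Heading: the attribute names of a hypothetical relation
heading = ("mat_no", "name", "age", "dept")

# Body: a set of tuples, one value per attribute
body = {
    ("0001", "Mark", 25, "Comp. Sc"),
    ("0002", "Ada", 22, "Mathematics"),
    ("0003", "Sani", 24, "Comp. Sc"),
    ("0004", "Ruth", 27, "Physics"),
}

degree = len(heading)     # number of attributes
cardinality = len(body)   # number of tuples

print("degree =", degree, "cardinality =", cardinality)
```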
A relational model is concerned with:
1. Data structure: tables
2. Data integrity: primary key rules, foreign key rules
3. Data manipulation (relational operators):
• Relational Algebra
• Relational Calculus
E.g., <Entity> Relationship <Entity>
Mark is_in Comp_Sc_Dept
Relations cont.
Definition: A relation on domains D1, D2, ..., Dn (not necessarily all distinct) consists of a
heading and a body.
• Domain: a set of scalar values with the same type. Note that each attribute in a
relation is of a specific domain (e.g., Name is String and Age is Integer)
E.g., CREATE DOMAIN S# CHAR(5) (an SQL command that creates a domain named S# of data
type CHAR with size 5)
• Heading: a fixed set of attributes A1, ..., An such that Aj is of domain Dj (j = 1, ..., n).
• Body: a time-varying set of tuples
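The definition above can be sketched in code: the heading is fixed, the body changes over time, and every value in a tuple must belong to the domain of its attribute. In this minimal illustration Python types stand in for domains, and the names HEADING and insert are assumptions made for the example.

```python
# Heading: a fixed sequence of (attribute, domain) pairs
HEADING = (("name", str), ("age", int))

def insert(body: set, row: tuple) -> set:
    """Return a new body containing row, after checking each value's domain."""
    if len(row) != len(HEADING):
        raise ValueError("wrong number of attributes")
    for value, (attribute, domain) in zip(row, HEADING):
        if not isinstance(value, domain):
            raise TypeError(f"{attribute} must be of type {domain.__name__}")
    return body | {row}

# Body: a time-varying set of tuples
body = set()
body = insert(body, ("Mark", 25))
body = insert(body, ("Mark", 25))   # duplicate tuple: the set keeps one copy
body = insert(body, ("Ada", 22))

print(len(body))
```

Because the body is a set, inserting the same tuple twice leaves the cardinality unchanged.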
In the hierarchical data model, each child node is required to have exactly
one parent, while a parent node can have more than one child node. To
access data, the tree must be traversed starting from the root node.
Example of Hierarchical Relations
[Figure: tree diagram with nodes B1, B2 on one level, C1, C2, C3, C4, C5 on the next level, and D1, D2 below them]
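A hierarchy like this can be sketched as a mapping from each parent to its children. The root name A1 and the parent-child links below are assumptions made for illustration; only the node labels B1 to D2 come from the example.

```python
# Parent -> children (hypothetical links; A1 is an assumed root)
children = {
    "A1": ["B1", "B2"],
    "B1": ["C1", "C2"],
    "B2": ["C3", "C4", "C5"],
    "C1": ["D1", "D2"],
}

def path_from_root(root, target):
    """Depth-first search from the root; return the path to target or None."""
    if root == target:
        return [root]
    for child in children.get(root, []):
        sub_path = path_from_root(child, target)
        if sub_path:
            return [root] + sub_path
    return None

# Inverting the mapping gives exactly one parent per child
parents = {child: parent for parent, kids in children.items() for child in kids}

print(path_from_root("A1", "D2"))
```

Every access starts at the root, so reaching D2 means passing through its ancestors first.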
Advantages and Disadvantages of Network Data Model
The major advantage of the network data model is its support for many-to-many
relationships. This is an improvement over the hierarchical model, which can only
support one-to-many relationships.
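A many-to-many relationship can be sketched by letting a member record have more than one owner, which a strict tree does not allow. The student names and the second course code below are hypothetical.

```python
from collections import defaultdict

members = defaultdict(set)   # owner -> set of member records
owners = defaultdict(set)    # member -> set of owner records

def link(owner, member):
    """Connect an owner record to a member record in both directions."""
    members[owner].add(member)
    owners[member].add(owner)

# A student may take many courses, and a course may have many students
link("Mark", "CSC 414")
link("Mark", "CSC 416")
link("Ada", "CSC 414")

print(sorted(owners["CSC 414"]), sorted(members["Mark"]))
```

Here CSC 414 has two owners, Mark and Ada, so the structure is a graph rather than a tree.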
File organization is concerned with how file records are mapped onto
disk blocks for storage and access. Four methods are used to organize
file records: sequential, heap, hash and clustered file organization.
a. Sequential File Organization:
Every file record contains a data field (attribute) that uniquely identifies
that record. In sequential file organization, records are placed in the file
in sequential order based on a unique key field or search key. In this
scheme, when a record is deleted or updated, the memory blocks are
searched and the record is then marked for deletion or update.
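The behaviour described above can be sketched with an in-memory list standing in for the file. The class name and the sample records are hypothetical; a real DBMS works on disk blocks rather than Python lists.

```python
import bisect

class SequentialFile:
    """Keeps records in order of their unique search key."""

    def __init__(self):
        self.keys = []      # search keys, always sorted
        self.records = []   # records, in the same order as keys

    def insert(self, key, data):
        # Find the position that keeps the keys in sorted order
        position = bisect.bisect_left(self.keys, key)
        self.keys.insert(position, key)
        self.records.insert(position, {"data": data, "deleted": False})

    def mark_deleted(self, key):
        # Search from the start and mark the matching record
        for k, record in zip(self.keys, self.records):
            if k == key:
                record["deleted"] = True
                return True
        return False

    def scan(self):
        # Read the live records in key order
        return [(k, r["data"]) for k, r in zip(self.keys, self.records)
                if not r["deleted"]]

f = SequentialFile()
for key, name in [(3, "Sani"), (1, "Mark"), (2, "Ada")]:
    f.insert(key, name)
f.mark_deleted(2)
print(f.scan())
```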
Advantages and Disadvantages of Sequential File Organization
Advantages:
Quick and efficient when processing large volumes of data
Files can be stored easily on inexpensive storage media such as magnetic tape
It has a simple design
Useful method for report creation and statistical calculations
Disadvantages:
Time is wasted because records must be accessed and updated sequentially
Sorting the records in the database takes a long time
b. Heap File Organization:
When a file is created using heap file organization, the operating
system allocates a memory area to that file without any further
accounting details. File records can be placed anywhere in that memory
area. It is the responsibility of the software to manage the records.
A heap file does not support any ordering, sequencing, or indexing on its
own.
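A heap file can be sketched as an unordered list: a new record goes wherever there is room (here, at the end), and a search has to examine records one after another. The class and field names below are hypothetical.

```python
class HeapFile:
    """Unordered collection of records with no index."""

    def __init__(self):
        self.records = []

    def insert(self, record):
        # No ordering is maintained, so insertion is a plain append
        self.records.append(record)

    def find(self, field, value):
        """Linear scan; return (record, number of records examined)."""
        for examined, record in enumerate(self.records, start=1):
            if record[field] == value:
                return record, examined
        return None, len(self.records)

heap = HeapFile()
heap.insert({"mat_no": "0002", "name": "Ada"})
heap.insert({"mat_no": "0001", "name": "Mark"})
heap.insert({"mat_no": "0003", "name": "Sani"})

found, examined = heap.find("mat_no", "0003")
print(found["name"], examined)
```

The record inserted last is found only after every earlier record has been examined, which is why searching becomes slow as the file grows.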
Advantages of Heap File Organization:
It is suitable for bulk insertion of records
For smaller databases, it is faster to retrieve and modify records than in
sequential file organization
Disadvantages of Heap File Organization:
This method of file organization is inefficient for large databases
It takes a long time to search for and modify records using this
method. This is because the records are not sorted or ordered,
so all the records must be checked one after the other
When a record is deleted from a data block, the space is not
reused or freed automatically in this scheme. For the space to be
reused, the database administrator needs to free it up
manually.
c. Hash File Organization:
Hash file organization uses a hash function to map values of varying
sizes, taken from fields of the records, to values of a fixed size. The output of
the hash function (i.e., the hash value) determines the address of the disk
block in which the record is to be placed.
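The idea can be sketched by hashing a key field to a block number. The choice of CRC-32 as the hash function, the number of blocks and the helper names are assumptions made for this example; a DBMS would use its own hash function and real disk blocks.

```python
import zlib

NUM_BLOCKS = 8
blocks = [[] for _ in range(NUM_BLOCKS)]   # each list stands in for a disk block

def block_address(key: str) -> int:
    """Map a key of any length to a fixed range of block numbers."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BLOCKS

def insert(key, record):
    blocks[block_address(key)].append((key, record))

def lookup(key):
    # Only the one block given by the hash value is searched
    for stored_key, record in blocks[block_address(key)]:
        if stored_key == key:
            return record
    return None

insert("PLASU/2018/FNS/0001", "Sani Pam")
insert("PLASU/2018/FNS/0002", "Mark")
print(lookup("PLASU/2018/FNS/0001"))
```

A lookup computes the block address from the key and goes straight to that block, without scanning the rest of the file.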
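Advantages:
Records do not need to be sorted: each record is placed at the address
computed from its hash key
It is faster to retrieve a record by its hash key using this method than with
the others, because the block address is computed directly
Records are stored independently of one another, so reading, updating or
deleting one record does not disturb the others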
Disadvantages of Hash File Organization:
Can easily cause accidental deletion of records if attribute values are
not properly specified for the hash function.
Memory is not used efficiently in this scheme, because records are not
stored in consecutive locations
Where there are multiple hash columns, searching for a record based
on a single attribute may not return accurate results
d. Clustered File Organization:
Clustered file organization is not considered good for large databases.
In this mechanism, related records from one or more relations are kept
in the same disk block; that is, the ordering of records is not based on
the primary key or search key.
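Clustering can be sketched by grouping the records of two relations under their shared joining attribute, so that related records sit together. The relations, their contents and the helper name join are hypothetical.

```python
from collections import defaultdict

# Two hypothetical relations that share a department code
departments = [("CS", "Computer Science"), ("MTH", "Mathematics")]
students = [("0001", "Mark", "CS"), ("0002", "Ada", "MTH"), ("0003", "Sani", "CS")]

# One cluster per department code; each cluster stands in for a disk block
clusters = defaultdict(lambda: {"department": None, "students": []})
for code, name in departments:
    clusters[code]["department"] = (code, name)
for row in students:
    clusters[row[2]]["students"].append(row)

def join(code):
    """Join a department with its students by reading a single cluster."""
    block = clusters[code]
    return [(block["department"][1], student[1]) for student in block["students"]]

print(join("CS"))
```

A join on the department code reads one cluster only, which is why this organization suits frequent joins on the same condition.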
Advantages:
When there are many requests for joining tables on the same
joining condition, clustered file organization is preferable
Whenever there is a 1:m mapping between tables, this method
produces the most efficient output
Disadvantages:
This method is inefficient for large databases
When the joining condition is updated, traversing the file takes a much
longer time
For tables with a 1:1 mapping, this method becomes ineffective
Query Processing in DBMS
Query processing is concerned with activities carried out in order to
extract data from the database. In query processing three identifiable
steps are taken in order to fetch the data from the database. The three
steps are as follows:
1. Parsing and translation
2. Optimization and
3. Evaluation
Each of these steps will be explained in turn.
1. Parsing and Translation
At this stage, the user query, written in a high-level language such
as SQL, is translated into expressions that can be used at the physical level of the file
system. The parser checks the syntax of the query and verifies the names of
the relation(s) in the database, the tuple(s) and the required
attribute(s). The figure below shows the steps in query processing.
[Figure: steps in query processing, with the labels Optimizer, Execution plan, Evaluation, Output, Data statistics and Data]
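The checks made during parsing, and the plan chosen by the optimizer, can be observed in a small database engine. The sketch below uses SQLite through Python's sqlite3 module; the table definition is an assumption based on the attribute names used in these notes, and the text of the plan depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stu_rec (mat_no TEXT PRIMARY KEY, name TEXT, age INTEGER)")

def try_query(sql):
    """Return 'ok' if the query is accepted, else the reason it was rejected."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.OperationalError as exc:
        return f"rejected: {exc}"

syntax_result = try_query("SELEC name FROM stu_rec")            # misspelt keyword
relation_result = try_query("SELECT name FROM no_such_table")   # unknown relation
attribute_result = try_query("SELECT height FROM stu_rec")      # unknown attribute
valid_result = try_query("SELECT name FROM stu_rec WHERE age > 24")

# Ask the optimizer which execution plan it would use, without running the query
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM stu_rec WHERE mat_no = 'PLASU/2018/FNS/0001'"
).fetchall()

for result in (syntax_result, relation_result, attribute_result, valid_result):
    print(result)
print(plan)
```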
Example: Given the following relation stu_rec, fetch the records of
students whose age > 24
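A query of this kind can be tried out as follows. The rows are hypothetical sample data, not the relation from the example, and the table keeps only three attributes for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stu_rec (mat_no TEXT PRIMARY KEY, name TEXT, age INTEGER)")

# Hypothetical sample rows
conn.executemany(
    "INSERT INTO stu_rec (mat_no, name, age) VALUES (?, ?, ?)",
    [
        ("PLASU/2018/FNS/0001", "Sani Pam", 23),
        ("PLASU/2018/FNS/0002", "Mark", 26),
        ("PLASU/2018/FNS/0003", "Ada", 24),
        ("PLASU/2018/FNS/0004", "Ruth", 30),
    ],
)

# Fetch the records of students whose age is greater than 24
older = conn.execute(
    "SELECT mat_no, name, age FROM stu_rec WHERE age > 24 ORDER BY mat_no"
).fetchall()

for row in older:
    print(row)
```

The student aged exactly 24 is not returned, because the condition is strictly greater than 24.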
E.g.:
UPDATE stu_rec SET marital_Status = 'married'
WHERE mat_no = 'PLASU/2018/FNS/0001';
E.g.:
INSERT INTO stu_rec (mat_no, name, DoB) VALUES
('PLASU/2018/FNS/0001', 'Sani Pam', '12/12/2000');