CSC 414 - Data Management Lecture Notes

CSC 414 – Data Management II
Course Lecturer: Assoc. Prof. L.P. Damuut
[email protected]
Assessment
• CA = 40%
• Exams (Theory) = 60%
Course Outline
Introduction to Data Management
Relational data model
Relational query languages
Theory and conceptual data design and modeling for relational databases
Hierarchical and network data models
File organization
Query processing
Concurrency control
Rollback and recovery
Data integrity and consistency
Data independence and minimal redundancy
Course Objectives
At the end of the course a student should be able to:
• Describe a relational data model
• Differentiate between primitive and abstract data types
• Define Data management and explain the basic data management
practices
• Differentiate between hierarchical and network data models
• Solve basic problems using relational, hierarchical and network data
models
• Design a database app using relational data model
Introduction To Data Management
Definition: Data can be defined as unprocessed information, that is, raw
facts. Examples include date of birth, sex, CGPA, hostel number, height, age, phone
number, state of origin, blood group, etc.
Data types:
Primitive data types (e.g., integer, float, Boolean, long, short, char, etc.)
Abstract data types (e.g., record, tuple and object)
Primitive Data operations:
Add (e.g., 2+5 =7)
Subtract (e.g., 2.5-8.5 = -6)
Divide (e.g., 25/5 = 5)
AND (e.g., True AND False = False)
OR (e.g., 1 OR 0 = 1)
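The difference between primitive and abstract data types, and the operations listed above, can be sketched in Python. This is only an illustration: the `StudentRecord` type and its field values are assumptions made for the example, not part of the course material.

```python
from dataclasses import dataclass

# Primitive values and the operations listed above.
total = 2 + 5             # Add
difference = 2.5 - 8.5    # Subtract
quotient = 25 / 5         # Divide (true division yields a float in Python)
both = True and False     # AND
either = 1 | 0            # OR applied to the bits 1 and 0


# An abstract data type groups several primitive values into one unit.
@dataclass
class StudentRecord:      # hypothetical record type, for illustration only
    name: str
    age: int
    cgpa: float


student = StudentRecord(name="Musa", age=22, cgpa=3.5)
student_tuple = ("Musa", 22, 3.5)   # the same data held as a plain tuple
```

A record gives each value a name (`student.age`), whereas a tuple identifies values only by position (`student_tuple[1]`).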
What is Data Management?
Definition: The practice of collecting, organizing, analyzing, protecting
and storing data in order to meet clearly defined objectives. In an
organization, this practice can also involve sharing the data with
authorized personnel within and outside in order to meet the
organization’s objectives.
Data management is central to an organization because it increases the
reliability of the data on which critical decisions are based. For
example, the management of PLASU can rely on the total number of
students admitted in an academic year when deciding whether to invest
in building more hostel accommodation.
Data Management Practices/Pillars
A. Ownership
Who owns the data depends on ….
• Funding source
Government
Private company
Philanthropic organization
Research institution
Individuals
B. Data Collection
• Use appropriate and reliable methods
• Pay attention to detail
• Obtain authorizations (human / animal subjects, biohazards, copyrighted material, etc.)
• Keep a permanent record, regardless of format, of what was done / observed / achieved
Note the following basic rules of record keeping:
1. Enter data / evidence into a numbered, bound notebook
• Record the date, the order of data collection and the results achieved
• Do not use binders or loose files, and make no changes without a date and a reason
2. Electronic data should be validated to ensure that it was recorded on a particular date and not
altered later
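One possible way to check that electronic data has not been altered after recording is to store a cryptographic digest of the data together with a timestamp. The sketch below assumes this hash-based approach, which the notes do not prescribe; the function names and the sample measurement are made up for illustration.

```python
import hashlib
from datetime import datetime, timezone


def seal(data: bytes) -> dict:
    """Return the SHA-256 digest of the data and the UTC time of sealing."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def is_unaltered(data: bytes, record: dict) -> bool:
    """True if the data still matches the digest stored when it was sealed."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]


observation = b"sample 12: height=1.74m"   # made-up measurement
record = seal(observation)
```

The check is only as trustworthy as the place where `record` is kept: if the digest and timestamp can be edited along with the data, the validation proves nothing.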
C. Data Protection
Data, as the currency of research, represents an investment, so it needs to be adequately
protected.
It may be needed later in order to:
• confirm findings
• be re-analyzed by other interested parties

Therefore, data should be stored in a safe place and, if held on computers, backed up.

Note that confidentiality agreements must be honored!

National security: sensitive data should be protected as a national security consideration.
Copyrighted data should be protected as well.
Relational Data Model
• Definition: A relational data model involves the use of tables in order
to collect groups of related record data elements. In this model, each
data table includes a primary key or identifier which serves as a link to
other tables. In other words, the relational data model represents
data as tables.
Example of a relational data model:
MAT NUMBER      NAME               MOE   LEVEL  SEX
PLASU/2017/010  MUSA DADU          UTME  400    MALE
PLASU/2017/012  MARY KUMA          DE    400    FEMALE
PLASU/2017/012  SANI ABDUL         UTME  400    MALE
PLASU/2017/013  JENNIFER THIMOTHY  DE    300    MALE
The relational model consists of rows and columns. Each column lists an
attribute of the entity/record being represented. In the table above,
the attributes are MAT NUMBER, NAME, MOE, LEVEL and SEX. Each row
represents an entity about whom data is being collected.
Relational Model Terminologies
1. Relation : corresponds to a table
2. Tuple : a row of such a table
3. Attribute : a column of such a table
4. Cardinality : number of tuples
5. Degree : number of attributes
6. Primary key : an attribute or attribute combination that uniquely
identifies a tuple
Example
Given the following relation/table, find the degree and cardinality
MAT NUMBER          SURNAME  STATE OF ORIGIN  AGE
PLASU/2017/FNS/002  MUSA     KADUNA           22
PLASU/2017/FNS/040  IBRAHIM  BAUCHI           25
PLASU/2017/FNS/020  PAM      PLATEAU          26
PLASU/2017/FNS/010  DASKYES  PLATEAU          18

Solution:
The degree = number of attributes = 4
The cardinality = number of tuples = 4
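The same calculation can be checked with a short Python sketch that stores the heading and the body of the relation separately. The variable names are chosen for illustration.

```python
# Heading (attributes) and body (tuples) of the relation in the example above.
heading = ("MAT NUMBER", "SURNAME", "STATE OF ORIGIN", "AGE")
body = {
    ("PLASU/2017/FNS/002", "MUSA", "KADUNA", 22),
    ("PLASU/2017/FNS/040", "IBRAHIM", "BAUCHI", 25),
    ("PLASU/2017/FNS/020", "PAM", "PLATEAU", 26),
    ("PLASU/2017/FNS/010", "DASKYES", "PLATEAU", 18),
}

degree = len(heading)        # number of attributes
cardinality = len(body)      # number of tuples
print(degree, cardinality)   # prints: 4 4
```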
A relational model is concerned with:
1. Data structure: tables
2. Data integrity: primary key rules, foreign key rules
3. Data manipulation (relational operators):
• Relational Algebra
• Relational Calculus
e.g. <Entity> <Relationship> <Entity>
Mark is_in Comp. Sc. Dept
Relations cont.
Definition: A relation on domains D1, D2, ..., Dn (not necessarily all distinct) consists of a
heading and a body.

• Domain: a set of scalar values of the same type. Each attribute in a
relation is drawn from a specific domain (e.g., Name is String and Age is Integer).
E.g. CREATE DOMAIN S# CHAR(5) (SQL command to create a domain named S# of data
type CHAR, size 5)
• Heading: a fixed set of attributes A1, ..., An such that Aj is of domain Dj (j = 1...n).
• Body: a time-varying set of tuples.
• Tuple: a set of attribute-value pairs {A1:Vi1, A2:Vi2, ..., An:Vin}, where i = 1...m
and m is the number of tuples in the body.


Properties of Relations
• There are no duplicate tuples
• Tuples are unordered
• Attributes are unordered
• All attribute values are atomic, i.e., there is only one value, not a list
of values, at every row-and-column position within the table.
Keys in a Relation
• Candidate key: Let R be a relation with attributes A1, A2, ..., An. A set
of attributes K = (Ai, Aj, ..., Ak) of R is said to be a candidate key iff it
satisfies the following rules:
Uniqueness: no two tuples of R have the same value for K.
Minimality: none of Ai, Aj, ..., Ak can be discarded from K without destroying the
uniqueness property.

• Primary key: one of the candidate keys


• Alternate keys: candidate keys which are not the primary key
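The two rules can be tested mechanically against the tuples currently stored in a relation. The sketch below is a minimal illustration: the function names and sample rows are assumptions, and a true candidate key must hold for every possible state of the relation, not just for the rows checked here.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """Uniqueness: no two rows share the same values for attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def is_candidate_key(rows, attrs):
    """attrs is unique, and no proper subset of attrs is still unique."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Hypothetical rows, for illustration only.
rows = [
    {"mat_no": "A1", "surname": "MUSA", "state": "KADUNA"},
    {"mat_no": "A2", "surname": "PAM", "state": "PLATEAU"},
    {"mat_no": "A3", "surname": "MUSA", "state": "PLATEAU"},
]

print(is_candidate_key(rows, ("mat_no",)))            # True
print(is_candidate_key(rows, ("mat_no", "surname")))  # False: not minimal
print(is_candidate_key(rows, ("surname",)))           # False: not unique
```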
Relational Algebra
The relational algebra consists of a collection of eight high-level operators that
operate on relations. They are outlined as follows:
1. Intersection (∩)
2. Union (∪)
3. Difference (−)
4. Cartesian Product / Times (×)
5. Restrict or Selection (σ)
6. Project (Π)
7. Join (⋈)
8. Divide (÷)
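Several of these operators can be sketched in Python by modelling a relation as a pair (heading, body), where the body is a set of tuples. This representation, the function names and the sample relations are assumptions made for illustration; Divide is omitted to keep the sketch short.

```python
def restrict(rel, predicate):
    """Keep only the tuples for which predicate(row_as_dict) is true."""
    heading, body = rel
    return heading, {t for t in body if predicate(dict(zip(heading, t)))}


def project(rel, attrs):
    """Keep only the named attributes; duplicate tuples collapse in the set."""
    heading, body = rel
    idx = [heading.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in body}


def times(r, s):
    """Cartesian product: every tuple of r paired with every tuple of s."""
    return r[0] + s[0], {a + b for a in r[1] for b in s[1]}


def join(r, s):
    """Natural join on the attributes the two headings have in common."""
    common = [a for a in r[0] if a in s[0]]
    extra = [a for a in s[0] if a not in common]
    body = set()
    for a in r[1]:
        ra = dict(zip(r[0], a))
        for b in s[1]:
            sb = dict(zip(s[0], b))
            if all(ra[c] == sb[c] for c in common):
                body.add(a + tuple(sb[x] for x in extra))
    return r[0] + tuple(extra), body


# Hypothetical sample relations.
students = (("id", "age"), {(1, 22), (2, 25), (3, 26)})
staff = (("staff_no", "name"), {("020", "Audu"), ("045", "Silas")})
device = (("serial", "staff_no"), {("01252", "020"), ("01299", "045")})

over_24 = restrict(students, lambda row: row["age"] > 24)
ages = project(students, ["age"])
assigned = join(staff, device)

# Union, intersection and difference map directly onto Python set operators
# when both relations have the same heading.
a, b = {(1,), (2,)}, {(2,), (3,)}
union, intersection, difference = a | b, a & b, a - b
```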
Hierarchical Data Model
In a Hierarchical data model, data is stored in the form of records and
organized into a tree-like structure or a parent-child structure where
one parent-node can have many child nodes connected using links. A
record is a collection of fields each containing one value.

In the hierarchical data model, each child node has exactly one
parent, while a parent node can have more than one child node. In
order to access data, the tree needs to be traversed starting
from the root node.
Example of Hierarchical Relations

Staff Number     Name          Department
PF/SS/PLASU/020  Kumshin Audu  Physics
PF/SS/PLASU/045  Musa Silas    Comp.Sci
PF/SS/PLASU/088  Makirwe Sati  Comp.Sci

Ser. Number  Device   Staff Number
PLASU/01252  Tablet   PF/SS/PLASU/020
PLASU/01232  PC       PF/SS/PLASU/020
PLASU/01299  Printer  PF/SS/PLASU/045
The first table above represents the parent nodes in the hierarchy, while
the second table represents the child nodes. As can easily be deduced
from the relations, the Tablet and the PC (child nodes), with their
respective serial numbers, are both assigned to the staff member with
staff number PF/SS/PLASU/020 (parent node).
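The parent-child structure in the two tables can be sketched in Python as a mapping from each parent to its list of children. The staff numbers and devices come from the tables above; the variable and function names are assumptions for illustration.

```python
# Parent records (staff numbers) and child records (devices) from the tables above.
staff_numbers = ["PF/SS/PLASU/020", "PF/SS/PLASU/045", "PF/SS/PLASU/088"]
devices = [
    ("PLASU/01252", "Tablet", "PF/SS/PLASU/020"),
    ("PLASU/01232", "PC", "PF/SS/PLASU/020"),
    ("PLASU/01299", "Printer", "PF/SS/PLASU/045"),
]

# Build the parent -> children links; each child is stored under exactly one parent.
children = {staff_no: [] for staff_no in staff_numbers}
for serial, device, owner in devices:
    children[owner].append((serial, device))


def find_device(serial):
    """Search top-down: visit each parent in turn, then each of its children."""
    for staff_no, owned in children.items():
        for dev_serial, device in owned:
            if dev_serial == serial:
                return staff_no, device
    return None


print(find_device("PLASU/01299"))   # ('PF/SS/PLASU/045', 'Printer')
```

Finding a device means walking down from the parents, which illustrates why searches in this model start at the top of the tree.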
Advantage(s) and Disadvantage(s) of Hierarchical Data Model
One key advantage of the hierarchical data model is efficiency. This is
seen in cases where the database contains several 1:n relationships.

On the other hand, the hierarchical model is complex to implement,
although it can be relatively easy to conceptualize and design.
Moreover, the tree-like organization of data requires a top-to-bottom
sequential search, which is time-consuming, and it requires the same
data to be stored repeatedly in multiple entities, which increases
redundancy.
Network Data Model
Unlike the hierarchical data model where a child node can only be
linked to one parent node, the network model allows for a child node
to be linked to more than one parent node. In this data model, the
parents are called the owners while the child nodes are referred to as
the members accordingly.
The figure depicts the network data model as nodes on three levels: B1, B2 (top); C1, C2, C3, C4, C5 (middle); D1, D2 (bottom).
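The owner/member idea can be sketched in Python with two mappings, one in each direction. The link names below are hypothetical and are not taken from the figure; they only show a member that has more than one owner.

```python
from collections import defaultdict

# Hypothetical owner -> member links; a member may appear under several owners.
links = [
    ("Dept_CompSci", "Course_CSC414"),
    ("Lecturer_A", "Course_CSC414"),
    ("Dept_CompSci", "Course_CSC401"),
]

members_of = defaultdict(set)   # owner  -> its members
owners_of = defaultdict(set)    # member -> its owners

for owner, member in links:
    members_of[owner].add(member)
    owners_of[member].add(owner)

print(sorted(owners_of["Course_CSC414"]))   # ['Dept_CompSci', 'Lecturer_A']
```

In a hierarchical model `owners_of` could never hold more than one entry per member; allowing several is what turns the tree into a graph.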
Advantages and Disadvantages of Network Data Model
The major advantage of the network data model is its support for many-to-many
relationships. This is an improvement over the hierarchical model, which can only
support one-to-many relationships.

On the other hand, the disadvantages of the network model include:

1. Increase in complexity: many-to-many relationships introduce more
complications than the hierarchical model has
2. It has flexibility challenges: not all relationships can be defined and handled in
the form of owners and members
3. Difficult to understand and modify: the complications of this model make it
more difficult to implement and modify
Comparison Between Hierarchical and Network Data Models
Metric                    Hierarchical Data Model        Network Data Model
Relationship b/w records  Parent-child relationship.     Relationships between records are
                          Supports 1:n (one-to-many)     expressed using pointers. Supports
                          relationships                  n:m (many-to-many) relationships
Consistency of data       Data inconsistency can occur   No data inconsistency
                          during deletion and update
                          of data
Traversal                 Traversal is complex           Traversal is easy
Structure                 Tree-like structure            Graph-like structure
File Organization

File organization is concerned with how file records are mapped onto
disk blocks for storage and access. There are four methods of
organizing file records: sequential, heap, hash and clustered file
organization.
a. Sequential File Organization:
Every file record contains a data field (attribute) that uniquely identifies
that record. In sequential file organization, records are placed in the file
in sequential order based on a unique key field or search key. In this
scheme, when a record is to be deleted or updated, the memory blocks are
searched and the record is then marked for deletion or update.
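The behaviour described above can be sketched with a small in-memory stand-in for a sequential file. The class and its methods are hypothetical, unique keys are assumed, and a real file would work on disk blocks rather than a Python list.

```python
import bisect


class SequentialFile:
    """Toy in-memory stand-in for a file kept in search-key order."""

    def __init__(self):
        self.rows = []   # [key, record, deleted] entries kept in key order

    def insert(self, key, record):
        keys = [row[0] for row in self.rows]
        self.rows.insert(bisect.bisect_left(keys, key), [key, record, False])

    def find(self, key):
        for row in self.rows:          # scan the records in key order
            if row[0] == key and not row[2]:
                return row[1]
            if row[0] > key:           # passed the place where key would be
                break
        return None

    def delete(self, key):
        for row in self.rows:
            if row[0] == key:
                row[2] = True          # mark for deletion; the slot remains


f = SequentialFile()
f.insert(30, "c")
f.insert(10, "a")
f.insert(20, "b")
print([row[0] for row in f.rows])   # [10, 20, 30]
```

Because the records are kept in key order, a search can stop as soon as it passes the position where the key would have been.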
Advantages and Disadvantages of Sequential File Organization
Advantages:
Quick and efficient when dealing with large volumes of data
Files can be stored easily on simple, inexpensive storage media
It has a simple design
Useful for creating reports and for statistical calculations
Disadvantages:
Time is wasted because records can only be accessed and updated sequentially
Sorting the records in the file takes a long time
b. Heap File Organization:
When a file is created using Heap File Organization, the Operating
System allocates memory area to that file without any further
accounting details. File records can be placed anywhere in that memory
area. It is the responsibility of the software to manage the records.
Heap File does not support any ordering, sequencing, or indexing on its
own.
Advantages of Heap File organization:
It is suitable for bulk insertion of records
For smaller databases, it is faster to retrieve and modify records than in
sequential file organization
Disadvantages of Heap File Organization:
This method of file organization is inefficient for large databases
It takes a long time to search for and modify records using this
method. This is because there is no sorting or ordering of the records,
so there is need to check all the records one after the other
When deleting a record from a data block, the space will not be
reused or freed automatically in this scheme. For the space to be
reused, the database administrator needs to manually free up the
space.
c. Hash File Organization:
Hash File Organization uses a hash function to map field values of
varying sizes to hash values of a fixed size. The output of the hash
function (i.e., the hash value) determines the disk block in which
the record is placed.
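The mapping from a key to a block can be sketched as follows. The number of blocks, the choice of SHA-256 as the hash function and the function names are all assumptions for illustration; a real DBMS would use its own hash function and on-disk buckets.

```python
import hashlib

NUM_BLOCKS = 8   # assumed number of disk blocks (buckets)


def block_for(key: str) -> int:
    """Map a key of any length to a block number in a fixed range."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BLOCKS


blocks = [[] for _ in range(NUM_BLOCKS)]


def store(key, record):
    blocks[block_for(key)].append((key, record))


def fetch(key):
    for k, record in blocks[block_for(key)]:   # only one block is searched
        if k == key:
            return record
    return None


store("PLASU/2017/FNS/002", "MUSA")
store("PLASU/2017/FNS/040", "IBRAHIM")
print(fetch("PLASU/2017/FNS/040"))   # IBRAHIM
```

A lookup recomputes the hash and inspects a single block, which is why retrieval by the hash key does not require scanning the whole file.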

Advantages:
Records do not need to be sorted, because each record is placed at the
location given by its hash value
It is faster to retrieve a record by its hash key using this method than with the others
Records are stored independently, so there are no read, update or deletion
anomalies in the database
Disadvantages of Hash file organization:
Records can easily be deleted by accident if attribute values are
not properly specified for the hash function.
Memory is not used efficiently in this scheme, because records are not
stored in consecutive locations
Where there are multiple hash columns, searching for a record based
on a single attribute may not return accurate results
d. Clustered File Organization:
Clustered file organization is not considered good for large databases.
In this mechanism, related records from one or more relations are kept
in the same disk block, that is, the ordering of records is not based on
primary key or search key.
Advantages:
When there are many requests to join tables using the same
joining condition, clustered file organization is preferable
Whenever there is 1:m mapping between tables, this method
produces the most efficient output
Disadvantages:
This method is inefficient for large databases
When the joining condition is updated, traversing the file takes a much
longer time
For a table with 1:1 mapping, this method becomes ineffective
Query Processing in DBMS
Query processing is concerned with activities carried out in order to
extract data from the database. In query processing three identifiable
steps are taken in order to fetch the data from the database. The three
steps are as follows:
1. Parsing and translation
2. Optimization and
3. Evaluation
Each of these steps will be explained in turn.
1. Parsing and Translation
At this stage, the user query, written in a high-level language such
as SQL, is translated into expressions that can be used at the physical
level of the file system. The parser checks the syntax of the query and
verifies the names of the relation(s) in the database, the tuple(s) and
the required attribute(s). The figure below shows the steps in query processing.

[Figure: steps in query processing. Query → Parsing & translation → Relational algebra expression → Optimizer (consults data statistics) → Execution plan → Evaluation (reads data) → Output]
Example: Given the following relation stu_rec, fetch the record of
students whose age >24

Solution: Using SQL, the command should be:


SELECT * FROM stu_rec WHERE AGE>24
In order for the system to understand this query, the command needs to
be translated into a relational algebra expression: σ AGE>24 (stu_rec).
Because SELECT * keeps every attribute, no projection (Π) is needed.
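The query can be tried out with Python's built-in `sqlite3` module. The notes do not show the contents of stu_rec, so the columns and rows below are assumptions borrowed from the earlier degree/cardinality example; SQLite is used here only as a convenient engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and sample rows; the notes do not list the contents of stu_rec.
conn.execute(
    "CREATE TABLE stu_rec (mat_no TEXT PRIMARY KEY, surname TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO stu_rec VALUES (?, ?, ?)",
    [
        ("PLASU/2017/FNS/002", "MUSA", 22),
        ("PLASU/2017/FNS/040", "IBRAHIM", 25),
        ("PLASU/2017/FNS/020", "PAM", 26),
        ("PLASU/2017/FNS/010", "DASKYES", 18),
    ],
)

# ORDER BY is added only to make the printed order predictable.
rows = conn.execute(
    "SELECT * FROM stu_rec WHERE age > 24 ORDER BY mat_no"
).fetchall()
print(rows)
# [('PLASU/2017/FNS/020', 'PAM', 26), ('PLASU/2017/FNS/040', 'IBRAHIM', 25)]
```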
2. Optimization
The cost of query evaluation varies depending on the query under
consideration. Although the system is responsible for evaluating the
query, the user is expected to write the query efficiently.

In order to optimize a query, the optimizer estimates the cost of each
operation. This is because the overall cost of the query depends on the
memory allocation and execution costs of its operations.
3. Evaluation
The query evaluation plan is also referred to as the query execution plan.
In order to evaluate a given query, the system must first construct a
query evaluation plan.
The query evaluation plan defines the sequence of primitive operations
used to evaluate the query.
The query execution engine takes the query evaluation plan, executes it
and finally produces the output of the user query.
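Some database systems let the user inspect the plan chosen for a query. As one concrete illustration, SQLite offers the EXPLAIN QUERY PLAN statement; the notes do not name a particular DBMS, so the choice of SQLite and the table definition below are assumptions. The exact wording of the plan differs between SQLite versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema, for illustration only.
conn.execute(
    "CREATE TABLE stu_rec (mat_no TEXT PRIMARY KEY, surname TEXT, age INTEGER)"
)

# Ask SQLite how it intends to evaluate the query, without running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM stu_rec WHERE age > 24"
).fetchall()

for step in plan:
    print(step[-1])   # the last column holds the human-readable plan step
```

With no index on age, the plan is expected to describe a scan of the whole table, which is the kind of primitive operation an evaluation plan is built from.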
Query Processing With SQL
Definition: SQL stands for Structured Query Language
SQL allows the user to access and manipulate the database.
SQL became a standard of the American National Standards Institute (ANSI) in 1986 and of the International
Organization for Standardization (ISO) in 1987.
SQL can:
• Execute queries on a database
• Update records in a database
• Retrieve data from a database
• Insert records into a database
• Delete records from a database
• Create new databases
• Create new tables in a database
• Etc.
Common SQL Command Statements
CREATE DATABASE <db_name>:
This command creates a new database called db_name
CREATE TABLE <tbl_name>:
This command creates a new table/relation called tbl_name
ALTER TABLE <tbl_name>:
This command modifies a table/relation called tbl_name
DROP TABLE <tbl_name>:
This command deletes a table/relation called tbl_name
UPDATE
This command updates data in a database
INSERT INTO
This command inserts new data into a database
DELETE
This command deletes data from a database
SELECT
This command extracts data from a database
SQL Command Syntax
• SELECT column1, column2, ..., columnk FROM <table_name>
Example:
SELECT * FROM <table_name>
This command, when executed, displays all the records/tuples in the specified table
(<table_name>), with all the columns in the table.
Sometimes the user may want only the distinct entries of certain columns, without duplicates. In
that case the SELECT DISTINCT command is used:
SELECT DISTINCT column1, column2, ..., columnk FROM <table_name>
Example: SELECT DISTINCT MAT_NO, DOB FROM stu_rec
This command, when executed, displays each distinct combination of MAT_NO and DOB in
the table named stu_rec, without duplicates.
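The effect of DISTINCT can be seen by running both forms of the query with Python's `sqlite3` module. The table layout and rows are made up for illustration; the extra course column exists only so that the same MAT_NO and DOB pair appears more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout and data, for illustration only.
conn.execute("CREATE TABLE stu_rec (mat_no TEXT, dob TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO stu_rec VALUES (?, ?, ?)",
    [
        ("PLASU/2018/FNS/0001", "12/12/2000", "CSC 414"),
        ("PLASU/2018/FNS/0001", "12/12/2000", "CSC 401"),
        ("PLASU/2018/FNS/0002", "05/03/2001", "CSC 414"),
    ],
)

all_rows = conn.execute("SELECT mat_no, dob FROM stu_rec").fetchall()
distinct_rows = conn.execute(
    "SELECT DISTINCT mat_no, dob FROM stu_rec"
).fetchall()

print(len(all_rows), len(distinct_rows))   # prints: 3 2
```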
Syntax of UPDATE statement:
UPDATE <table_name> SET column_name1 = new_value1,
column_name2 = new_value2, … [WHERE condition]

E.g.
UPDATE stu_rec SET marital_Status = 'married' WHERE mat_no =
'PLASU/2018/FNS/0001'

Syntax of DELETE statement:

DELETE FROM <table_name> [WHERE condition]
E.g.
DELETE FROM stu_rec WHERE mat_no = 'PLASU/2018/FNS/0001'

Syntax of INSERT INTO command:

INSERT INTO <table_name> (col1, col2, …, coln) VALUES (val1, val2, …, valn)

E.g.
INSERT INTO stu_rec (mat_no, name, DoB) VALUES
('PLASU/2018/FNS/0001', 'Sani Pam', '12/12/2000')
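The three statements can be run end to end with Python's `sqlite3` module. The table definition is an assumption, since the notes do not give one, and the values are passed as parameters (the `?` placeholders) rather than typed into the SQL text, which is the usual practice when SQL is issued from a program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table definition; column names follow the examples above.
conn.execute(
    "CREATE TABLE stu_rec ("
    "mat_no TEXT PRIMARY KEY, name TEXT, dob TEXT, marital_status TEXT)"
)

# INSERT INTO: add one record.
conn.execute(
    "INSERT INTO stu_rec (mat_no, name, dob) VALUES (?, ?, ?)",
    ("PLASU/2018/FNS/0001", "Sani Pam", "12/12/2000"),
)

# UPDATE: change one column of the matching record.
conn.execute(
    "UPDATE stu_rec SET marital_status = ? WHERE mat_no = ?",
    ("married", "PLASU/2018/FNS/0001"),
)
after_update = conn.execute("SELECT * FROM stu_rec").fetchall()

# DELETE: remove the matching record.
conn.execute("DELETE FROM stu_rec WHERE mat_no = ?", ("PLASU/2018/FNS/0001",))
after_delete = conn.execute("SELECT * FROM stu_rec").fetchall()

print(after_update)   # [('PLASU/2018/FNS/0001', 'Sani Pam', '12/12/2000', 'married')]
print(after_delete)   # []
```

Omitting the WHERE clause in UPDATE or DELETE would apply the change to every row in the table, which is why the condition is shown as optional in the syntax but is almost always needed in practice.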
