Foundations of Databases
Advanced Database
Management Systems
SNO Devine
Department of ICT & Mathematics
Presbyterian University, Ghana
Foundations of Databases
Concepts, File system, Database Models
and Types
Outline
• What a database is, what it does, and why database
design is important
• How modern databases evolved from files and file
systems
• About flaws in file system data management
• What a DBMS is, what it does, and how it fits into the
database system
• Database terminologies and the Database Users
• Types of database systems and database models
Databases
• With rapid growth in computerization and digitalization,
accurate record keeping practices are needed to effectively
manage the daily assets of any organization.
• The amount of data generated and collected today is
growing exponentially.
• Managerial decision making relies on accurate data. Hence, for
organizations to benefit from their operational data, they must
avoid redundancy, inconsistency and data loss.
• The need for a proper database infrastructure in any
organization, and its implications for the organization’s
operations, cannot be overemphasized.
• It helps first to understand some principles of databases and
what defines their design, starting with what data and
information are.
DATA
• Data is the “New Gold”.
• It has become a “critical raw material for producing
digital products and services.”
• Data is the “New Oil”.
• Generally credited to mathematician Clive Humby:
• “Data is the new oil. Like oil, data is valuable, but if
unrefined it cannot really be used. It has to be changed
into gas, plastic, chemicals, etc. to create a valuable
entity that drives profitable activity. So, must data be
broken down, analyzed for it to have value.” (Humby,
2006).
DATA
• It is estimated that the internet contains a massive amount of
data, on the order of 5 million terabytes (Forbes, 2023).
• It is estimated that
about 402.74 million
terabytes of data are
created each day
(Statista, 2023).
• “Created” includes
data that is newly
generated, captured,
copied, or consumed
DATA: What is it?
• Data is raw fact.
• Data consists of known facts that can be recorded
and that have an implicit meaning.
• Data is any piece of fact that requires further
transformation in order to derive its true
value and meaning for decision making.
• Data is the foundation or basis for deriving
information.
Types of Data
• Data is presented in two main forms with each
having two categories
• Qualitative
• Ordinal
• Nominal
• Quantitative
• Discrete
• Continuous
Types of Data: QUALITATIVE
• Qualitative Data
• also termed Categorical
• is a type of data that cannot be measured or counted in
the form of numbers.
• are the types of data that are sorted by category, not by
number.
• Nominal Data
• is used to label variables without any order or
quantitative value.
• Colour of hair (Black, Brown, Blonde, etc.)
• Marital status (Single, Widowed, Married)
• Ordinal Data
• has a natural ordering, where each value is ranked by its
position on a scale.
• Feedback, experience, or satisfaction ratings that companies collect on a Likert-type scale of 1 to 10
• Letter grades in an exam (A, B, C, D, etc.)
Types of Data: QUANTITATIVE
• Quantitative Data
• also known as Numerical data.
• can be expressed in numerical values, which makes it countable or
measurable and suitable for statistical analysis.
• is often used for statistical manipulation and can be represented on a
wide variety of graphs and charts.
• Discrete Data
• discrete means distinct or separate
• contains values that are integers or whole numbers.
• cannot be broken into decimal or fraction values.
• Total numbers of students present in a class
• Numbers of employees in a company
• Continuous Data
• is data that can take fractional values within a range
• Height of a person
• Market share price/value
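The four categories above can be illustrated with a minimal Python sketch. The sample values, the `Grade` scale and the `describe` helper are hypothetical names chosen for illustration, and mapping categories onto Python types is a simplification rather than a general rule.

```python
from enum import IntEnum

# Nominal: labels with no inherent order; only equality checks are meaningful.
hair_colours = {"Black", "Brown", "Blonde"}


# Ordinal: categories with a natural order (a hypothetical grade scale).
class Grade(IntEnum):
    D = 1
    C = 2
    B = 3
    A = 4


# Discrete: whole-number counts.
students_present = 42

# Continuous: measurements that can take fractional values.
height_m = 1.73


def describe(value):
    """Name the category of a sample value (illustrative mapping only)."""
    if isinstance(value, Grade):  # checked first, because IntEnum is also an int
        return "qualitative/ordinal"
    if isinstance(value, str):
        return "qualitative/nominal"
    if isinstance(value, int):
        return "quantitative/discrete"
    if isinstance(value, float):
        return "quantitative/continuous"
    raise TypeError(f"unsupported sample value: {value!r}")
```

Ordinal values can be compared (`Grade.A > Grade.B`), whereas nominal labels such as hair colours can only be tested for equality.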
Data: Forms and Format
• Every organization (and even every individual) may create,
generate or use data, which comes in different types and forms.
• Data can come in various forms (numeric and text) and
formats: document (CSV, XLS, JSON), audio (MP3, AAC,
etc.), image (JPG) and video (MP4, AVI)
• Big Data is a collection of data that comes from various
sources and is characterized by its
• Volume: the size or amount of data,
• Velocity: the speed or rate at which the data is created and how
fast it moves,
• Value: the relevance or value the data provides,
• Variety: the forms or diversity that exist in the types of data,
• Veracity: the quality and accuracy of the data.
Essence of Data
• Every organization needs (quality) data to survive,
innovate and grow.
• Data is necessary for
• deriving insights in operations of an organization.
• obtaining a clear picture about the past and present.
• projecting or predicting future events or occurrences.
• surety of evidence and avoidance of guesswork in decision
making.
• targeted growth and development in a specific or integrated
view.
• identification of problems and designing viable solutions.
• advocacy, as a key element of fact to back up an argument.
• strategic alignment and increase in efficiency.
• assessing resource allocation, and driving improvement and
transformation.
• effective monitoring and addressing challenges promptly.
Introducing the Database
• Data versus Information
• Data: raw facts
• Are stored and retrieved
• Have not been processed to reveal their meaning to the user
• For example:
• The Robcor company has two divisions, which have
1,380,456 and 1,453,907 invoices, respectively.
• Each invoice has an invoice number, date, and amount
• The period runs from the first quarter of 1997 to the first quarter of
2002.
• Total: 2,834,363 records
Invoice Nbr Invoice Date Sales Amount
… … …
… … …
Introducing the Database
• Data versus Information
• Data constitute building blocks of information
• Information produced by processing data
• Information reveals meaning of data
• Good, timely, relevant information is key to good
decision making
• Good decision making key to organizational survival
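As a minimal sketch of how processing turns data into information, the Python snippet below summarizes raw invoice records into sales totals per quarter. The four invoice rows and the `sales_by_quarter` helper are hypothetical, chosen only to mirror the invoice number, date and amount fields of the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw data: (invoice number, invoice date, sales amount).
invoices = [
    (1001, date(1997, 1, 15), 250.00),
    (1002, date(1997, 2, 3), 100.00),
    (1003, date(1997, 4, 20), 75.50),
    (1004, date(1997, 5, 9), 24.50),
]


def sales_by_quarter(rows):
    """Process raw invoice rows into total sales per (year, quarter)."""
    totals = defaultdict(float)
    for _number, invoice_date, amount in rows:
        quarter = (invoice_date.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        totals[(invoice_date.year, quarter)] += amount
    return dict(totals)


summary = sales_by_quarter(invoices)
```

The individual rows are data; the per-quarter totals in `summary` are information a manager can act on.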
Introducing the Database
• Qualities/Characteristics of Information
• Accurate
• Complete
• Consistent
• Concise/Summarized
• Relevant
• Reliable
• Scoped
• Time-bound
• Etc.
Database Management
• Data Management
• is the set of practices
• that focuses on the proper generation,
collection, organization, storage, protection,
retrieval, transformation and distribution of
data
• usually by an organization
• to support analysis for business decisions
Database
• Database
• is an organised collection of interrelated data
• is a group of related records stored for a specific
purpose
• A database can be
• manually or electronically managed
• presented as a shared, integrated computer
structure housing related data:
• End user data (raw data)
• Metadata (data about data, it contains data
characteristics and relationships)
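The difference between end-user data and metadata can be seen in a small sketch using Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in for a DBMS, and the `customer` table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customer (name) VALUES ('Ama')")

# End-user data: the raw facts stored in the table.
rows = conn.execute("SELECT id, name FROM customer").fetchall()

# Metadata: data about the data. PRAGMA table_info returns one row per column
# as (cid, name, type, notnull, dflt_value, pk).
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(customer)")]

# The catalogue of tables is itself stored in the database (sqlite_master).
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
```

Here `rows` holds the end-user data, while `columns` and `tables` are read from the metadata that the DBMS keeps alongside it.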
Database Management
• Database Management System (DBMS):
a software system (a collection of programs)
that helps to manage the database contents
• Manages Database structure
• Controls access to data
• Contains query language
Application software ↔ DBMS ↔ Database
DBMS Manages Interaction
Historical Roots of Database: Files and File
Systems
• Traditionally, a file system was composed of a collection of
file folders, each properly tagged and kept in
filing cabinets.
• First applications focused on clerical tasks
• Requests for information quickly followed
• File systems developed to address needs
• Data organized according to expected use
• Data Processing (DP) specialists computerized manual
file systems
File Terminology
• Data
• Raw Facts
• Field
• Group of characters with specific meaning
• Record
• Logically connected fields that describe a
person, place, or thing
• File and file folder
• Collection of related records
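The terms above map directly onto a simple text file. In the sketch below the file contents, names and phone numbers are hypothetical; each line is a record, each comma-separated value is a field, and the whole text plays the role of the file.

```python
import csv
import io

# A hypothetical customer file: the first line names the fields,
# every following line is one record.
FILE_CONTENTS = """name,phone,city
Ama Mensah,0244000001,Accra
Kofi Boateng,0244000002,Kumasi
"""


def read_records(text):
    """Return the file as a list of records, each a dict of field name to value."""
    return list(csv.DictReader(io.StringIO(text)))


records = read_records(FILE_CONTENTS)   # the file: a collection of related records
first_record = records[0]               # a record: logically connected fields
city_field = first_record["city"]       # a field: characters with a specific meaning
```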
File System Critique
• Sample COBOL Data Entry Interfaces/Screens
File System Critique
• File System Data Management
• Requires extensive programming in a third-generation
language (3GL) such as COBOL, BASIC, or Fortran, in which
the programmer must specify both what must be done and
how it is to be done
• Time consuming
• Depends on physically storing data
• Makes ad hoc queries impossible
• Makes the file system difficult to modify (each file
has its own system)
• Leads to islands of information
File System Critique (cont’d.)
• Data Dependence
• Change in file’s data characteristics requires
modification of data access programs (e.g. changing a
field from integer to decimal)
• Makes file systems cumbersome from programming
and data management views
• Structural Dependence
• Change in file structure requires modification of related
programs (e.g. adding or deleting a field)
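Structural dependence can be sketched in a few lines of Python. The record layouts and helper names below are hypothetical: one function hard-codes the position of a field, as a file-system program would, while the other looks the field up through a stored layout description, which loosely stands in for the role a data dictionary plays in a DBMS.

```python
# Old file layout: name, phone.
OLD_LINE = "Ama Mensah,0244000001"
# New layout after a field is added in the middle: name, city, phone.
NEW_LINE = "Ama Mensah,Accra,0244000001"


def phone_by_position(line):
    """File-system style access: the program hard-codes the field position."""
    return line.split(",")[1]


def phone_by_name(line, layout):
    """Access through a layout description kept outside the program logic."""
    record = dict(zip(layout, line.split(",")))
    return record["phone"]


broken = phone_by_position(NEW_LINE)  # silently returns the city, not the phone
correct = phone_by_name(NEW_LINE, ["name", "city", "phone"])
```

After the structural change, every program written like `phone_by_position` must be found and modified, whereas `phone_by_name` only needs the updated layout.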
File System Critique (cont’d.)
• Field Definitions and Naming Conventions
• Flexible record definition anticipates reporting
requirements
• Selection of proper field names important
• Attention to length of field names
• Use of unique record identifiers (record ids)
• Data Redundancy
• Different and conflicting versions of same data (data
pooling is difficult)
• Results of uncontrolled data redundancy
• Data anomalies (abnormalities)
• Modification
• Insertion
• Deletion
• Data inconsistency
• Lack of data integrity
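A modification anomaly caused by uncontrolled redundancy can be reproduced with a small, hypothetical flat file in which an agent's phone number is repeated on every customer record. The data and helper names are assumptions made for illustration.

```python
# Hypothetical flat file: the agent's phone is stored once per customer record.
customers = [
    {"customer": "Ama", "agent": "Yaw", "agent_phone": "0200000001"},
    {"customer": "Kofi", "agent": "Yaw", "agent_phone": "0200000001"},
]


def update_first_match(records, agent, new_phone):
    """A careless update that changes only the first matching record."""
    for record in records:
        if record["agent"] == agent:
            record["agent_phone"] = new_phone
            return


def phones_on_file(records, agent):
    """Collect every phone number the file currently holds for one agent."""
    return {r["agent_phone"] for r in records if r["agent"] == agent}


update_first_match(customers, "Yaw", "0200000099")
conflicting = phones_on_file(customers, "Yaw")  # two versions of the same fact
```

The same layout also produces the other anomalies: deleting the last customer of an agent deletes the agent's phone number with it, and a new agent cannot be recorded until a customer is assigned.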
Database Systems
• A database system is the DBMS software together with the data
itself. Sometimes, the applications are also included.
• Provides advantages over file system management
approach
• Helps eliminate data inconsistency (lack of data integrity),
data anomalies, data dependence, and structural
dependence problems
• Stores data structures, relationships, and access paths
Database vs. File Systems
Database System Environment
• Hardware
• System’s Physical devices
• Computers
• Peripherals
• Network
Database System Environment
• Software
• Operating system: manages hardware
components
• DBMS: manages database
• MS Access, SQL Server, Oracle, DB2
• Application and utility software: used to
access and manipulate data
• Generate information for decision making
• Help to manage database system
Database System Environment
• Database Users may be divided into
• Those who actually use and control the
database content, and those who design,
develop and maintain database
applications (called “Actors on the
Scene”), and
• Those who design and develop the DBMS
software and related tools, and the
computer systems operators (called
“Workers Behind the Scene”).
Database System Environment
• People (five users) – “Actors on the Scene”
• System administrator: hardware system
support
• Database administrator: manages DBMS use
• Database designer: designs database structure
• System analysts and programmers:
implement application programs
• End users: employees and management
Database System Environment
• Procedures
• Instructions and rules that govern the design
and use of the database system
• Data
Database System Types
• Single-user vs. Multiuser Database
(user number)
• Desktop database
• Workgroup database
• Enterprise database
• Centralized vs. Distributed
(location)
• Use
• Production or transactional
• Decision support or data warehouse
(obtain information)
DBMS Functions
• Objective: Guarantee the integrity and
consistency of data. It has several functions:
• Data dictionary management: the definitions
of the data elements and their relationships
are stored in a data dictionary. This removes
data and structural dependence.
• Data storage management: structures
required for data storage
• Data transformation and presentation:
relieves us of the distinction between
logical data format and physical data format
• Security management
DBMS Functions (cont’d.)
• Backup and recovery management
• Multiuser access control (concurrency)
• Data integrity management
• Database access language and application
programming interfaces
• Query language (DDL and DML)
• Database communication interfaces
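The query language and data integrity functions listed above can be tried out with Python's built-in `sqlite3` module, used here as one possible application programming interface to a DBMS. The `invoice` table, its columns and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL (data definition language): defines the structure and its constraints.
conn.execute(
    "CREATE TABLE invoice ("
    " number INTEGER PRIMARY KEY,"
    " amount REAL NOT NULL CHECK (amount >= 0))"
)

# DML (data manipulation language): inserts, changes and retrieves the content.
conn.execute("INSERT INTO invoice (number, amount) VALUES (?, ?)", (1001, 250.0))
conn.execute("UPDATE invoice SET amount = ? WHERE number = ?", (275.0, 1001))
amounts = conn.execute("SELECT number, amount FROM invoice").fetchall()

# Data integrity management: the DBMS itself rejects a row that breaks a rule.
try:
    conn.execute("INSERT INTO invoice (number, amount) VALUES (?, ?)", (1002, -5.0))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Because the rule `amount >= 0` lives in the database definition, every application that uses the table is subject to it without repeating the check in its own code.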
Database Models
• Definition: collection of logical constructs
used to represent data structure and
relationships within the database
• Conceptual models: focus on the logical nature of data
representation; they emphasize what is
represented, and are used as a blueprint for
database design
• Implementation models: emphasize how the
data are represented in the database
Database Models
• Conceptual models include
• Entity-relationship database model (ERDBM)
• Object-oriented model (OODBM)
• Implementation models include
• Hierarchical database model (HDBM)
• Network database model (NDBM)
• Relational database model (RDBM)
• Object-oriented database model (OODBM)
Database Models (cont’d.)
• Relationships in Conceptual Models
• One-to-one (1:1)
• One-to-many (1:M)
• Many-to-many (M:N)
• Implementation Database Models
• Hierarchical
• Network
• Relational
• Object-Oriented
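One common way to realize the three relationship types (1:1, 1:M, M:N) in a relational implementation is sketched below with Python's `sqlite3` module. The tables, names and rows are hypothetical, and this is only one possible design: 1:M uses a foreign key, 1:1 adds a UNIQUE constraint to the foreign key, and M:N uses a junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
-- 1:M  one department has many employees
CREATE TABLE employee (
    id INTEGER PRIMARY KEY, name TEXT,
    department_id INTEGER REFERENCES department(id));
-- 1:1  the UNIQUE foreign key allows at most one locker per employee
CREATE TABLE locker (
    id INTEGER PRIMARY KEY,
    employee_id INTEGER UNIQUE REFERENCES employee(id));
-- M:N  resolved through a junction table
CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE assignment (
    employee_id INTEGER REFERENCES employee(id),
    project_id INTEGER REFERENCES project(id),
    PRIMARY KEY (employee_id, project_id));
INSERT INTO department VALUES (1, 'ICT');
INSERT INTO employee VALUES (1, 'Ama', 1), (2, 'Kofi', 1);
INSERT INTO locker VALUES (1, 1);
INSERT INTO project VALUES (1, 'Payroll'), (2, 'Website');
INSERT INTO assignment VALUES (1, 1), (1, 2), (2, 1);
""")

staff_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE department_id = 1").fetchone()[0]
projects_for_ama = [r[0] for r in conn.execute(
    "SELECT p.title FROM project AS p"
    " JOIN assignment AS a ON a.project_id = p.id"
    " WHERE a.employee_id = 1 ORDER BY p.title")]

try:
    conn.execute("INSERT INTO locker VALUES (2, 1)")  # second locker for employee 1
    second_locker_allowed = True
except sqlite3.IntegrityError:
    second_locker_allowed = False
```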
Hierarchical Database Model
• Logically represented by an upside down tree
• Each parent can have many children (segment linkage)
• Each child has only one parent
• 1:M relationship
Hierarchical Database Model
• Hierarchical path (beginning from left on disk)
• Left-list hierarchical path, or preorder traversal, or
hierarchical sequence
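The hierarchical sequence is a preorder traversal: visit a segment, then each of its children from left to right, descending fully before moving to the next sibling. The sketch below uses a hypothetical parts hierarchy; the tree contents and the function name are assumptions made for illustration.

```python
# Hypothetical hierarchy: each segment maps to its children, listed left to right.
TREE = {
    "Final assembly": ["Component A", "Component B"],
    "Component A": ["Part A1", "Part A2"],
    "Component B": ["Part B1"],
}


def hierarchical_sequence(tree, root):
    """Return the segments in preorder: parent first, then children left to right."""
    order = [root]
    for child in tree.get(root, []):
        order.extend(hierarchical_sequence(tree, child))
    return order


path = hierarchical_sequence(TREE, "Final assembly")
```

To reach "Part B1", a program has to walk past everything that precedes it in this sequence, which is why access to segments far to the right of the tree is comparatively slow.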
Hierarchical Database Model
• Advantages
• Conceptual simplicity: relationship between layers is
logically simple; design process is simple
• Database security: enforced uniformly through the
system
• Data integrity
• Data independence: automatic cascading of data type
changes in database
• Efficiency in 1:M relationships and when users require
large numbers of transactions
Hierarchical Database Model
• Disadvantages
• Complex implementation: physical data storage
characteristics; database design is complicated
• Difficult to manage
• Lack of standards
• Lacks structural independence: navigational system
• Applications programming and use complexity (pointer
based)
• Implementation limitations: in particular, it
handles only 1:M relationships
Network Database Model (NDBM)
• Each record can have multiple parents
• Specified by the Database Task Group (DBTG), which was set up to define
standards
• Three crucial database components
• Network schema: conceptual organization of the entire
database
• Database name, record type and components for record
• Subschema: portion of database as information for
application programs
• Database management language: defining data
characteristics and data structure
• Schema Data definition language (DDL): define schema
components
• Subschema Data definition language
• Data manipulation language (DML): manipulate data content
Network Database Model
• Each record can have multiple parents
• Introduces the set to describe a relationship
• Each set has an owner record and member records,
parallel to parent and child in HDM
• A member may have several owners, one for each set it belongs to
• Within a single set, a member has only one owner
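The owner and member idea can be sketched with plain Python dictionaries. The set names, record labels and the `owners_of` helper are hypothetical and are not DBTG syntax; the point is only that one member record can take part in several sets, with a different owner in each.

```python
# Each hypothetical set occurrence has one owner record and a list of member records.
sales_set = {
    "set_type": "SALES",
    "owner": "Customer: Ama",
    "members": ["Invoice 1001", "Invoice 1002"],
}
commission_set = {
    "set_type": "COMMISSION",
    "owner": "Sales rep: Yaw",
    "members": ["Invoice 1001"],
}


def owners_of(member, set_occurrences):
    """Map each set type the member belongs to onto that set's owner."""
    return {
        s["set_type"]: s["owner"]
        for s in set_occurrences
        if member in s["members"]
    }


invoice_owners = owners_of("Invoice 1001", [sales_set, commission_set])
```

"Invoice 1001" has two owners overall (a customer and a sales representative), which a strict hierarchy with one parent per child cannot express.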
Network Database Model
• Advantages
• Conceptual simplicity, just like HDM
• Handles more relationship types (but all are 1:M
relationships)
• Data access flexibility
• Promotes database integrity
• Data independence
• Conformance to standards
• Disadvantages
• System complexity
• Lack of structural independence
Relational Database Model (RDBM)
• Advantages
• Structural independence: the data access path is
irrelevant to database design; changing the structure will
not affect how applications access the database
• Improved conceptual simplicity
• Easier database design, implementation,
management, and use
• Ad hoc query capability with SQL (4GL is added)
• Powerful database management system
• Disadvantages
• Substantial hardware and system software overhead
• Poor design and implementation are made easy
• May promote “islands of information” problems
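Ad hoc query capability means a new question can be answered by writing a query rather than a new program. The sketch below uses Python's `sqlite3` module with hypothetical `agent` and `customer` tables; the query states what is wanted and leaves the access path to the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent (code INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer (
    id INTEGER PRIMARY KEY, name TEXT,
    agent_code INTEGER REFERENCES agent(code));
INSERT INTO agent VALUES (501, 'Yaw'), (502, 'Esi');
INSERT INTO customer VALUES (1, 'Ama', 501), (2, 'Kofi', 501), (3, 'Abena', 502);
""")


def customers_per_agent(connection):
    """Ad hoc question: how many customers does each agent serve?"""
    query = """
        SELECT a.name, COUNT(c.id)
        FROM agent AS a
        JOIN customer AS c ON c.agent_code = a.code
        GROUP BY a.code, a.name
        ORDER BY a.name
    """
    return connection.execute(query).fetchall()


report = customers_per_agent(conn)
```

The two tables are linked only by the shared value in `agent_code` and `code`, not by pointers, so the query needs no knowledge of how the rows are physically stored.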
Entity Relationship Database Model
(ERDBM)
• Complements the relational data model concepts
• ERDBM introduces a graphical representation of relational concepts
• ERDBM is based on several components
• Entity: represented as a table in the relational model (RDM)
• Entity set: a collection of like entities
• Attributes: describe each entity, similar to
fields in a table
• Relationship and connection
• Represented in an entity relationship diagram (ERD):
Chen’s ERD model and Crow’s Foot ERD
• Based on entities, attributes, and relationships
Entity Relationship Database Model
• Advantages
• Exceptional conceptual simplicity
• Visual representation
• Effective communication tool
• Integrated with the relational database model
• Disadvantages
• Limited constraint representation
• Limited relationship representation (internal
relationships cannot be depicted; multiple
relationships)
• No data manipulation language
• Loss of information content
Object-Oriented Database Model (OODBM)
OO Database Model
• Advantages
• Adds semantic content (gives data greater meaning)
• Visual presentation includes semantic content
• Database integrity
• Both structural and data independence
• Disadvantages
• Lack of OODM standards
• No generalized data manipulation language or access
method
• Complex navigational data access
• Steep learning curve
• Challenging to design and implement properly
• High system overhead slows transactions
Types of Database Systems
• Many types of database systems exist that were developed
based on the database models discussed
• However, other types of databases exist that are
not based on these models.
• With the explosion of different types of data and data
needs, newer database systems have been developed
• Others exist based on the data architecture
Types of Database Systems
• NoSQL
• means Non-SQL/Not Only SQL
• is a type of database that is used for storing a
wide range of data sets.
• is not a relational database as it stores data not
only in tabular form but in several different
ways.
• came into existence when the demand for
building modern applications increased.
• We can further divide a NoSQL database into the
following four types:
Types of Database Systems
• NoSQL has four types
• Key-value storage
• It is the simplest type of database storage: every
item is stored as a key (or attribute name)
together with its value.
• Examples are Redis, Riak, Oracle NoSQL, and
Amazon SimpleDB.
• Document-oriented Database
• A type of database used to store data as JSON-like
documents.
• It helps developers store data in the
same document-model format as used in the
application code.
• Examples are MongoDB, CosmosDB, Amazon
DynamoDB, and Amazon DocumentDB
Types of Database Systems
• NoSQL has four types
• Graph Databases
• It is used for storing vast amounts of data in a
graph-like structure.
• Most commonly, social networking websites use
the graph database.
• Examples are Neo4j, OrientDB, ArangoDB, and
AllegroGraph
• Wide-column stores
• It is similar to the data represented in relational
databases.
• Here, data is stored together in large columns,
instead of being stored in rows.
• Examples are Apache Cassandra, ScyllaDB, Apache
HBase and Google Bigtable.
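The shape of the data in each of the four NoSQL types can be approximated with plain Python structures. This is a rough analogy under stated assumptions, not how any of the listed products stores data internally; all keys, names and values are hypothetical.

```python
# Key-value: an opaque value looked up by its key.
key_value = {"session:42": "user=ama;cart=3"}

# Document: a self-describing, JSON-like record that can nest lists and objects.
document = {"_id": 1, "name": "Ama", "orders": [{"item": "pen", "qty": 2}]}

# Graph: nodes connected by labelled edges (source, relationship, target).
edges = [("Ama", "FOLLOWS", "Kofi"), ("Kofi", "FOLLOWS", "Esi")]

# Wide-column: row key -> column family -> columns; rows may hold different columns.
wide_column = {
    "user:1": {"profile": {"name": "Ama"}, "activity": {"last_login": "2024-01-01"}},
    "user:2": {"profile": {"name": "Kofi", "city": "Kumasi"}},
}


def followers_of(person, graph_edges):
    """Follow FOLLOWS edges backwards to find who follows a person."""
    return [src for src, rel, dst in graph_edges if rel == "FOLLOWS" and dst == person]


kofi_followers = followers_of("Kofi", edges)
```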
Types of Database Systems
• Cloud Database
• is a type of database where data is stored in a virtual
environment and runs on a cloud
computing platform.
• is a database built to run in a public or hybrid cloud
environment to help organize, store, and manage
data within an organization.
• is offered to users through cloud computing
service models (SaaS, PaaS, IaaS, etc.); access to
the database as a service is called DBaaS.
• There are numerous cloud platforms; commonly cited
options include:
• Amazon Web Services (AWS), Microsoft Azure, Kamatera,
phoenixNAP, ScienceSoft, Google Cloud SQL, etc.
Types of Database Systems
• Cloud Database
• SaaS – Software as a Service
• allows users to connect to and use cloud-based
applications, such as a DBMS, over the Internet.
• PaaS – Platform as a Service
• is a complete development and deployment
environment in the cloud, with resources for
developing apps from simple cloud-based apps to
sophisticated, enterprise applications.
• IaaS – Infrastructure as a Service
• is a cloud computing model that provides on-
demand access to computing resources such as
servers, storage, networking, and virtualization.
Types of Database Systems
• Characteristics of “Internet age” databases
• Flexible, efficient, and secure Internet access
• Easily used, developed, and supported
• Supports agility, scalability and reduced cost
• Supports complex data types and relationships
• Seamless interfaces with multiple data sources and
structures
• Simplicity of conceptual database model
• Many database design, implementation, and
application development tools
• Powerful DBMS GUIs make the DBA’s job easier