Swe 124 Lect 01


SWE124:

DATABASE AND MERISE I


COVENANT UNIVERSITY
INSTITUTE, BUEA
CAMEROON

NYANGA B. Y.,CUINS 2023/2024 1


“DATA IS THE FUTURE”

THE WORLD IS INCREASINGLY DRIVEN BY DATA…

THIS CLASS TEACHES THE BASICS OF HOW TO USE & MANAGE DATA.


What this course is (and is not)
• Discuss fundamentals of database design
– Flashback on data models
– Normalization (1NF, 2NF, 3NF)
– Relational algebra
– Practicals on MS Access and/or MySQL
– Information system II (MERISE):
• From conceptual to logical level
• From logical to physical level
• Methodology and software tools

– Not how to be a DBA or how to tune a DBMS.


What is expected from you
• Attend lectures
– If you don't, it's at your own peril
• Be active and think critically
– Ask questions, post comments on forums
• Do programming and homework projects
– Start early and be honest
• Study for tests and exams


Course Structure
Week 1
Fundamental objectives of a database
– Less redundancy
– Consistency
– ACID properties
– Multiuser and concurrent access
– Multiple views
– Confidentiality/integrity
Week 2-4
Flashback on data models
– Entity-Relationship Model
– Relational Model
Normalization (1NF, 2NF, 3NF)
Relational algebra
– Relational Algebra
– Relational Calculus
Week 5
Practicals on MS Access and/or MySQL


Course Structure (Cont.)
Week 6-10
Information system II (MERISE):
1. From conceptual to logical level
– Human-Computer Interface (HCI): explain what human-computer interaction is
and why it is needed
– Ergonomic elements
– Data organization
– Conception of files or database
– Coding
– Control
– Process organization
– Determination of the nature of processing
2. From logical to physical level
– Programming, test
– Documentation
– Maintenance
3. Methodology and software tools
– General presentation of the different methodologies
– Detailed analysis of at least one of the methodologies (MERISE, SADT, etc.)
– Evaluation of the cost of the detailed study and development
– Usage of software tools in conceiving and developing the software

Overview of DATABASE AND MERISE I


Overview of database design: Methodology
Design methodology is a structured approach that uses procedures,
techniques, tools, and documentation aids to support and facilitate the
process of design.
A design methodology consists of phases each containing a number of
steps, which guide the designer in the techniques appropriate at each stage
of the project.
A design methodology also helps the designer to plan, manage, control,
and evaluate database development projects.
Furthermore, it is a structured approach for analyzing and modeling a set of
requirements for a database in a standardized and organized manner. It is
divided into three main phases:

• Conceptual database design
• Logical database design
• Physical database design


Conceptual database design
• It is the process of constructing a model of the data
used in an enterprise, independent of all physical
considerations.
• The conceptual database design phase begins with the
creation of a conceptual data model of the enterprise,
which is entirely independent of implementation details
such as the target DBMS, application programs,
programming languages, hardware platform,
performance issues, or any other physical
considerations.



Conceptual database design: data model
• Identify entity types
• Identify relationship types
• Identify and associate attributes with entity or relationship types
• Determine attribute domains
• Determine candidate, primary, and alternate key attributes
• Consider use of enhanced modeling concepts (optional step)
• Check model for redundancy
• Validate conceptual model against user transactions
• Review conceptual data model with user



Logical database design
It is the process of constructing a model of the data used
in an enterprise based on a specific data model, but
independent of a particular DBMS and other physical
considerations.
The logical database design phase maps the conceptual
model onto a logical model, which is influenced by the
data model for the target database.
The logical data model is a source of information for the
physical design phase, providing the physical database
designer with a vehicle for making tradeoffs that are very
important to the design of an efficient database.



Logical database design: logical data model
• Derive relations for logical data model
• Validate relations using normalization
• Validate relations against user transactions
• Check integrity constraints
• Review logical data model with user
• Merge logical data models into global model (optional
step)
• Check for future growth



Physical database design
It is the process of producing a description of the
implementation of the database on secondary storage; it
describes the base relations, file organizations, and
indexes used to achieve efficient access to the data, and
any associated integrity constraints and security measures.
The physical database design phase allows the designer to
make decisions on how the database is to be
implemented. Therefore, physical design is tailored to a
specific DBMS.
There is feedback between physical and logical design,
because decisions taken during physical design for
improving performance may affect the logical data model.



Physical database design: data model to DBMS
• Translate logical data model for target DBMS
– Design base relations
– Design representation of derived data
– Design general constraints
• Design file organizations and indexes
– Analyze transactions
– Choose file organizations
– Choose indexes
– Estimate disk space requirements



Physical database design: data model to DBMS (Cont.)
• Design user views
• Design security mechanisms
• Consider the introduction of controlled
redundancy
• Monitor and tune the operational system



Critical guideline
• Work interactively with the users as much as possible.
• Follow a structured methodology throughout the data modeling
process.
• Employ a data-driven approach.
• Incorporate structural and integrity considerations into the data
models.
• Combine conceptualization, normalization, and transaction
validation techniques into the data modeling methodology.
• Use diagrams to represent as much of the data models as possible.
• Use a Database Design Language (DBDL) to represent additional
data semantics that cannot easily be represented in a diagram.
• Build a data dictionary to supplement the data model diagrams and
the DBDL.
• Be willing to repeat steps.



Overview of the relational data model


Overview of the relational data model
1. Definition of DBMS
2. Data models & the relational data model
3. Schemas & data independence


What is a DBMS?
• A database is a large, integrated collection of data
• It models a real-world enterprise
– Entities (e.g., Students, Courses)
– Relationships (e.g., Alice is enrolled in 145)

A Database Management System (DBMS) is a piece of software designed to
store and manage databases. It is a SYSTEM.


Overview of the relational data model
• Consider building a course management system (CMS):
– Entities: Students, Courses, Professors
– Relationships: who takes what, who teaches what


Data models
• A data model is a collection of concepts for describing data
– The relational model of data is the most widely used model today
• Main concept: the relation, essentially a table
• A schema is a description of a particular collection of data, using the
given data model
– E.g. every relation in a relational data model has a schema describing
types, etc.


Modeling the CMS (Course Management System)

• Logical Schema
– Students(sid: string, name: string, gpa: float)
– Courses(cid: string, cname: string, credits: int)
– Enrolled(sid: string, cid: string, grade: string)

Relations:

Students
sid   name   gpa
101   Bob    3.2
123   Mary   3.8

Enrolled
sid   cid   grade
123   564   A

Courses
cid   cname   credits
564   564-2   4
308   417     2
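The logical schema above can be written down as DDL and queried. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the choice of SQLite, the column types (TEXT, REAL, INTEGER) and the key constraints are assumptions made for illustration and are not part of the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
conn.executescript("""
CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, gpa REAL);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT REFERENCES Students(sid),
                       cid TEXT REFERENCES Courses(cid),
                       grade TEXT);
INSERT INTO Students VALUES ('101', 'Bob', 3.2), ('123', 'Mary', 3.8);
INSERT INTO Courses  VALUES ('564', '564-2', 4), ('308', '417', 2);
INSERT INTO Enrolled VALUES ('123', '564', 'A');
""")

# "Who takes what": join the relationship table with the two entity tables.
rows = conn.execute("""
    SELECT s.name, c.cname, e.grade
    FROM Enrolled e
    JOIN Students s ON s.sid = e.sid
    JOIN Courses  c ON c.cid = e.cid
""").fetchall()
print(rows)
```

Enrolled stores only the two identifiers and the grade; the names come from the entity tables through the join.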

Other Schemata…
• Physical Schema (Administrators): describes data layout
– Relations as unordered files
– Some data in sorted order (index)
• Logical Schema: Previous slide
• External Schema / Views (Applications):
– Course_info(cid: string, enrollment: integer)
– Derived from other tables
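As an illustration of an external schema, the Course_info view can be derived from the base tables. This is a sketch only: it assumes SQLite through Python's sqlite3 module, and it assumes that "enrollment" means the number of Enrolled rows per course, which the slide does not spell out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
INSERT INTO Courses  VALUES ('564', '564-2', 4), ('308', '417', 2);
INSERT INTO Enrolled VALUES ('123', '564', 'A');

-- External schema: a view derived from the base tables.
CREATE VIEW Course_info AS
    SELECT c.cid AS cid, COUNT(e.sid) AS enrollment
    FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
    GROUP BY c.cid;
""")

# An application queries the view exactly as if it were a table.
course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
print(course_info)
```

The LEFT JOIN keeps courses that nobody is enrolled in, so they show up with an enrollment of 0.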



Data independence
Concept: Applications do not need to worry about how the data is structured
and stored.
• Logical data independence: protection from changes in the logical
structure of the data
– I.e. should not need to ask: can we add a new entity or attribute without
rewriting the application?
• Physical data independence: protection from physical layout changes
– I.e. should not need to ask: which disks are the data stored on? Is the
data indexed?

One of the most important reasons to use a DBMS


OVERVIEW OF DBMS TOPICS
KEY CONCEPTS & CHALLENGES



Overview of DBMS topics
1. Transactions
2. Concurrency & locking
3. Atomicity & logging


Challenges with Many Users
• Suppose that our CMS application serves 1000s of users or more. What are,
or will be, some challenges?
• Security: different users, different roles
– We won't look at this much in this course, but it is extremely important
• Performance: need to provide concurrent access
– Disk access is slow; the DBMS hides the latency by doing more CPU work
concurrently
• Consistency: concurrency can lead to update problems
– The DBMS allows users to write programs as if they were the only user


Transactions
• A transaction is a sequential group of database manipulation
operations, which is performed as if it were one single work unit.
– In other words, a transaction will never be complete unless each individual
operation within the group is successful. If any operation within the
transaction fails, the entire transaction will fail.
• Atomicity: ensures that all operations within the work unit are
completed successfully; otherwise, the transaction is aborted at the
point of failure and previous operations are rolled back to their former
state.
• Consistency: ensures that the database properly changes states
upon a successfully committed transaction.
• Isolation: enables transactions to operate independently of, and
transparently to, each other.
• Durability: ensures that the result or effect of a committed
transaction persists in case of a system failure.



Transactions
• In the MySQL DBMS, for example, a transaction begins with the statement BEGIN
WORK (or START TRANSACTION) and ends with either a COMMIT or a ROLLBACK
statement. The SQL commands between the beginning and ending statements form
the bulk of the transaction.
• The two keywords COMMIT and ROLLBACK are the main transaction-control commands.
– When a transaction completes successfully, the COMMIT command
should be issued so that the changes to all involved tables take
effect.
– If a failure occurs, a ROLLBACK command should be issued to return
every table referenced in the transaction to its previous state.
• You can control the behavior of a transaction by setting the session variable
autocommit. When autocommit is enabled (the MySQL default), each statement
outside an explicit transaction is committed immediately.
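The BEGIN / COMMIT / ROLLBACK pattern can be sketched as follows. SQLite (through Python's sqlite3 module) is used here instead of MySQL so that the example runs without a server; the account table, the CHECK constraint and the transfer helper are hypothetical names introduced for illustration.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN, so transactions
# are controlled explicitly with BEGIN / COMMIT / ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 0)")

def transfer(src, dst, amount):
    """Move amount from src to dst as one unit of work; True if committed."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # undo the credit that had already been applied
        return False

ok = transfer("A", "B", 60)       # enough funds: committed
failed = transfer("A", "B", 60)   # would overdraw A: rolled back
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

In the second call the credit to B has already been applied when the debit from A fails, and the ROLLBACK is what removes it again.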



Transactions
• A key concept is the transaction (TXN): an atomic sequence of database
actions (reads/writes)
– If a user cancels a TXN, it should be as if nothing happened!
– Atomicity: an action either completes entirely or not at all
• Transactions leave the DB in a consistent state
– Users may write integrity constraints, e.g., 'each course is assigned to
exactly one room'
– Consistency: an action results in a state which conforms to all integrity
constraints

However, note that the DBMS does not understand the real meaning of the
constraints; the consistency burden is still on the user!


Scheduling Concurrent Transactions
• The DBMS ensures that the execution of {T1,…,Tn} is equivalent to some
serial execution
– A set of TXNs is isolated if their effect is as if all were executed
serially
• One way to accomplish this: Locking
– Before reading or writing, a transaction requires a lock from the DBMS and
holds it until the end
• Key Idea: If Ti wants to write to an item x and Tj wants to read x, then Ti
and Tj conflict. Solution via locking:
– only one winner gets the lock
– loser is blocked (waits) until winner finishes
• What if Ti and Tj need X and Y, and Ti asks for X before Tj, and Tj asks
for Y before Ti? -> Deadlock! One is aborted…

All concurrency issues handled by the DBMS…
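The slide does not say how a DBMS notices a deadlock. One common approach, assumed here purely for illustration, is to keep a wait-for graph (who is waiting for whose lock) and look for a cycle. The function name and the dictionary representation below are hypothetical.

```python
def find_deadlock(waits_for):
    """Return the transactions on a cycle of the wait-for graph, or None.

    waits_for maps each transaction to the transactions whose locks it
    is waiting for.
    """
    visiting, done = [], set()

    def visit(txn):
        if txn in visiting:                 # back edge: a cycle was found
            return visiting[visiting.index(txn):]
        if txn in done:
            return None
        visiting.append(txn)
        for holder in waits_for.get(txn, ()):
            cycle = visit(holder)
            if cycle:
                return cycle
        visiting.pop()
        done.add(txn)
        return None

    for txn in waits_for:
        cycle = visit(txn)
        if cycle:
            return cycle
    return None

# Ti holds X and waits for Y (held by Tj); Tj holds Y and waits for X (held by Ti).
deadlock = find_deadlock({"Ti": ["Tj"], "Tj": ["Ti"]})
no_deadlock = find_deadlock({"Ti": ["Tj"], "Tj": []})
print(deadlock, no_deadlock)
```

Once a cycle is reported, aborting any one transaction on it releases its locks and lets the others proceed.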


Ensuring Atomicity & Durability
• DBMS ensures atomicity even if a TXN crashes!
• One way to accomplish this: Write-ahead logging (WAL)
– Before any action is finalized, a corresponding log entry is forced to disk
– We assume that the log is on "stable" storage
• Key Idea: Keep a log of all the writes done.
– After a crash, the partially executed TXNs are undone using the log

All atomicity issues also handled by the DBMS…
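The idea of undoing partial transactions from a log can be sketched with a toy in-memory store. This is an illustration only, not how any real DBMS implements WAL: the TinyStore class is hypothetical, Python lists and dictionaries stand in for the disk, and the value None is used to mean "key did not exist".

```python
class TinyStore:
    """Toy key-value store with an undo log (illustrative only)."""

    def __init__(self):
        self.data = {}   # stands in for the database pages on disk
        self.log = []    # stands in for the log on stable storage

    def write(self, txn, key, value):
        # Write-ahead rule: log the old value BEFORE changing the data.
        self.log.append(("write", txn, key, self.data.get(key)))
        self.data[key] = value

    def commit(self, txn):
        self.log.append(("commit", txn, None, None))

    def recover(self):
        """After a crash, undo the writes of uncommitted TXNs, newest first."""
        committed = {txn for kind, txn, _, _ in self.log if kind == "commit"}
        for kind, txn, key, old in reversed(self.log):
            if kind == "write" and txn not in committed:
                if old is None:
                    self.data.pop(key, None)   # key did not exist before
                else:
                    self.data[key] = old

store = TinyStore()
store.write("T1", "x", 1)
store.commit("T1")
store.write("T2", "x", 99)   # T2 "crashes" before committing
store.write("T2", "y", 5)
store.recover()
print(store.data)
```

Because every old value reached the log before the data changed, recovery can always restore the state left by the last committed transaction.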

A Well-Designed DBMS makes many people happy!
• End users and DBMS vendors
– Reduces cost and makes money

• DB application programmers
– Can handle more users, faster, for cheaper, and with
better reliability / security guarantees!

• Database administrators (DBA)
– Easier time of designing logical/physical schema,
handling security/authorization, tuning, crash recovery,
and more…


Summary of DBMS
• DBMSs are used to maintain, query, and manage large
datasets.
– Provide concurrency, recovery from crashes, quick
application development, integrity, and security
• Key abstractions give data independence
• DBMS R&D is one of the broadest, most exciting fields in computer science.


Database Applications Examples
• Enterprise Information
– Sales: customers, products, purchases
– Accounting: payments, receipts, assets
– Human Resources: Information about employees, salaries,
payroll taxes.
• Manufacturing: management of production, inventory, orders,
supply chain.
• Banking and finance
– customer information, accounts, loans, and banking
transactions.
– Credit card transactions
– Finance: sales and purchases of financial instruments
(e.g., stocks and bonds); storing real-time market data
• Universities: registration, grades


Database Applications Examples (Cont.)

• Airlines: reservations, schedules
• Telecommunication: records of calls, texts, and data usage,
generating monthly bills, maintaining balances on prepaid
calling cards
• Web-based services
– Online retailers: order tracking, customized
recommendations
– Online advertisements
• Document databases
• Navigation systems: for maintaining the locations of various
places of interest along with the exact routes of roads, train
systems, buses, etc.


Purpose of Database Systems
In the early days, database applications were built directly on top
of file systems, which led to:
• Data redundancy and inconsistency: data is stored in
multiple file formats, resulting in duplication of information in
different files
• Difficulty in accessing data
– Need to write a new program to carry out each new task
• Data isolation
– Multiple files and formats
• Integrity problems
– Integrity constraints (e.g., account balance > 0) become
“buried” in program code rather than being stated
explicitly
– Hard to add new constraints or change existing ones
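By contrast, a DBMS lets the constraint be stated once, in the schema. A minimal sketch, assuming SQLite through Python's sqlite3 module; the account table and its column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule lives in the schema, not in every program that touches the table.
conn.execute("""
    CREATE TABLE account (
        account_no TEXT PRIMARY KEY,
        balance    NUMERIC NOT NULL CHECK (balance > 0)
    )
""")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")

try:
    conn.execute(
        "UPDATE account SET balance = balance - 600 WHERE account_no = 'A-101'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True   # the DBMS refused the update that violates the constraint

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(rejected, balance)
```

Changing the rule later means changing one table definition rather than hunting through application code.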



Purpose of Database Systems (Cont.)
• Atomicity of updates
– Failures may leave database in an inconsistent state with partial
updates carried out
– Example: Transfer of funds from one account to another should
either complete or not happen at all
• Concurrent access by multiple users
– Concurrent access needed for performance
– Uncontrolled concurrent accesses can lead to inconsistencies
• Ex: Two people reading a balance (say 100) and updating it by
withdrawing money (say 50 each) at the same time
• Security problems
– Hard to provide user access to some, but not all, data

Database systems offer solutions to all the above problems
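The withdrawal example from the concurrent-access bullet can be replayed step by step. The sketch below uses plain Python variables rather than a real database, and fixes the interleaving by hand so the outcome is deterministic; the variable names are illustrative.

```python
balance = 100          # shared account balance

# Uncontrolled interleaving: both withdrawals read before either one writes.
read_1 = balance       # person 1 reads 100
read_2 = balance       # person 2 reads 100
balance = read_1 - 50  # person 1 writes 50
balance = read_2 - 50  # person 2 overwrites with 50: one withdrawal is lost
uncontrolled = balance

# Serial execution: the second withdrawal starts only after the first ends.
balance = 100
for _ in range(2):
    current = balance
    balance = current - 50
serial = balance

print(uncontrolled, serial)
```

Both people received 50, yet the uncontrolled run leaves 50 in the account instead of 0, which is the inconsistency the slide warns about.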

Data Models
• A collection of tools for describing
– Data
– Data relationships
– Data semantics
– Data constraints
• Relational model
• Entity-Relationship data model (mainly for database design)
• Object-based data models (Object-oriented and Object-
relational)
• Semi-structured data model (XML)
• Other older models:
– Network model
– Hierarchical model



Relational Model
• All the data is stored in various tables.
• Example of tabular data in the relational model

[Figure: a sample table with its columns and rows labeled; portrait of Ted Codd, Turing Award 1981]


A Sample Relational Database



View of Data
An architecture for a database system



Instances and Schemas
• Similar to types and variables in programming languages
• Logical Schema – the overall logical structure of the
database
– Example: The database consists of information about a
set of customers and accounts in a bank and the
relationship between them
• Analogous to type information of a variable in a
program
• Physical schema – the overall physical structure of the
database
• Instance – the actual content of the database at a
particular point in time
– Analogous to the value of a variable

Physical Data Independence
• Physical Data Independence – the ability
to modify the physical schema without
changing the logical schema
– Applications depend on the logical
schema
– In general, the interfaces between the
various levels and components should be
well defined so that changes in some
parts do not seriously influence others.
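A small experiment makes this concrete: adding an index changes the physical schema, but an application query written against the logical schema keeps working unchanged. This is a sketch assuming SQLite through Python's sqlite3 module; the index name is hypothetical and the rows are the sample Students data from earlier in the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("101", "Bob", 3.2), ("123", "Mary", 3.8)],
)

QUERY = "SELECT name FROM Students WHERE gpa > 3.5"   # the application's query
before = conn.execute(QUERY).fetchall()

# Physical schema change: add an index. Logical schema and query are untouched.
conn.execute("CREATE INDEX idx_students_gpa ON Students (gpa)")
after = conn.execute(QUERY).fetchall()

print(before == after, after)
```

Only the way the DBMS reaches the rows may change; the rows it returns do not.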



Data Definition Language (DDL)
• Specification notation for defining the database schema
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))
• DDL compiler generates a set of table templates stored in a
data dictionary
• Data dictionary contains metadata (i.e., data about data)
– Database schema
– Integrity constraints
• Primary key (ID uniquely identifies instructors)
– Authorization
• Who can access what
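To see DDL and the data dictionary together, the instructor table can be created and the stored metadata read back. A minimal sketch assuming SQLite through Python's sqlite3 module: sqlite_master and PRAGMA table_info are SQLite's own catalog interfaces, and other DBMSs expose their data dictionaries differently. The PRIMARY KEY clause is added here to match the integrity constraint named on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instructor (
        ID        CHAR(5) PRIMARY KEY,   -- integrity constraint from the slide
        name      VARCHAR(20),
        dept_name VARCHAR(20),
        salary    NUMERIC(8,2)
    )
""")

# SQLite's data dictionary: sqlite_master lists the schema objects,
# PRAGMA table_info describes the columns of one table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2], row[5]) for row in conn.execute(
    "PRAGMA table_info(instructor)")]
print(tables)
print(columns)
```

The metadata is itself queried like ordinary data, which is what "data about data" means in practice.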



Data Manipulation Language (DML)
• Language for accessing and updating the data organized by the
appropriate data model
– DML also known as query language
• There are basically two types of data-manipulation language
– Procedural DML -- requires a user to specify what data are
needed and how to get those data.
– Declarative DML -- requires a user to specify what data are
needed without specifying how to get those data.
• Declarative DMLs are usually easier to learn and use than are
procedural DMLs.
• Declarative DMLs are also referred to as non-procedural DMLs
• The portion of a DML that involves information retrieval is called a
query language.
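The difference between the two styles can be shown side by side. In this sketch a hand-written Python loop stands in for the procedural style and an SQL query (run through Python's sqlite3 module) for the declarative style; the sample rows are made up for illustration.

```python
import sqlite3

rows = [("Ann", "Comp. Sci."), ("Raj", "Physics"), ("Lee", "Comp. Sci.")]  # made-up data

# Procedural style: we say HOW to find the data (scan each row, test, collect).
procedural = []
for name, dept in rows:
    if dept == "Comp. Sci.":
        procedural.append(name)

# Declarative style: we say WHAT we want; the DBMS decides how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT)")
conn.executemany("INSERT INTO instructor VALUES (?, ?)", rows)
declarative = [r[0] for r in conn.execute(
    "SELECT name FROM instructor WHERE dept_name = 'Comp. Sci.' ORDER BY name")]

print(procedural, declarative)
```

Both produce the same names, but only the loop commits to a particular access strategy.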



SQL Query Language
• SQL query language is nonprocedural. A query takes as input
several tables (possibly only one) and always returns a single
table.
• Example to find all instructors in Comp. Sci. dept
select name
from instructor
where dept_name = 'Comp. Sci.'
• SQL is NOT a Turing machine equivalent language
• To be able to compute complex functions SQL is usually
embedded in some higher-level language
• Application programs generally access databases through one of
– Language extensions to allow embedded SQL
– Application program interface (e.g., ODBC/JDBC) which allows
SQL queries to be sent to a database


Database Access from Application Program
• Non-procedural query languages such as SQL are
not as powerful as a universal Turing machine.
• SQL does not support actions such as input from
users, output to displays, or communication over the
network.
• Such computations and actions must be written in a
host language, such as C/C++, Java or Python,
with embedded SQL queries that access the data in
the database.
• Application programs -- are programs that are
used to interact with the database in this fashion.
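A small application program of this kind might look as follows. This is a sketch assuming Python as the host language and SQLite (via the standard sqlite3 module) as the database; the helper name instructors_in and the sample rows are hypothetical.

```python
import sqlite3

def instructors_in(conn, dept_name):
    """Return instructor names in one department (hypothetical helper)."""
    # The ? placeholder lets the driver pass the value safely into the query.
    cur = conn.execute(
        "SELECT name FROM instructor WHERE dept_name = ? ORDER BY name",
        (dept_name,),
    )
    return [row[0] for row in cur]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary NUMERIC)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [("1", "Ann", "Comp. Sci.", 90000), ("2", "Raj", "Physics", 85000)],  # made-up rows
)

# Output to the display is done by the host language, not by SQL.
names = instructors_in(conn, "Comp. Sci.")
for name in names:
    print(name)
```

SQL does the data retrieval; everything around it (parameters, loops, printing) belongs to the host language.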



Database Design
The process of designing the general structure of the database:
• Logical Design – Deciding on the database
schema. Database design requires that we find a
“good” collection of relation schemas.
– Business decision – What attributes should we
record in the database?
– Computer Science decision – What relation
schemas should we have and how should the
attributes be distributed among the various
relation schemas?
• Physical Design – Deciding on the physical layout
of the database



Database Engine
• A database system is partitioned into
modules that deal with each of the
responsibilities of the overall system.
• The functional components of a database
system can be divided into
– The storage manager,
– The query processor component,
– The transaction management component.



Storage Manager
• A program module that provides the interface between the
low-level data stored in the database and the application
programs and queries submitted to the system.
• The storage manager is responsible for the following tasks:
– Interaction with the OS file manager
– Efficient storing, retrieving and updating of data
• The storage manager components include:
– Authorization and integrity manager
– Transaction manager
– File manager
– Buffer manager



Storage Manager (Cont.)
• The storage manager implements several
data structures as part of the physical system
implementation:
– Data files -- store the database itself
– Data dictionary -- stores metadata about
the structure of the database, in particular
the schema of the database.
– Indices -- can provide fast access to data
items. A database index provides pointers
to those data items that hold a particular
value.

Query Processor
• The query processor components include:
– DDL interpreter -- interprets DDL statements and
records the definitions in the data dictionary.
– DML compiler -- translates DML statements in a
query language into an evaluation plan consisting
of low-level instructions that the query evaluation
engine understands.
• The DML compiler performs query optimization;
that is, it picks the lowest cost evaluation plan
from among the various alternatives.
– Query evaluation engine -- executes low-level
instructions generated by the DML compiler.
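Many systems let you inspect the evaluation plan the optimizer picked. The sketch below assumes SQLite through Python's sqlite3 module and its EXPLAIN QUERY PLAN statement; the exact wording of the plan text differs between SQLite versions, so no expected output is shown. The index name idx_dept is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary NUMERIC)")

QUERY = "SELECT name FROM instructor WHERE dept_name = 'Comp. Sci.'"

def plan(sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    # The last column of each EXPLAIN QUERY PLAN row is the description text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_without_index = plan(QUERY)
conn.execute("CREATE INDEX idx_dept ON instructor (dept_name)")
plan_with_index = plan(QUERY)

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both calls; only the available alternatives changed, and the optimizer switches from scanning the table to searching the index.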



Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation



Transaction Management
• A transaction is a collection of operations that
performs a single logical function in a database
application
• Transaction-management component ensures
that the database remains in a consistent (correct)
state despite system failures (e.g., power failures
and operating system crashes) and transaction
failures.
• Concurrency-control manager controls the
interaction among the concurrent transactions, to
ensure the consistency of the database.



Database Architecture
• Centralized databases
– One to a few cores, shared memory
• Client-server
– One server machine executes work on behalf of multiple
client machines.
• Parallel databases
– Many-core shared memory
– Shared disk
– Shared nothing
• Distributed databases
– Geographical distribution
– Schema/data heterogeneity



Database Architecture (Centralized/Shared-Memory)



Database Applications
Database applications are usually partitioned into two or
three parts
• Two-tier architecture -- the application resides at the
client machine, where it invokes database system
functionality at the server machine
• Three-tier architecture -- the client machine acts as a
front end and does not contain any direct database
calls.
– The client end communicates with an application
server, usually through a forms interface.
– The application server in turn communicates with a
database system to access data.



Two-tier and three-tier architectures



Database Users



Database Administrator
A person who has central control over the system is called a
database administrator (DBA). Functions of a DBA include:
• Schema definition
• Storage structure and access-method definition
• Schema and physical-organization modification
• Granting of authorization for data access
• Routine maintenance
• Periodically backing up the database
• Ensuring that enough free disk space is available
for normal operations, and upgrading disk space
as required
• Monitoring jobs running on the database

History of Database Systems
• 1950s and early 1960s:
– Data processing using magnetic tapes for storage
• Tapes provided only sequential access
– Punched cards for input
• Late 1960s and 1970s:
– Hard disks allowed direct access to data
– Network and hierarchical data models in widespread use
– Ted Codd defines the relational data model
• Would win the ACM Turing Award for this work
• IBM Research begins System R prototype
• UC Berkeley (Michael Stonebraker) begins Ingres
prototype
• Oracle releases first commercial relational database
– High-performance (for the era) transaction processing

History of Database Systems (Cont.)
• 1980s:
– Research relational prototypes evolve into
commercial systems
• SQL becomes industrial standard
– Parallel and distributed database systems
• Wisconsin, IBM, Teradata
– Object-oriented database systems
• 1990s:
– Large decision support and data-mining
applications
– Large multi-terabyte data warehouses
– Emergence of Web commerce



History of Database Systems (Cont.)
• 2000s
– Big data storage systems
• Google BigTable, Yahoo PNuts, Amazon,
• “NoSQL” systems.
– Big data analysis: beyond SQL
• Map reduce and friends
• 2010s
– SQL reloaded
• SQL front end to Map Reduce systems
• Massively parallel database systems
• Multi-core main-memory databases



QUESTIONS

