
Week 2 Complete

The document provides an overview of the historical evolution of databases from file systems to modern database management systems (DBMS). It discusses the progression from early file systems on tape in the 1950s to relational databases in the 1970s and beyond. It also describes the key advantages of using a DBMS over traditional file systems, such as reduced data redundancy, improved data integrity and security, and concurrency control.

Uploaded by

Rehman Aziz

INTRODUCTION TO DATABASES

Part 1: Historical Evolution, File System vs. Database Approach & Business Process Modeling
Part 2: Data Modeling, Data Abstraction and Database Representation
Historical perspective
Evolution
History
• The first DBMS was designed by Bachman at GE in the early 1960s.
• In 1970, Codd at IBM proposed a new data representation framework called the relational data model.
• The SQL query language for relational databases, developed as part of IBM's System R project, was standardized in the late 1980s.
• SQL-92 was adopted as a standard by ANSI (American National Standards Institute) and ISO (International Organization for Standardization); later revisions have since followed.
History
• In the late 1980s and 1990s, several vendors (e.g., IBM's DB2, Oracle 8) extended their systems with the ability to store new data types such as images and text.
• Specialized systems were developed for data warehouses, consolidating data from several databases.
• Entering the Internet age, a new markup language, XML, was proposed for data access through a Web browser.
• As more and more data are collected, companies are also interested in mining useful information from their data, which gave rise to the field of data mining.
Generations
1950s (First generation, or File Systems on Tape)
– batch processing, cards and tapes (sequential processing)
1960s (Second generation, or File Systems on Disk)
– expanded use of random-access disk technology
» database field
– early file systems
– generalized sorting packages
– beginnings of generalized software systems
– data definition incorporated into programming language
» COBOL
– development of in-house database systems
Generations
• 1970s (Third generation, or Pre-Relational)
- movement towards standardization with CODASYL
» DBTG (Data Base Task Group)
- reports in 1969, 1971, 1973, 1978, 1981, 1985, ...
- STORED data definitions AND data
- embed general access routines in a HOST language (COBOL)
- NETWORK and HIERARCHICAL SYSTEMS defined
- RELATIONAL model proposed by Codd (in theory)
- Computer Science interest
- Clear separation between "logical" and "physical" organization
- Operational issues examined in a more general and theoretical way
- First relational prototype systems created (System R, INGRES)
- Data models become prevalent, i.e. the 3-level architecture
3rd generation

Logical & physical separation


4th Generation
• 1980s (Fourth generation, or Relational)
» Relational Database Systems
- Powerful languages and interfaces
- Established theory in databases
- Set-oriented vs. record-oriented management and processing of data
- Database systems integrated into large transactional systems (networks, etc.)
- Appearance of object-oriented, "intelligent", and other models/systems
5th Generation
• 1990s (Fifth generation, or Post-Relational)
• Emergence of COMPLEX OBJECTS in databases (engineering objects, multimedia, software objects), not only structured data
• Object-Relational Database Systems
• Multidatabases, Active and Extensible Systems
• Massively Parallel and Multimedia Database Systems
• A strong showing of PC-based DBMSs
• Web Database Systems and Servers
Evolved DB System
Evolved DBMS Architecture
DBMS uses
• Data independence and efficient access.
• Reduced application development time.

• Data integrity and security.

• Uniform data administration.

• Concurrent access, recovery from crashes


DBMS Features & Characteristics
• Self-Contained Nature of a Database System
  - A DBMS catalog stores the description of the database (called metadata). With it, the same DBMS software can work with different databases.
• Insulation between Programs and Data
  - Called program-data independence. This feature allows the data storage structures to be changed without having to change the programs that access the data through the DBMS.
• Data Abstraction
  - A data model is used to hide storage details and present the user with a conceptual view of the database.
• Support of Multiple Views of the Data
  - Each view describes only the data of interest to that user.
Levels of Abstraction
• Many views, a single conceptual (logical) schema, and a single physical schema.
  - Views describe how users see the data.
  - The conceptual schema defines the logical structure.
  - The physical schema describes the files and indexes used.
• Schemas are defined using DDL; data is modified/queried using DML.
Example: University Database
• Conceptual schema:
  - Students(sid: string, name: string, login: string, age: integer, gpa: real)
  - Courses(cid: string, cname: string, credits: integer)
  - Enrolled(sid: string, cid: string, grade: string)
• Physical schema:
  - Relations stored as unordered files.
  - Index on first column of Students.
• External schema (view):
  - Course_info(cid: string, enrollment: integer)
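The University example above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the sample rows are made up for illustration, and defining Course_info as a count of enrolled students per course is an assumption, since the slide gives only the view's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: the conceptual schema plus one external view.
conn.executescript("""
    CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                           age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT REFERENCES Students(sid),
                           cid TEXT REFERENCES Courses(cid),
                           grade TEXT);
    -- Assumed definition: number of enrolled students per course.
    CREATE VIEW Course_info AS
        SELECT c.cid AS cid, COUNT(e.sid) AS enrollment
        FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
        GROUP BY c.cid;
""")

# DML: made-up sample rows.
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
                 [("s1", "Student One", "one@example", 19, 3.4),
                  ("s2", "Student Two", "two@example", 20, 3.1)])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                 [("CS101", "Databases", 3), ("CS102", "Networks", 3)])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("s1", "CS101", "A"), ("s2", "CS101", "B")])

# A user of the external schema sees only cid and enrollment.
course_info = dict(conn.execute("SELECT cid, enrollment FROM Course_info"))
print(course_info)
```

The view user never sees names, logins, or grades, which is the point of the external schema.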
Concurrency Control
• Concurrent execution of user programs is essential for good DBMS performance.
  - Because disk accesses are frequent and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
• Interleaving actions of different user programs can lead to inconsistency: e.g., a check is cleared while the account balance is being computed.
• The DBMS ensures such problems don't arise: users can pretend they are using a single-user system.
Transaction: An Execution of a DB Program
• The key concept is the transaction, which is an atomic sequence of database actions (reads/writes).
• Each transaction, executed completely, must leave the DB in a consistent state if the DB is consistent when the transaction begins.
  - Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.
  - Beyond this, the DBMS does not really understand the semantics of the data (e.g., it does not understand how the interest on a bank account is computed).
  - Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user's responsibility.
Scheduling Concurrent Transactions
• The DBMS ensures that the execution of {T1, ..., Tn} is equivalent to some serial execution T1' ... Tn'.
  - Before reading/writing an object, a transaction requests a lock on the object and waits until the DBMS grants it. All locks are released at the end of the transaction (the Strict 2PL locking protocol).
  - Idea: If an action of Ti (say, writing X) affects Tj (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions.
  - What if Tj already holds a lock on Y and Ti later requests a lock on Y? Ti then waits for Tj while Tj is waiting for Ti (deadlock!), so Ti or Tj is aborted and restarted.
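The locking idea above can be illustrated with a toy lock table. This is a simplified sketch rather than how any particular DBMS implements Strict 2PL: the LockManager class and its method names are hypothetical, only exclusive locks are modelled, and requests are replayed one after another instead of running in real threads.

```python
class LockManager:
    """Toy exclusive-lock table with wait-for-chain deadlock detection."""

    def __init__(self):
        self.owner = {}      # object -> transaction holding its lock
        self.waits_for = {}  # blocked transaction -> transaction it waits on

    def request(self, txn, obj):
        """Return 'granted', 'wait', or 'deadlock' for txn's request on obj."""
        holder = self.owner.get(obj)
        if holder is None or holder == txn:
            self.owner[obj] = txn
            return "granted"
        # Follow the wait-for chain from the holder; reaching txn means a cycle.
        seen = holder
        while seen is not None:
            if seen == txn:
                return "deadlock"
            seen = self.waits_for.get(seen)
        self.waits_for[txn] = holder
        return "wait"

    def release_all(self, txn):
        """Strict 2PL: all of txn's locks are released together when it ends."""
        for obj in [o for o, t in self.owner.items() if t == txn]:
            del self.owner[obj]
        self.waits_for.pop(txn, None)
        # Transactions that were waiting on txn are no longer blocked by it.
        for waiter in [w for w, t in self.waits_for.items() if t == txn]:
            del self.waits_for[waiter]


lm = LockManager()
events = [
    lm.request("Ti", "X"),  # Ti locks X
    lm.request("Tj", "Y"),  # Tj locks Y
    lm.request("Tj", "X"),  # Tj must wait for Ti
    lm.request("Ti", "Y"),  # Ti would wait for Tj, closing the cycle
]
lm.release_all("Ti")           # abort Ti: its locks are released
retry = lm.request("Tj", "X")  # Tj can now proceed
print(events, retry)
```

Aborting Ti here is an arbitrary choice; a real system picks its victim by some policy.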
Ensuring Atomicity
• The DBMS ensures atomicity (the all-or-nothing property) even if the system crashes in the middle of a transaction (Xact).
• Idea: Keep a log (history) of all actions carried out by the DBMS while executing a set of Xacts:
  - Before a change is made to the database, the corresponding log entry is forced to a safe location.
  - After a crash, the effects of partially executed transactions are undone using the log.
The Log: Recovery
• The following actions are recorded in the log:
  - Ti writes an object: the old value and the new value.
    The log record must go to disk before the changed page!
  - Ti commits/aborts: a log record indicating this action.
• Log records are chained together by Xact id, so it's easy to undo a specific Xact (e.g., to resolve a deadlock).
• The log is often duplexed and archived on "stable" storage.
• All log-related activities (and in fact, all concurrency control (CC) related activities such as lock/unlock, dealing with deadlocks, etc.) are handled transparently by the DBMS.
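A minimal sketch of the undo idea described above, assuming an in-memory dictionary as the "database" and a Python list as the log. The function names, the record layout, and the balances are illustrative assumptions; a real DBMS forces log records to stable storage and handles many more cases.

```python
def apply_writes(db, log, txn_id, writes, crash_before=None):
    """Run one transaction; optionally 'crash' before the write at index crash_before."""
    for i, (key, new_value) in enumerate(writes):
        if i == crash_before:
            return False  # simulated crash: no commit record reaches the log
        # Write-ahead rule: the log record (old and new value) is appended
        # before the data itself is changed.
        log.append(("update", txn_id, key, db[key], new_value))
        db[key] = new_value
    log.append(("commit", txn_id))
    return True


def recover(db, log):
    """Undo, newest first, every update of a transaction with no commit record."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old_value, _ = rec
            db[key] = old_value


db = {"A": 5000, "B": 2000}  # made-up balances
log = []
apply_writes(db, log, "T1", [("A", 4000), ("B", 3000)])                  # commits
apply_writes(db, log, "T2", [("A", 3000), ("B", 4000)], crash_before=1)  # crashes midway
state_at_crash = dict(db)
recover(db, log)
print(state_at_crash, db)
```

T1's committed changes survive recovery, while T2's partial write to A is rolled back to the value T1 left there.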
What is a File System?
• A file system (often also written as filesystem) is a method of storing and organizing computer files and their data.
• Essentially, it organizes these files into a database for the storage, organization, manipulation, and retrieval by the computer's operating system.
• File systems are used on data storage devices such as hard disks or CD-ROMs to maintain the physical location of the files.
System Architecture
Applications
Operating System (e.g. Windows, Linux)
File System (e.g. FAT)
Device Driver
Storage Media (Hard Disk, Flash Memory)
Why do we need a DBMS?
• To reduce application development time
Suppose we are given a collection of raw files that occupy 500 GB. What are the constraints and drawbacks if we use a file system rather than a DBMS?
Drawbacks of File System vs DBMS
1. Data Redundancy and Inconsistency
• E.g., consider a bank application: the address of a customer is stored in both
  - the file of "saving-accounts" and
  - the file of "checking-accounts"
A good DBMS design can avoid data redundancy and inconsistency.
Drawbacks of File System vs DBMS
2. Difficulty in accessing data
• Need to write a new program to carry out each new task
  - Create
  - Insert
  - Retrieve
  - Update
  - Delete
It is easy to obtain data with a DBMS.

Student Name   ID       Age   Gender   Entrance Year   Grade
Sadaf Jamal    A34455   20    F        1998            A
Ali Imtiaz     C23444   19    M        1999            B
Wasif Ali      C73334   19    M        2000            C
Drawbacks of File System vs DBMS
3. Integrity problems
• E.g., consider a bank application
  - The balance cannot be below 10000 PKR
  - The day of a month cannot exceed 31
A DBMS can check integrity automatically.
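The two rules above (minimum balance and valid day of month) can be declared once and then enforced by the DBMS on every insert. The sketch below assumes Python's built-in sqlite3 module; the table layout, column names, and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        account_no TEXT PRIMARY KEY,
        balance    INTEGER NOT NULL CHECK (balance >= 10000),
        opened_day INTEGER NOT NULL CHECK (opened_day BETWEEN 1 AND 31)
    )
""")
conn.execute("INSERT INTO account VALUES ('A-101', 25000, 14)")  # satisfies both rules

rejected = []
for row in [("A-102", 5000, 10),    # balance below 10000
            ("A-103", 20000, 32)]:  # day of month above 31
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])     # the DBMS refused the row

stored = [r[0] for r in conn.execute("SELECT account_no FROM account")]
print(stored, rejected)
```

With plain files, every program that writes account data would have to repeat these checks itself.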
Drawback of File System
4. Atomicity of updates
• E.g., consider a bank application
  - We want to transfer 1000 from account A to account B
  - Steps:
    Step 1: We deduct 1000 from account A
    Step 2: Then, we add 1000 to account B
  - If the system crashes after Step 1, then Step 2 is never executed
A DBMS makes sure that Step 1 and Step 2 are executed together, or not at all, even with a crash (we call such an execution atomic).
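The transfer above can be sketched with a transaction that is rolled back when a failure occurs between the two steps. This assumes Python's built-in sqlite3 module; the crash is simulated with an exception, and the function name and starting balances are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 5000), ("B", 2000)])
conn.commit()


def transfer(conn, src, dst, amount, crash_between_steps=False):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))                            # Step 1
            if crash_between_steps:
                raise RuntimeError("simulated crash after Step 1")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))                            # Step 2
    except RuntimeError:
        pass  # the failed transfer left no partial update behind


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


transfer(conn, "A", "B", 1000, crash_between_steps=True)
after_crash = balances(conn)
transfer(conn, "A", "B", 1000)
after_success = balances(conn)
print(after_crash, after_success)
```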
Drawback of File System
5. Concurrent access by multiple users
• Uncontrolled concurrent accesses can lead to inconsistencies
• E.g., consider a bank application
  - There is an account shared by 2 customers A and B
  - Customers A and B each withdraw 1000 concurrently

A                 B
Read 5000
                  Read 5000
5000 - 1000
                  5000 - 1000
Write 4000
                  Write 4000

The final balance is 4000 although 2000 was withdrawn. A DBMS makes sure that concurrent access cannot lead to this problem.
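The schedule above can be replayed step by step to see where the update is lost. The sketch is a deterministic simulation in plain Python rather than real concurrent code, and both function names are hypothetical.

```python
def interleaved_withdrawals(balance, amount):
    """Replay the schedule above: both customers read before either writes."""
    read_a = balance           # A reads 5000
    read_b = balance           # B reads 5000
    balance = read_a - amount  # A writes 4000
    balance = read_b - amount  # B overwrites with 4000: A's update is lost
    return balance


def serial_withdrawals(balance, amount):
    """What the DBMS guarantees: the outcome matches some serial order."""
    for _customer in ("A", "B"):
        current = balance            # read
        balance = current - amount   # write, before the next customer reads
    return balance


lost_update = interleaved_withdrawals(5000, 1000)
serial = serial_withdrawals(5000, 1000)
print(lost_update, serial)
```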
Drawback of File System
6. Security problems
• E.g., consider a bank application
  - We do not want system programmers to have permission to read some data (e.g., Customer A's saving account and Customer B's saving account)
  - It takes a lot of effort to rewrite a program to add this permission system
A DBMS can enforce that different users have different permissions to access different data.
Advantages of DBMS
• With the use of a DBMS, we have the following advantages:
  - Data independence
  - Efficient data access
  - Data integrity and security
  - Data administration
  - Concurrent access and crash recovery
• Overall: reduced application development time
Business Process Model (BPM)
BPM & DFD
Business Process Model (BPM)
• A Business Process Model (BPM) helps one identify, describe, and decompose business processes.
• One can analyze the system at various levels of detail, and focus alternatively on
  - control flow (the sequence of execution), or
  - data flow (the exchange of data).
• You can use BPEL, BPMN, and many other process languages.
• A business process diagram can be created in a model, a package, or within a decomposed process.
Sample
BPM
BPM Example
• In the previous example, the path of an order depends on whether it is a corporate order. The control flow passes through the Process Corporate Order process, then through the Check Book process, which checks the book availability in the Inventory resource. Then the control flow path depends on whether it is an overnight delivery. If yes, the control flow passes through the Ship FedEx Overnight process with a message format specifying the format of the information exchanged (an administrative form, for example). Then the shipment is confirmed. In any case, the control flow goes to Finish whether or not it is a corporate order.
Data Flow Model
• For data flow specifically, business process modeling uses Data Flow Diagrams (DFDs).
• A Data Flow Diagram (DFD) is a graphical description of the flow of data in a given context.
  - A DFD allows one to identify the transformations that take place on data as it moves from input to output in the system.
  - It is built at the beginning of the BPM to model the functions the system has to carry out and the interaction between those functions, with a focus on data exchanges between processes.
  - One can associate data with conceptual, logical, and physical data models and object-oriented models.
DFD sample
 Data Flow Diagram Sample
DFD Notations (Gane & Sarson and Yourdon symbols not shown)
Concept           Description
Process           Location where data is transformed.
Flow              Oriented link between objects, which conveys data.
Data store        Repository of data.
External entity   Source or destination of data.
Split/Merge       Splits a flow into several flows or merges flows from different sources into one flow.
[Figure: Order System, Level 0 context-level diagram (external entities: Customer, Warehouse)]
[Figure: Order System, Level 1 process-level diagram (data stores: D1 Customer Master, D2 Inventory Master, D3 Back Order)]
Data Models
ERD & Relational Model
Data Models
• A data model is a collection of tools or concepts for describing data, the meaning of data, data relationships, and data constraints.
• Two major types of data models are considered while designing a database:
  - Object-based logical models
  - Record-based logical models
Data Models
• Object-based logical models
  - Entity-Relationship Model (ER Model)
  - Allows entities and the relationships among them to be denoted pictorially.
[Figure: ER diagram with entity Customer (id, Customer-name, Customer-street, Customer-city) linked through relationship Borrower to entity Loan (Loan-number, amount)]
Object Based Logical Model
• Sample Experiment
• Problem statement: An organization has a database that maintains records of customers who place orders through salesmen. More than one customer may place orders through a single salesman, and a single customer may place orders through multiple salesmen. Create the database for this problem statement.
Object Based Logical Model
• Analyzing the problem:
  - According to the above problem statement, name each relation by identifying the nouns, and each associating relation (relationship set) by identifying the verbs.
  - Here, Customer and Salesman are nouns, so they are treated as entities, and Order is a verb, so it is treated as a relationship.
  - After naming the relations, identify the attributes of each relation.
  - Also identify the primary key of each relation. Then draw the E-R diagram.
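One possible outcome of the analysis above, sketched with Python's built-in sqlite3 module. The attribute names, primary keys, and sample rows are assumptions made for illustration, since the problem statement does not list them. The Order relationship becomes its own table that references both entities, which is what allows the many-to-many pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the references
conn.executescript("""
    -- Entities (nouns)
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE salesman (salesman_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Relationship (verb): each order links one customer to one salesman
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        salesman_id INTEGER NOT NULL REFERENCES salesman(salesman_id)
    );
    -- Made-up sample rows
    INSERT INTO customer VALUES (1, 'Customer 1'), (2, 'Customer 2');
    INSERT INTO salesman VALUES (10, 'Salesman 10'), (20, 'Salesman 20');
    INSERT INTO orders VALUES (100, 1, 10), (101, 2, 10), (102, 1, 20);
""")

# Several customers order through one salesman ...
customers_of_10 = sorted(r[0] for r in conn.execute(
    "SELECT DISTINCT customer_id FROM orders WHERE salesman_id = 10"))
# ... and one customer orders through several salesmen.
salesmen_of_1 = sorted(r[0] for r in conn.execute(
    "SELECT DISTINCT salesman_id FROM orders WHERE customer_id = 1"))
print(customers_of_10, salesmen_of_1)
```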
Object Based Logical Model
(ERD)
Data Models
• Record-based logical models
  ◦ Relational model
  ◦ Main concept: a relation is basically a table with rows and columns. A column is also called a field or attribute.
  ◦ A description of data in terms of a data model is called a schema.
  ◦ The schema for a relation specifies its name, the name of each field (or attribute or column), and the type of each field.
  ◦ E.g., Customer(Customer-name: string, ID: integer, Customer-street: string, Customer-city: string).
Relational Model
The relational model uses a collection of tables to represent both data and relationships among those data.

Attributes (column headers):
Customer-id    customer-name   customer-street   customer-city   account-number
192-83-7465    Johnson         Alma              Palo Alto       A-101
019-28-3746    Smith           North             Rye             A-215
192-83-7465    Johnson         Alma              Palo Alto       A-201
321-12-3123    Jones           Main              Harrison        A-217
019-28-3746    Smith           North             Rye             A-201
Relation Model


What is Data Abstraction?


Data Independence
• One big problem in application development is the separation of applications from data
• Do I have to change my program when I ...
  - Replace my hard drive?
  - Partition the data into two physical files (or merge two physical files into one)?
  - Store salary as a floating-point number instead of an integer?
  - ...
• The answer is to introduce levels of abstraction
Data Abstraction
Database systems provide users with an abstract view of data, hiding certain details of how data are stored and maintained.
Data Abstraction
• Physical Level
  - Describes how data is actually stored
• Logical Level
  - Describes what data are stored in the database and what relationships exist among those data
• View Level
  - Describes only part of the entire database, hiding details of data types.
  - Views can also hide information (e.g., salary) for security purposes
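A small sketch of a view that hides the salary column, assuming Python's built-in sqlite3 module. The table, the view name employee_directory, and the rows are made up for illustration. SQLite has no user accounts, so the sketch only shows what the view exposes; in a multi-user DBMS, users would typically be given access to the view but not to the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT,
                           department TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Employee One', 'Sales', 52000.0),
                                (2, 'Employee Two', 'Payroll', 61000.0);
    -- View level: everything except salary
    CREATE VIEW employee_directory AS
        SELECT emp_id, name, department FROM employee;
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
visible_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(visible_columns, rows)
```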
Levels of Abstraction
View level: Payroll, Inventory, Sales
Logical level: Company database
Physical level: Files on disks
Instances and Schemas
• Similar to variable declarations and variable values in programming languages

Schema:
    struct student {
        string name;
        int age;
        float gpa;
    };
    student s;

Instance at time t1:
    s.name = "Florence";
    s.age = 25;
    s.gpa = 3.5;

Instance at time t2:
    s.name = "Jonathan";
    s.age = 21;
    s.gpa = 3.2;
Data Abstraction
• Conceptual Schema
  - The conceptual schema (sometimes called the logical schema) describes the stored data in terms of the data model of the DBMS.
  - In a relational DBMS, the conceptual schema describes all relations that are stored in the database.
Data Abstraction (using schemas)
• Conceptual Schema
  - University Database
    Students(sid: string, name: string, login: string, age: integer, gpa: real)
    Faculty(fid: string, fname: string, sal: real)
    Courses(cid: string, cname: string, credits: integer)
    Rooms(rno: integer, address: string, capacity: integer)
    Enrolled
    Teaches
    Meets_In
Data Abstraction
• Physical Schema
  - The physical schema specifies additional storage details.
  - It summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks.
  - It decides which file organizations are used to store the relations.
  - It creates indexes to speed up data retrieval.
  - These decisions depend on an understanding of how the data is typically accessed.
Data Abstraction
• Physical Schema
  - Indexes on the University Database
  - E.g., file formats, locations, etc.
Data Abstraction
• External Schema
  - Allows data access to be customized (and authorized) at the level of individual users or groups of users.
  - Any given database has exactly one conceptual schema and one physical schema because it has just one set of stored relations, but it may have several external schemas, each tailored to a particular group of users.
  - Each external schema consists of a collection of one or more views and relations from the conceptual schema.
Data Abstraction
• External Schema
  - E.g., find the name of the faculty member, the room number, and the timings of the DBMS course.
• Views of a company with data on employees, departments, products, ...
  - Payroll section: view on employees, departments, salaries
  - Sales department: view on products, prices, sales, customers
  - Purchasing department: view on parts, with pricing
Instances and Schemas
• Instance
  - The actual content of the database at a particular point in time
  - Analogous to the value of a variable
Data Independence
• The ability to modify a schema definition at one level without affecting a schema definition at the next higher level.
• The interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
• Two levels of data independence:
  - Physical data independence
  - Logical data independence
Data Independence
View level: view 1, view 2, ..., view n
Logical level: Can I make changes here without affecting the application programs? (Logical data independence)
Physical level: Can I make changes here without affecting the logical level? (Physical data independence)
Physical Data Independence
• The ability to modify the physical schema without changing the logical schema
• The logical schema is independent of changes in how the data is stored on disk, file structures, etc.
• E.g., Faculty: Faculty Name, Faculty ID, Faculty Office
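A small sketch of physical data independence, assuming Python's built-in sqlite3 module. The column names, the index name, and the rows are made up for illustration. Adding an index changes how the data is reached on disk, yet the query text and its answer stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE faculty (faculty_id INTEGER PRIMARY KEY,
                          faculty_name TEXT, faculty_office TEXT);
    INSERT INTO faculty VALUES (1, 'Faculty A', 'Office 101'),
                               (2, 'Faculty B', 'Office 247');
""")

query = "SELECT faculty_office FROM faculty WHERE faculty_name = ?"
before = conn.execute(query, ("Faculty B",)).fetchall()

# Physical-level change: add an index; the logical schema is untouched.
conn.execute("CREATE INDEX idx_faculty_name ON faculty (faculty_name)")

after = conn.execute(query, ("Faculty B",)).fetchall()
print(before, after)
```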

Logical Data Independence
• Application programs are insulated from changes in the structure/design of the database
• Original schema: Faculty: Faculty Name, Faculty ID, Faculty Office
• Query: "Give me the office number of Nicolas Lomenie"
• We changed the schema to:
  - Faculty_Public: Faculty Name, Faculty ID, Faculty Office
  - Faculty_Private: Faculty ID, Faculty Salary
• Answer (unchanged): Office # 247
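One common way to get this insulation is to keep the old relation name alive as a view over the new tables. The sketch below assumes Python's built-in sqlite3 module; the column names and the salary value are made up for illustration, and the view definition is one possible choice rather than the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- New schema after the change
    CREATE TABLE Faculty_Public  (faculty_id INTEGER PRIMARY KEY,
                                  faculty_name TEXT, faculty_office TEXT);
    CREATE TABLE Faculty_Private (faculty_id INTEGER PRIMARY KEY
                                      REFERENCES Faculty_Public(faculty_id),
                                  faculty_salary REAL);
    INSERT INTO Faculty_Public  VALUES (1, 'Nicolas Lomenie', '247');
    INSERT INTO Faculty_Private VALUES (1, 1000.0);  -- made-up salary
    -- The old relation name survives as a view over the new schema.
    CREATE VIEW Faculty AS
        SELECT faculty_name, faculty_id, faculty_office FROM Faculty_Public;
""")

# The application's original query, written against Faculty, is unchanged.
office = conn.execute(
    "SELECT faculty_office FROM Faculty WHERE faculty_name = ?",
    ("Nicolas Lomenie",),
).fetchone()[0]
print(office)
```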

Structure of a DBMS

People who deal with databases.


People who deal with databases
• Database Administrator (DBA): the person(s) who has central control over the database and is responsible for the following tasks:
  - Schema definition/modification
  - Storage structure definition/modification
  - Authorization of data access
  - Integrity constraint specification
  - Monitoring performance
  - Responding to changes in requirements
People who deal with databases
• Application programmers
  - Embed DML calls in programs written in a host language (e.g., COBOL, Java). (DML stands for data manipulation language.)
  - E.g., programs that generate payroll checks or transfer funds between accounts
• Sophisticated users
  - Post requests in a database query language (SQL interface)
• Naïve users
  - Invoke one of the permanent application programs that have been written previously (web forms, application front ends)
  - E.g., transfer: transfer funds between accounts
Application Architecture
Two-tier architecture: e.g., client programs using ODBC/JDBC to communicate with a database
Three-tier architecture: e.g., web-based applications, and applications built using "middleware"
