Chap 1
Chapter 1: Contents
Concepts & History of Database System
Drawbacks of Database in a Text File
Data Abstraction and Models
Database Language and Users
Overall System Architecture
Database Management Systems
• A DBMS provides an environment that is
convenient and efficient to use for data retrieval
and storage
[Figure: data held in a database that is managed by the DBMS]
History of Database Systems
• 1950s and early 1960s:
– Data processing using magnetic tapes for storage
• Tapes provide only sequential access
– Punched cards for input
• Late 1960s and 1970s:
– Hard disks allow direct access to data
– Network and hierarchical data models in widespread use
– Ted Codd defines the relational data model
• Later wins the ACM Turing Award for this work
• IBM Research begins System R prototype
• UC Berkeley begins Ingres prototype
– High-performance (for the era) transaction processing
History of Database Systems (cont.)
• 1980s:
– Research relational prototypes evolve into commercial systems
• SQL becomes the industry standard
– Parallel and distributed database systems
– Object-oriented database systems
• 1990s:
– Large decision support and data-mining applications
– Large multi-terabyte data warehouses
– Emergence of Web commerce
• Early 2000s:
– XML and XQuery standards
– Automated database administration
• Later 2000s:
– Giant data storage systems
• Google BigTable, Yahoo PNUTS, Amazon, ..
Database Applications
• A DBMS contains information about a particular
enterprise and provides an environment that is both
convenient and efficient to use
– Banking: all transactions
– Airlines: reservations, schedules
– Universities: registration, grades
– Sales: customers, products, purchases
– Online retailers: order tracking, customized
recommendations
– Manufacturing: production, inventory, orders, supply chain
– Human resources: employee records, salaries, tax
deductions
• Databases touch all aspects of our lives
Database Management System
• Why is a DBMS so important?
• Consider a possible way of storing customer and
savings account information:
– text files for the data
– executable programs written in popular programming
languages
[Figure: the programs PrintBal.exe, AddAC.exe, AddCust.exe, Debit.exe and Credit.exe work directly on the files SavingAC.txt and CustInfo.txt]
Difficulties
• Data Redundancy
– Same information duplicated in different files
– higher storage and access costs
• Data Inconsistency
– copies of the same data don’t agree
– which copy is right?
[Figure: AddAC.exe writes SavingAC.txt (Customer Name, Customer Phone, Account Num, Amount); AddCust.exe writes CustInfo.txt (Customer Name, Customer Address, Customer Phone, Customer Email). Customer Name and Customer Phone are stored in both files.]
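A minimal sketch of how the duplicated phone number can drift apart. The two lists are in-memory stand-ins for SavingAC.txt and CustInfo.txt; the record layout, the customer and the phone numbers are made up for illustration.

```python
# Hypothetical in-memory stand-ins for the two text files on the slide.
saving_ac = [{"name": "Ann Lee", "phone": "555-0100", "account": "A-1", "amount": 500}]
cust_info = [{"name": "Ann Lee", "address": "1 Main St", "phone": "555-0100",
              "email": "ann@example.com"}]

def update_phone(records, name, new_phone):
    """Update the phone number in ONE file only, as a single program would."""
    for rec in records:
        if rec["name"] == name:
            rec["phone"] = new_phone

def inconsistent_customers(file_a, file_b):
    """Return names whose phone number differs between the two files."""
    phones_b = {rec["name"]: rec["phone"] for rec in file_b}
    return [rec["name"] for rec in file_a
            if rec["name"] in phones_b and phones_b[rec["name"]] != rec["phone"]]

# The update touches CustInfo only; SavingAC keeps the old value.
update_phone(cust_info, "Ann Lee", "555-0199")
mismatches = inconsistent_customers(saving_ac, cust_info)
print(mismatches)
```

Nothing in the files themselves says which of the two phone numbers is the right one.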
Difficulties
• Difficulty in accessing data
– Find all customers in Houston
[Figure: a separate program (GetAllCust.exe, GetCustLoc.exe, Get < 1000.exe) has to be written for each kind of request against CustInfo.txt]
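For contrast, a sketch of the same three requests against a DBMS, where each one is a query rather than a new program. It uses Python's built-in sqlite3 module; the table layout and the sample rows are assumptions, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customer table with made-up rows.
conn.execute("create table customer (name text, city text, balance integer)")
conn.executemany(
    "insert into customer values (?, ?, ?)",
    [("Ann", "Houston", 1500), ("Bob", "Dallas", 700), ("Cy", "Houston", 900)],
)

def run(query, params=()):
    """Send one query to the DBMS and return all result rows."""
    return conn.execute(query, params).fetchall()

# One query per request; no GetAllCust / GetCustLoc / Get<1000 programs needed.
all_customers = run("select name from customer order by name")
in_houston = run("select name from customer where city = ? order by name", ("Houston",))
under_1000 = run("select name from customer where balance < ? order by name", (1000,))
```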
Difficulties
• Data Isolation
– Because data is scattered across various files, and the files
may be in different formats, writing new
application programs to retrieve the appropriate data
is difficult.
SavingAC.txt    CustInfo.txt    CheckAC.txt
R1;R2;R3;       R1,R2,R3,       R1 R2 R3
R4;R5;R6;       R4,R5,R6,       R4 R5 R6
R7;R8           R7,R8           R7 R8
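A small sketch of what the three layouts mean for a new program: each file needs its own parsing rule before any record can be read. The strings reproduce the slide's placeholder records; the `parse` helper is hypothetical.

```python
import re

# The three layouts from the slide: ';'-, ','- and space-separated records.
saving_ac = "R1;R2;R3;\nR4;R5;R6;\nR7;R8"
cust_info = "R1,R2,R3,\nR4,R5,R6,\nR7,R8"
check_ac = "R1 R2 R3\nR4 R5 R6\nR7 R8"

def parse(text, separator_pattern):
    """Split file text into records; every format needs its own separator."""
    return [field for field in re.split(separator_pattern, text) if field]

records = {
    "SavingAC.txt": parse(saving_ac, r"[;\n]"),
    "CustInfo.txt": parse(cust_info, r"[,\n]"),
    "CheckAC.txt": parse(check_ac, r"\s+"),
}
```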
Difficulties
• Integrity Problems
– data values stored in the database must satisfy
consistency constraints
• Example:
– Account balance must not fall below $100
– A student on probation can’t take more than 2 courses
– in a file-processing system, constraints are enforced by code in the programs
– adding a new constraint or modifying an existing one
means re-coding, re-compiling, etc.
– more difficult when constraints involve several
data items from different files
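A sketch of a constraint declared once in the schema instead of being coded into every program. It uses sqlite3; the account table is hypothetical, and the rule "balance may not fall below 100" is an assumed reading of the slide's balance example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed rule for illustration: a balance may not fall below 100.
conn.execute(
    "create table account ("
    " acc_number text primary key,"
    " balance integer not null check (balance >= 100))"
)
conn.execute("insert into account values ('A-1', 500)")
conn.commit()

def try_withdraw(acc_number, amount):
    """Return True if the withdrawal is accepted, False if the check rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "update account set balance = balance - ? where acc_number = ?",
                (amount, acc_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_withdraw("A-1", 300)        # 500 -> 200, allowed
rejected = try_withdraw("A-1", 150)  # 200 -> 50 would violate the check
balance = conn.execute("select balance from account").fetchone()[0]
```

Changing the rule means changing one table definition; the programs that issue updates stay as they are.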
Difficulties
• Atomicity
– if a failure happens, data must be restored to the consistent
state prior to the failure
– an operation must be atomic, i.e. happen fully or not at all
– difficult to ensure atomicity in a traditional file-processing
system
[Figure: Transfer.exe takes $50 out of SavingAC.txt, then fails (BOOM) before adding $50 to CheckAC.txt, so the balance in the checking a/c is wrong]
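A sketch of the same transfer run as a database transaction, so that a failure between the two updates leaves no partial result. It uses sqlite3; the account table, the starting balances and the simulated failure are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (name text primary key, balance integer)")
conn.executemany("insert into account values (?, ?)", [("saving", 500), ("checking", 0)])
conn.commit()

def transfer(amount, crash_midway=False):
    """Move `amount` from saving to checking as one all-or-nothing unit."""
    try:
        with conn:  # commit if the block finishes, roll back if it raises
            conn.execute(
                "update account set balance = balance - ? where name = 'saving'", (amount,))
            if crash_midway:
                raise RuntimeError("BOOM")  # simulated failure between the two updates
            conn.execute(
                "update account set balance = balance + ? where name = 'checking'", (amount,))
    except RuntimeError:
        pass  # the debit above has already been rolled back

def balances():
    """Return the current balances as {account name: balance}."""
    return dict(conn.execute("select name, balance from account"))

transfer(50, crash_midway=True)
after_crash = balances()
transfer(50)
after_success = balances()
```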
Difficulties
• Concurrent access done correctly: balance correct
[Figure: two runs of WithD.exe withdraw $50 and $100 from SavingAC.txt one after the other. Starting balance = $500. The first run sees balance = $500 and leaves balance = $450; the second run sees balance = $450 and leaves balance = $350.]
Difficulties
• Concurrent access done incorrectly: balance incorrect
[Figure: two runs of WithD.exe withdraw $50 and $100 from SavingAC.txt at the same time. Both see balance = $500, so the final balance is $450 or $400, depending on which write lands last, instead of $350.]
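A sketch that replays the bad schedule deterministically and then shows one way to prevent it. The `Account` class is a hypothetical stand-in for SavingAC.txt, and the lock stands in for the concurrency control a DBMS would provide; neither comes from the slides.

```python
import threading

class Account:
    """Hypothetical stand-in for SavingAC.txt holding a single balance."""
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def interleaved_withdrawals(account, first, second):
    """Replay the bad schedule: both programs read before either writes."""
    seen_by_first = account.balance
    seen_by_second = account.balance
    account.balance = seen_by_first - first
    account.balance = seen_by_second - second  # overwrites the first update
    return account.balance

def locked_withdraw(account, amount):
    """Read and write under one lock so no other withdrawal can interleave."""
    with account.lock:
        account.balance = account.balance - amount

lost_update = interleaved_withdrawals(Account(500), 50, 100)

safe = Account(500)
threads = [threading.Thread(target=locked_withdraw, args=(safe, amount))
           for amount in (50, 100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With this write order the first withdrawal is lost and the balance ends at $400; under the lock both withdrawals are applied.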
Difficulties
• Security
– Not every user of the database system should be
able to access all the data.
– prevent authorized users from accessing data
they don’t need
– E.g., banking system:
• Application programs are added in an ad hoc
manner
• Enforcing security constraints is therefore difficult
Difficulties
• These problems are not unique to file-processing
systems
• They also arise in a DBMS
• However, a DBMS has facilities that help
overcome these difficulties
• Coupled with sound database design, these
facilities offer solutions to all the above problems
Data Abstraction
• One purpose of a DBMS is to provide users with
abstract views of the data
• Efficiency requires complex data structures
• This complexity can be hidden through different
‘views’ of the data
• Several layers of abstraction
– Physical level:
• describes how a record (e.g., customer) is stored.
– Logical level:
• describes what data is stored in database, and the
relationships among the data.
– View level:
• application programs hide details of data types.
• views can also hide information (such as an employee’s
salary, SSN) for security purposes.
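A sketch of a view that hides sensitive columns, using sqlite3. The employee table, its columns and the sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full employee record, including sensitive fields.
conn.execute(
    "create table employee (id integer primary key, name text, salary integer, ssn text)")
conn.execute("insert into employee values (1, 'Ann', 90000, '000-00-0000')")

# View level: only the non-sensitive columns are exposed.
conn.execute("create view employee_public as select id, name from employee")

cursor = conn.execute("select * from employee_public")
visible_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
```

SQLite has no user accounts, so this only shows the hiding itself. In a multi-user DBMS, users would be granted access to the view and not to the underlying table.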
Data Abstraction
[Figure: at the view level, only certain data is visible]
Data Abstraction
typedef struct {
    int cusnum;
    char *cusname;
    char *cusaddress;
} customer;
[Figure: the view and logical levels work with the customer record type; at the physical level it is stored as a series of bytes]
Schemas and Instances
• The overall design (logical structure) of the database is
known as the schema
• The actual content of the database at a particular point
in time is known as an instance
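A sketch of the distinction using sqlite3: the schema stays the same while the instance changes with every insert. The customer table borrows the field names from the C struct shown earlier; the row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table customer (cusnum integer primary key, cusname text, cusaddress text)")

def schema():
    """Column names and declared types: fixed by the design."""
    return [(row[1], row[2]) for row in conn.execute("pragma table_info(customer)")]

def instance():
    """The rows stored right now: changes as data is inserted or deleted."""
    return conn.execute("select * from customer order by cusnum").fetchall()

schema_before, instance_before = schema(), instance()
conn.execute("insert into customer values (1, 'Ann', '1 Main St')")
schema_after, instance_after = schema(), instance()
```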
Data Models
• Entity-Relationship Model
– based on perception of the real world
– translate the enterprise into a blueprint
– consists of:
• entities
• attributes
• relationships
• constraints
– cardinality ratio
– participation constraints
– existence dependency
Data Models
[Figure: E-R example. Entity Customer with attributes Name, Address and Phone Number (e.g., 001-223-984); entity Saving Account with attributes A/C number and Balance; a relationship links the two entities]
Data Models
[Figure: E-R diagram with the attributes name, phone-number, address, acc-number and balance]
Data Models
• Object-based data Model
– Object-relational Model
• An object-relational database (ORD), or object-
relational database management system
(ORDBMS), is a database management system
(DBMS) similar to a relational database, but with
an object-oriented database model:
– objects, classes and inheritance are directly supported
in database schemas and in the query language.
• An object-relational database can be said to provide
a middle ground between relational databases
and object-oriented databases (object database).
Data Models
• Record-Based Logical Models
– describe data at the logical and view levels
– also provide a higher-level description of the
implementation
– data is structured in fixed-format records
• relational model
• network model
• hierarchical model
Data Models
• Relational Model
[Figure: a relation shown as a table, with its columns labeled]
Data Models
• Network Model
– collection of records (similar to C/C++ structures)
– each instance of record contains a single record
– relationships represented as links (pointers)
Data Models
• Hierarchical Model
– similar to network model
– records organized as collections of trees
Semistructured Data model
XML: Extensible Markup Language
• Defined by the WWW Consortium (W3C)
• Originally intended as a document markup
language, not a database language
• The ability to specify new tags and to create nested
tag structures made XML a great way to exchange
data, not just documents
• XML has become the basis for many new-generation
data interchange formats.
• A wide variety of tools are available for parsing,
browsing and querying XML documents/data
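A sketch of user-defined, nested tags carrying data rather than a document, parsed with Python's standard xml.etree.ElementTree. The tag names and values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up document with user-defined, nested tags.
document = """
<bank>
  <customer>
    <name>Ann</name>
    <account><acc_number>A-1</acc_number><balance>500</balance></account>
    <account><acc_number>A-2</acc_number><balance>350</balance></account>
  </customer>
</bank>
"""

root = ET.fromstring(document)

# Walk the nested structure and pull the data back out.
accounts = {
    acc.findtext("acc_number"): int(acc.findtext("balance"))
    for acc in root.iter("account")
}
customer_names = [c.findtext("name") for c in root.findall("customer")]
```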
Database Languages
A DBMS provides two types of language
– One to specify the schema and create the database
– One to express database queries and updates
1. Data-Definition Language (DDL)
– The schema is specified by a set of definitions expressed in the DDL
– The result is a set of tables stored in the data dictionary
– The data dictionary is a file that contains metadata (data about data)
2. Data-Manipulation Language (DML)
– Language for accessing and manipulating the data organized by
the appropriate data model: data retrieval, insertion,
deletion, modification
– The DML is also known as the query language
– At the physical level, the goal is efficient access
– At higher levels, the goal is ease of use
Database Languages
• Example of SQL DDL
create table student
    (id char(10) not null,
     name varchar(30) not null,
     degree varchar(10),
     address char(50),
     primary key (id),
     check (degree in ('Bachelors', 'Masters', 'Doctorate')));
• The DDL compiler generates a set of table templates stored in a data
dictionary. The data dictionary contains metadata (data about data)
– Database schema
– Integrity constraints
• Primary key (id uniquely identifies students)
– Authorization
• Who can access what
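A sketch that runs the student DDL through sqlite3 and then looks at what the DBMS recorded and enforces. SQLite's catalog table `sqlite_master` plays the role of the data dictionary here; the inserted rows and the `insert_rejected` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table student
        (id char(10) not null,
         name varchar(30) not null,
         degree varchar(10),
         address char(50),
         primary key (id),
         check (degree in ('Bachelors', 'Masters', 'Doctorate')))
""")

# SQLite keeps its metadata in the sqlite_master catalog table.
catalog = conn.execute(
    "select type, name from sqlite_master where name = 'student'").fetchall()

conn.execute("insert into student values ('021000040', 'Ann', 'Masters', '1 Main St')")

def insert_rejected(row):
    """Return True if the DBMS refuses the row because a constraint fails."""
    try:
        conn.execute("insert into student values (?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True

bad_degree = insert_rejected(("021000041", "Bob", "Diploma", None))      # fails the check
duplicate_id = insert_rejected(("021000040", "Cy", "Bachelors", None))   # fails the primary key
```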
Database Languages
Two classes of languages
– Procedural – user specifies what data is required
and how to get those data.
– Declarative (nonprocedural) – user specifies what
data is required without specifying how to get those
data
SQL: widely used non-procedural query language
Example of SQL DML:
Query-1:
Select name, id
From student
Where degree = 'Bachelors';
Query-2:
Select name, address, degree
From student
Where id = '021000040';
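A sketch that answers Query-1 in both styles. The procedural version spells out the scan in Python; the declarative version hands the SQL to sqlite3. The sample rows are made up, and the simplified column types are an assumption.

```python
import sqlite3

# Made-up rows: (id, name, degree, address)
students = [
    ("021000040", "Ann", "Bachelors", "1 Main St"),
    ("021000041", "Bob", "Masters", "2 Oak Ave"),
    ("021000042", "Cy", "Bachelors", "3 Elm Rd"),
]

def bachelors_procedural(rows):
    """Procedural style: spell out HOW to scan, test and collect."""
    result = []
    for sid, name, degree, _address in rows:
        if degree == "Bachelors":
            result.append((name, sid))
    return result

def bachelors_declarative(rows):
    """Declarative style: state WHAT is wanted and let the DBMS work out how."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table student (id text primary key, name text, degree text, address text)")
    conn.executemany("insert into student values (?, ?, ?, ?)", rows)
    return conn.execute(
        "select name, id from student where degree = 'Bachelors' order by id"
    ).fetchall()

procedural = bachelors_procedural(students)
declarative = bachelors_declarative(students)
```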
Database Languages
Popular database languages
SQL (Structured Query Language)
Quel
Datalog
QBE (Query by Example)
Database Users
Administrator (DBA)
Coordinates all the activities of the database system; the
database administrator has a good understanding of the
enterprise’s information resources and needs.
– schema definition
– storage structure
– granting authorization for data access
– schema and physical organization modification
– integrity-constraint specification
– routine maintenance
• Database backup
• Ensuring enough free disk space
• Monitoring database performance
– Eliminate expensive tasks
Database Users
Application Programmers
– interact with the system through DML calls
embedded in a host language
– use a DML pre-compiler
– may also interact with databases through
bridges, e.g. ODBC, JDBC
Sophisticated Users
– interact with the database using a database query
language
– queries are submitted to the query processor
Database Users
Specialized Users
– sophisticated users who write specialized database
applications
• expert systems
• graphical databases
Naïve Users
– unsophisticated users who interact with database system
through one or more application programs
• Example: bank teller who needs to transfer $50 from account A
to account B.
• people accessing database over the web, bank tellers, clerical
staff
Transaction Management
• A transaction is a collection of operations that
performs a single logical function in a database
application
• Transaction-management component ensures
that the database remains in a consistent (correct)
state despite system failures (e.g., power failures
and operating system crashes) and transaction
failures.
• Concurrency-control manager controls the
interaction among the concurrent transactions, to
ensure the consistency of the database.
Storage Management
• Storage manager is a program module that provides
the interface between the low-level data stored in the
database and the application programs and queries
submitted to the system.
• The storage manager is responsible for the following
tasks:
– Interaction with the file manager
– Efficient storing, retrieving and updating of data
• Issues:
– Storage access
– File organization
– Indexing and hashing
Storage Management
• Storage Manager Components
– Authorization and Integrity Manager
• tests integrity constraints and checks user authorization
– Transaction Manager
• ensures the database remains in a consistent state despite system failure
and usually incorporates the concurrency-control manager, which
controls the interaction among concurrent transactions to
ensure the consistency of the database.
– File Manager
• manages allocation of space on disk storage
– Buffer Manager
• responsible for fetching data from disk storage into main memory
• decides what data to cache
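A toy sketch of the buffer manager's job: serve blocks from memory when possible, read from disk on a miss, and decide what to drop when memory is full. The least-recently-used policy, the `BufferManager` class and the dictionary standing in for the disk are all assumptions; the slides do not name a replacement policy.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer pool: keeps at most `capacity` disk blocks in memory (LRU)."""
    def __init__(self, disk, capacity):
        self.disk = disk              # hypothetical dict: block id -> block contents
        self.capacity = capacity
        self.pool = OrderedDict()     # cached blocks, least recently used first
        self.disk_reads = 0

    def get_block(self, block_id):
        if block_id in self.pool:
            self.pool.move_to_end(block_id)   # hit: mark as most recently used
            return self.pool[block_id]
        self.disk_reads += 1                  # miss: fetch from "disk"
        if len(self.pool) >= self.capacity:
            self.pool.popitem(last=False)     # evict the least recently used block
        self.pool[block_id] = self.disk[block_id]
        return self.pool[block_id]

disk = {1: "block-1", 2: "block-2", 3: "block-3"}
buffers = BufferManager(disk, capacity=2)
for block_id in (1, 2, 1, 3, 2):
    buffers.get_block(block_id)
```

Of the five requests, only the second access to block 1 is served from memory; the other four go to disk.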
Database Architecture
Overall System Structure
Query Processor Components
– DML Compiler
• translates DML statements into low-level instructions the query
evaluation engine understands
• may try to optimize user queries
– Embedded DML Pre-compiler
• converts DML statements embedded in an application program to
normal calls in the host language
• interacts with DML Compiler
– DDL Interpreter
• interprets DDL statements and records them in the data dictionary
– Query Evaluation Engine
• executes low-level instructions generated by the DML Compiler
Overall System Structure