01 Cs245 HS23 Intro To Databases Prologue Lecture
This chapter explains why databases are key elements in the software
stack of information systems, and why using databases has significant
benefits over other, simpler approaches to data management.
You will …
• get to know the general architecture of database systems
• understand how logical and physical data independence are provided
and why they are essential features of database systems
• be able to distinguish the intensional from the extensional layer
in a database system, i.e., schema vs. data
Fall 2023 Introduction to Databases (cs245) – Prologue: Why Databases? – Heiko Schuldt 01–2
Why do we need Databases?
Prologue – Why Databases?
1.1 Data Management in Main Memory –
an Alternative to Databases?
• When you create a program, usually only main memory is used for data
management, i.e., objects are created in memory (and are automatically
deleted when the program execution ends)
Persistent Storage
• The external storage (hard disks) of a computer system offers the possibility
of permanent storage
Structure of a Hard Disk
• Several magnetic disks rotate (e.g., with 7'200 rotations per minute) around
a common spindle
[Figure: hard disk with spindle, platters, actuator, and arm]
• Which activities are needed to read data from the hard disk or to write data
to the hard disk?
1. Position the actuator with the read/write heads on top of the cylinder
2. Wait until the disk rotation has brought the start of the sector to be
accessed underneath the read/write head
3. Transmit the information from the hard disk to main memory
(or vice versa)
• But:
– It does not make sense (and it is also technically infeasible) to read or
write single bytes. Instead, at least a whole sector must be read / written.
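The waiting time in step 2 can be estimated with a little arithmetic. The following is a minimal sketch, assuming that on average half a rotation passes before the requested sector arrives under the head; the class and method names are hypothetical and only serve as illustration.

```java
// Sketch: rotational delay of a hard disk (step 2 of the access sequence).
// Assumption: on average, half a rotation is needed until the sector arrives.
final class DiskTiming {

    /** Time for one full rotation in milliseconds, given rotations per minute. */
    static double rotationTimeMs(double rpm) {
        return 60_000.0 / rpm; // 60,000 ms per minute divided by rotations per minute
    }

    /** Average rotational delay in milliseconds (half a rotation). */
    static double averageRotationalDelayMs(double rpm) {
        return rotationTimeMs(rpm) / 2.0;
    }

    public static void main(String[] args) {
        double rpm = 7200.0; // rotation speed used as the example on the slide
        System.out.println("One rotation (ms): " + rotationTimeMs(rpm));
        System.out.println("Average rotational delay (ms): " + averageRotationalDelayMs(rpm));
    }
}
```

At 7,200 rotations per minute, one rotation takes roughly 8.3 ms, so the average wait is roughly 4.2 ms. This is one reason why transferring whole sectors per access is preferable to transferring single bytes.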
1.2 Data Management in Files –
an Alternative to Databases?
• Addressing data should be transparent to users: they should not have to care
about technical details such as disk number, surface number, etc.
• Working with files significantly facilitates storing / reading data:
– File names simplify addressing data
– Directory hierarchies support the grouping of files
– Memory cells of a file are byte-wise enumerated, starting with address 0
– Input from / output to files is buffered, so that a programmer does
not need to read / write complete sectors
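The file abstraction listed above can be sketched in a few lines of Java: the file is addressed by name, and its bytes are numbered starting at address 0. This is only an illustration; the class ByteAddressing and its helper methods are hypothetical names, and a temporary file stands in for real application data.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: a file is addressed by its name, its content by byte addresses 0, 1, 2, ...
final class ByteAddressing {

    /** Writes the given text into a new temporary file and returns its path. */
    static Path writeTempFile(String content) {
        try {
            Path file = Files.createTempFile("demo", ".txt");
            Files.write(file, content.getBytes(StandardCharsets.US_ASCII));
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the byte stored at the given address, or -1 past the end of the file. */
    static int byteAt(Path file, long address) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(address);  // position by byte address, not by disk, surface, or sector
            return raf.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTempFile("Federer");
        System.out.println((char) byteAt(file, 0)); // first byte of the file
        System.out.println((char) byteAt(file, 3)); // fourth byte of the file
    }
}
```

The program never mentions sectors or cylinders; the operating system and the file system translate byte addresses into disk accesses.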
Example: File Access in Java …
… Example: Data Access in Java
• In Java, there are several classes and methods for input/output to/from files
• When objects are written to a file by an application, the file format needs
to be specified by the programmer:
File Format
Federer   Roger   1981Wawrinka  Stan    1985 ...
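A file format like this has to be produced by the application itself. The following is a minimal sketch of such an encoder, assuming the field widths of the Person example used in this chapter (10 characters for the name, 8 for the first name, 4 for the year); the class PersonRecordFormat and its methods are hypothetical.

```java
// Sketch: the programmer, not the file system, defines the record layout.
// Assumed layout: Name (10 chars) + FirstName (8 chars) + Year (4 chars) = 22 chars.
final class PersonRecordFormat {
    static final int NAME_LEN = 10;
    static final int FIRST_NAME_LEN = 8;
    static final int YEAR_LEN = 4;
    static final int RECORD_LEN = NAME_LEN + FIRST_NAME_LEN + YEAR_LEN;

    /** Pads a value with blanks to the given width; rejects values that are too long. */
    static String pad(String value, int width) {
        if (value.length() > width) {
            throw new IllegalArgumentException("Value too long for field: " + value);
        }
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    /** Encodes one person as a fixed-length record. */
    static String encode(String name, String firstName, int year) {
        String yearText = Integer.toString(year);
        if (yearText.length() != YEAR_LEN) {
            throw new IllegalArgumentException("Year must have 4 digits: " + year);
        }
        return pad(name, NAME_LEN) + pad(firstName, FIRST_NAME_LEN) + yearText;
    }

    public static void main(String[] args) {
        String file = encode("Federer", "Roger", 1981) + encode("Wawrinka", "Stan", 1985);
        System.out.println("[" + file + "]");
    }
}
```

Nothing in the file itself records this layout; every program that reads the file must know it.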
Consequences
Physical Data Dependence
• Usually, data records are addressed and separated from each other on
the basis of their position:
– For example, if each record is known to have 22 characters,
then …
– … the first record starts with address 0
– … the second record starts with address 22, etc.
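Positional addressing can be sketched as follows. The example assumes the 22-character record from above, split into 10 characters for the name, 8 for the first name, and 4 for the year; the class PositionalAccess is a hypothetical illustration that works on the file content as a string.

```java
// Sketch: records are located purely by position, so the layout is hard-coded.
final class PositionalAccess {
    static final int RECORD_LEN = 22; // assumed fixed record length

    /** Start address of the record with the given index (0-based). */
    static long startAddress(int recordIndex) {
        return (long) recordIndex * RECORD_LEN;
    }

    /** Cuts the record with the given index out of the file content. */
    static String recordAt(String fileContent, int recordIndex) {
        int start = (int) startAddress(recordIndex);
        return fileContent.substring(start, start + RECORD_LEN);
    }

    /** Name field: characters 0 to 9 of a record. */
    static String nameOf(String record) {
        return record.substring(0, 10).trim();
    }

    /** Year field: characters 18 to 21 of a record. */
    static int yearOf(String record) {
        return Integer.parseInt(record.substring(18, 22));
    }

    public static void main(String[] args) {
        String content = String.format("%-10s%-8s%d", "Federer", "Roger", 1981)
                + String.format("%-10s%-8s%d", "Wawrinka", "Stan", 1985);
        String second = recordAt(content, 1);
        System.out.println(nameOf(second) + " " + yearOf(second));
    }
}
```

If the name field were ever widened, the constants 22, 10, and 18 would be wrong in every program that contains them. This coupling between programs and storage layout is what physical data dependence means.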
Information Systems
Redundancy
• The use of files often implies that data are stored several times
(in separate files)
[Figure: the applications Accounting, Warehouse Management, and Order Management
each keep their own files; Products, Addresses, and Orders appear several times]
• Consequences:
– Redundancy: the same information is stored several times at different
locations
– Update anomalies: when some information is changed, all other files
have to be searched for the same entries, which have to be updated as well
– Inconsistency: if not all copies are changed, the entire system
becomes inconsistent (i.e., it contains conflicting data)
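The chain from redundancy to inconsistency can be made concrete with a small sketch. Two maps stand in for the address files of two applications; the class RedundantFiles, its methods, and the sample values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch: two applications each keep their own copy of the customer addresses.
final class RedundantFiles {
    final Map<String, String> accountingAddresses = new HashMap<>();
    final Map<String, String> orderAddresses = new HashMap<>();

    /** A new customer has to be entered in both files (redundancy). */
    void addCustomer(String customer, String address) {
        accountingAddresses.put(customer, address);
        orderAddresses.put(customer, address);
    }

    /** An update that reaches only one of the copies (update anomaly). */
    void updateInAccountingOnly(String customer, String newAddress) {
        accountingAddresses.put(customer, newAddress);
    }

    /** True if both copies agree on the customer's address. */
    boolean isConsistent(String customer) {
        return Objects.equals(accountingAddresses.get(customer), orderAddresses.get(customer));
    }

    public static void main(String[] args) {
        RedundantFiles files = new RedundantFiles();
        files.addCustomer("Smith", "Basel");            // hypothetical sample data
        files.updateInAccountingOnly("Smith", "Zurich");
        System.out.println("consistent: " + files.isConsistent("Smith")); // prints "consistent: false"
    }
}
```

Nothing in the file system prevents the second copy from being forgotten; keeping the copies in sync is entirely the responsibility of the application programs.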
Interface Problems
[Figure labels: File 1, Products, Accounting]
• Drawbacks:
– This alternative is very confusing, especially when data in the different
files are structured differently
– In case of logical and/or physical changes of the file schema, many
programs have to be updated
Additional “Problems” when using Files
• In large information systems, several users work concurrently with the same
data
– File systems only provide limited support to synchronize these accesses
– Users that work concurrently with the same data must not introduce
inconsistencies
• File systems do not provide full protection against data loss in case of system
crashes and malfunctioning hardware
– Failures of the system and/or the application programs need to be handled
properly
• File systems only provide inflexible access control (data protection)
– If users are only allowed to see parts of the data within a file,
this can hardly be enforced
– At the same time, guaranteeing the integrity of data (i.e., that the data
comply with certain rules) is difficult as well
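One inconsistency that unsynchronized concurrent access can introduce is a lost update. The sketch below replays a fixed interleaving of two users in a single thread, so the outcome is deterministic; the class SharedCounterFile is a hypothetical stand-in for a value stored in a shared file.

```java
// Sketch: a lost update caused by two users who both read before either writes.
final class SharedCounterFile {
    private int value;

    SharedCounterFile(int initial) {
        this.value = initial;
    }

    int read() {
        return value;
    }

    void write(int newValue) {
        value = newValue;
    }

    /** Interleaved execution: both users read the old value, then both write. */
    static int unsynchronizedInterleaving(int initial, int deltaA, int deltaB) {
        SharedCounterFile file = new SharedCounterFile(initial);
        int seenByA = file.read();
        int seenByB = file.read();
        file.write(seenByA + deltaA);
        file.write(seenByB + deltaB); // overwrites the update of user A
        return file.read();
    }

    /** Serial execution: user B starts only after user A has finished. */
    static int serialExecution(int initial, int deltaA, int deltaB) {
        SharedCounterFile file = new SharedCounterFile(initial);
        file.write(file.read() + deltaA);
        file.write(file.read() + deltaB);
        return file.read();
    }

    public static void main(String[] args) {
        System.out.println(unsynchronizedInterleaving(100, 10, 20)); // prints 120
        System.out.println(serialExecution(100, 10, 20));            // prints 130
    }
}
```

In the interleaved run the update of user A disappears without any error message, which is exactly the kind of interference that needs to be prevented.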
1.3 Databases
Components
• Terminology
– Database applications (App1, App2): communicate with the DBMS
– DBMS: Database management system (software for managing data)
– DB: Database (actual data collection)
Examples of Databases/Information Systems (DB/IS)
• University IS
• Airline IS
• Geographical IS
• Personal IS
• Library IS
• Banking IS
• Social Media IS
• Healthcare IS
• Web Shop IS
• Railway IS
Tasks of a DBS
Requirements on a DBS …
… Requirements on a DBS
• Access Control:
avoid unauthorized accesses
• Transactions:
a sequence of update operations is combined into a unit whose effect is
stored permanently in the DB (if executed without failure)
• Synchronization:
if several users work concurrently with the database, the DBMS
guarantees that no unintended interference occurs
• Data Protection:
the DBS supports the recovery of data after system failures (e.g., system
crashes) or media failures (e.g., a defective hard disk); in contrast to a file
backup, this requires restoring the state after the last successful transaction
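The all-or-nothing behavior of a transaction can be illustrated with a toy example. The sketch below applies a sequence of updates to a private copy of the data and installs the copy only if no failure occurred. It is an in-memory illustration under simplifying assumptions, not how a real DBMS implements transactions or recovery; the class ToyTransaction and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Sketch: a sequence of updates either takes effect as a whole or not at all.
final class ToyTransaction {
    private Map<String, Integer> committed = new HashMap<>();

    /**
     * Runs the updates on a working copy. If they finish without failure,
     * the copy becomes the new state; otherwise the old state is kept.
     * Returns true if the updates were committed.
     */
    boolean run(Consumer<Map<String, Integer>> updates) {
        Map<String, Integer> workingCopy = new HashMap<>(committed);
        try {
            updates.accept(workingCopy);
        } catch (RuntimeException failure) {
            return false; // abort: the partial changes in the copy are discarded
        }
        committed = workingCopy;
        return true;
    }

    Integer get(String key) {
        return committed.get(key);
    }

    public static void main(String[] args) {
        ToyTransaction db = new ToyTransaction();
        db.run(m -> { m.put("A", 100); m.put("B", 0); });
        // A transfer that fails after the first of its two updates.
        boolean ok = db.run(m -> {
            m.put("A", m.get("A") - 30);
            throw new IllegalStateException("simulated crash");
        });
        System.out.println(ok + " A=" + db.get("A") + " B=" + db.get("B")); // prints "false A=100 B=0"
    }
}
```

After the failed transfer, account A still holds 100: the state after the last successful transaction is preserved, and no half-finished update is visible.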
What is stored in a Database? …
… What is stored in a Database?
A simple example
• Schema (intensional):
Person (Name (10 characters), FirstName (8 characters), Year (4 characters))
• DB state (extensional):
Person
Name        FirstName  Year
Federer     Roger      1981
Wawrinka    Stan       1985
• A database system not only stores the database state but also the database
schema.
• This makes it possible to enforce that only "correct" data (w.r.t. the schema)
is stored in the database
Schema in Databases
• The schema is …
– … explicitly modeled (in the form of a text document, graphically, …)
– … stored in the database
• The DBMS monitors (and enforces) the compliance of the database state
w.r.t. the database schema
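Because the schema is stored alongside the data, compliance can be checked mechanically. The following is a minimal sketch of such a check for the Person example, treating the declared sizes as maximum lengths; the class PersonSchema and its methods are hypothetical, and a real DBMS performs far richer checks.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: the schema is itself data, so records can be validated against it.
final class PersonSchema {
    /** Attribute name mapped to its maximum length in characters. */
    static final Map<String, Integer> ATTRIBUTES = new LinkedHashMap<>();
    static {
        ATTRIBUTES.put("Name", 10);
        ATTRIBUTES.put("FirstName", 8);
        ATTRIBUTES.put("Year", 4);
    }

    /** Convenience helper that builds a record with the three Person attributes. */
    static Map<String, String> record(String name, String firstName, String year) {
        Map<String, String> r = new HashMap<>();
        r.put("Name", name);
        r.put("FirstName", firstName);
        r.put("Year", year);
        return r;
    }

    /** True if the record has exactly the schema's attributes and no value is too long. */
    static boolean conforms(Map<String, String> record) {
        if (!record.keySet().equals(ATTRIBUTES.keySet())) {
            return false;
        }
        for (Map.Entry<String, Integer> attribute : ATTRIBUTES.entrySet()) {
            String value = record.get(attribute.getKey());
            if (value == null || value.length() > attribute.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(conforms(record("Federer", "Roger", "1981")));           // prints true
        System.out.println(conforms(record("NameThatIsTooLong", "Roger", "1981"))); // prints false
    }
}
```

With plain files, a check like this would have to be repeated in every application; in a database system it is performed once, centrally, by the DBMS.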
Use of a DBS
[Figure: three ways of using a DBS]
• Application program (software developer):
    public class EmpMgmt {
      public static void main () {
        ...
        insert(Employee, "Smith");
        ...
      }}
• Interactive query:
    select * from Employee ;
    EmpNo  Name    FirstName
    1      Smith   Eric
    2      Miller  Judy
    3      Baker   Charles
• Administration tool (DBAdmin): dialog "Which Backup?" with the options
  Full / Partial and Daily / Weekly and the buttons OK / Cancel
Abstract Architecture of a DBS
[Figure: groups of applications A1 to A5]
Conceptual Layer
• … and contains
– the description of all object types and their relationships
– but no details on how these are physically stored
• The schema is expressed in the data model of the database system and is
specified using the DBS's data definition language (DDL)
Internal Layer
External Layer
• The external layer contains the collection of the individual views of all user
and application groups in (potentially several) external schemas
• Users are not supposed to see data that they do not want to see (clarity)
or are not allowed to see (privacy, data protection)
– Example: in a hospital, the nursing staff needs different data on a patient
than the accounting department
• This makes it possible to decouple the database from changes and extensions
in the interfaces to users/applications (logical data independence)
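The idea of external schemas can be sketched as projections of one stored record onto the attributes each group is meant to see. The example follows the hospital scenario above; the attribute names, the sample values, and the class ExternalViews are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: each user group gets its own view on the same stored record.
final class ExternalViews {
    // Hypothetical external schemas: the attributes visible to each group.
    static final List<String> NURSING = Arrays.asList("Name", "Diagnosis", "Medication");
    static final List<String> ACCOUNTING = Arrays.asList("Name", "InsuranceNo", "InvoiceTotal");

    /** Returns only the attributes of the record that the view is allowed to show. */
    static Map<String, String> project(Map<String, String> record, List<String> visibleAttributes) {
        Map<String, String> view = new LinkedHashMap<>();
        for (String attribute : visibleAttributes) {
            if (record.containsKey(attribute)) {
                view.put(attribute, record.get(attribute));
            }
        }
        return view;
    }

    /** Builds a hypothetical patient record as it might exist on the conceptual layer. */
    static Map<String, String> samplePatient() {
        Map<String, String> patient = new LinkedHashMap<>();
        patient.put("Name", "Miller");
        patient.put("Diagnosis", "Fracture");
        patient.put("Medication", "Ibuprofen");
        patient.put("InsuranceNo", "12345");
        patient.put("InvoiceTotal", "800");
        return patient;
    }

    public static void main(String[] args) {
        Map<String, String> patient = samplePatient();
        System.out.println("Nursing:    " + project(patient, NURSING).keySet());
        System.out.println("Accounting: " + project(patient, ACCOUNTING).keySet());
    }
}
```

If a new attribute is added to the stored record, neither view changes, so applications written against a view keep working. This is the decoupling that logical data independence refers to.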
Physical and Logical Data Independence
Recap: Questions
• What are the reasons for using databases (and not relying on main memory
and/or the file system) for storing data?
• What is the difference between the intensional and the extensional layer of
a database?
• What is the difference between a Data Definition Language (DDL) and
a Data Manipulation Language (DML)?
• Why are logical and physical data independence needed?