
01 Cs245 HS23 Intro To Databases Prologue Lecture

The document introduces why databases are important for managing large amounts of data. As data volumes continue to grow exponentially, efficiently structuring, storing, and accessing data is critical. While main memory and file-based approaches were initially used, they have limitations. Databases provide benefits like logical and physical data independence, allowing data schemas and storage structures to change without affecting applications. They also enable optimized, intuitive access to large data volumes with correctness even during parallel access.


Chapter 1 – Prologue: Why Databases?

cs245 Introduction to Databases (Fall 2023)


H. Schuldt
Objectives

This chapter will introduce why databases are key elements in the software
stack of information systems – and why using databases has significant
benefits over other, simpler approaches to data management.

You will …
• get to know the general architecture of database systems
• understand how logical and physical data independence are provided
and why they are essential features of database systems
• be able to distinguish the intensional from the extensional layer
in a database system, i.e., schema vs. data

Fall 2023 Introduction to Databases (cs245) – Prologue: Why Databases? – Heiko Schuldt 01–2
Why do we need Databases?

• By 2025, there will be more than 180 zettabytes (180·10^21 bytes) of data
in the digital universe [source:
https://www.statista.com/statistics/871513/worldwide-data-created/]
• Through the proliferation of (software and hardware) sensors, mobile devices,
cameras, etc., which continuously generate data, digital information is growing
very rapidly
• This increasingly also concerns data available on the Internet, especially
in social networks
• An important task is therefore to cope with this flood of information.
This includes:
– Structuring, storing, and managing data
– Optimized (faster) access to these data with powerful (and intuitive,
i.e., declarative) interfaces
– Correctness when accessing (reading and updating) data in parallel

Prologue – Why Databases?

1.1 Data Management in Main Memory?


1.2 Data Management in Files?
1.3 Databases!

1.1 Data Management in Main Memory –
an Alternative to Databases?

• When you create a program, usually only main memory is used for data
management, i.e., objects are created in memory (and are automatically
deleted when the program execution ends)

• However, this is not sufficient:
– In many applications, data must be stored permanently (persistently),
i.e., data must ‘survive’ system crashes and outlive the execution of
a program
– Main memory is often not large enough to hold all the data to be managed
(e.g., all customer data of a bank, patient records in a hospital, gene
sequence data of a life science laboratory, etc.)
– (An exception are novel approaches to so-called “in-memory databases”;
but also in this case, the persistence problem needs to be solved)

Persistent Storage

• The external storage (hard disks) of a computer system offers such a
possibility of permanent storage

Main memory | External storage
– Purely electronic (transistors and capacitors) | – Storage on (rotating) magnetic disks
– Volatile | – Non-volatile
– Fast: 8 ns/access, i.e., 8·10^−9 s per access | – Slow: 5-10 ms/access, i.e., about 8·10^−3 s per access
– Random access | – Block-wise access
– Expensive: < 120 CHF for 16 GB | – Significantly cheaper: < 100 CHF for 4 TB
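To make the gap between the two columns concrete, the sketch below turns the quoted figures into two ratios. It is a minimal illustration, assuming 8 ms as a representative disk access time (within the quoted 5-10 ms range), the prices exactly as quoted, and 1 TB counted as 1000 GB (decimal units); the class and method names are hypothetical.

```java
public class StorageComparison {
    // Representative values from the comparison above (disk: 8 ms, within the quoted 5-10 ms range)
    static final double MEMORY_ACCESS_S = 8e-9;
    static final double DISK_ACCESS_S = 8e-3;

    // Assumption: prices as quoted above; 1 TB counted as 1000 GB (decimal units)
    static final double MEMORY_CHF = 120.0, MEMORY_GB = 16.0;
    static final double DISK_CHF = 100.0, DISK_GB = 4000.0;

    /** How many times longer one disk access takes than one main-memory access. */
    static double accessTimeRatio() {
        return DISK_ACCESS_S / MEMORY_ACCESS_S;
    }

    /** Price per gigabyte in CHF. */
    static double chfPerGb(double chf, double gb) {
        return chf / gb;
    }

    public static void main(String[] args) {
        System.out.printf("Disk access takes about %.0f times longer%n", accessTimeRatio());
        System.out.printf("Main memory: %.3f CHF/GB, disk: %.3f CHF/GB%n",
                chfPerGb(MEMORY_CHF, MEMORY_GB), chfPerGb(DISK_CHF, DISK_GB));
    }
}
```

Under these assumptions, a disk access takes about 10^6 times longer than a main-memory access, while a gigabyte on disk costs 0.025 CHF instead of 7.5 CHF, i.e., roughly 300 times less.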

Structure of a Hard Disk

• Several magnetic disks (platters) rotate (e.g., with 7'200 rotations per minute)
around a common spindle
• An actuator arm with two magnetic read/write heads per disk (above/below)
moves in radial direction
[Figure: hard disk with spindle, platters, motor, actuator arm, and read/write heads]
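The rotation speed directly bounds how long the drive waits for a sector to pass under the head. The sketch below assumes the common textbook approximation that, on average, the requested sector is half a rotation away; it covers only the rotational delay, not the time to move the actuator arm. Names are hypothetical.

```java
public class RotationalLatency {
    /** Time for one full rotation of the platters, in milliseconds. */
    static double rotationTimeMs(int rpm) {
        return 60_000.0 / rpm; // 60'000 ms per minute divided by rotations per minute
    }

    /** Assumption: on average, the start of the requested sector is half a rotation away. */
    static double avgRotationalLatencyMs(int rpm) {
        return rotationTimeMs(rpm) / 2.0;
    }

    public static void main(String[] args) {
        int rpm = 7_200; // rotation speed mentioned above
        System.out.printf("One rotation: %.2f ms%n", rotationTimeMs(rpm));
        System.out.printf("Average rotational latency: %.2f ms%n", avgRotationalLatencyMs(rpm));
    }
}
```

At 7'200 rotations per minute, one rotation takes about 8.33 ms, so the rotational delay alone averages about 4.17 ms, which is consistent with the 5-10 ms per access quoted earlier once arm movement is added.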
Structure of the Disk Surfaces

• There are several cylinders per disk surface
(circular arrangement of bits)
• Per cylinder, there are several sectors
[Figure: disk surface divided into cylinders and sectors]
• (Internal) address of a piece of information:
[disk no. | surface no. | cylinder no. | sector no. | byte no.]
• Calculation of capacity (two surfaces per disk):
#disks · 2 · #cylinders · #sectors · bytes-per-sector
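The capacity formula and the five-part internal address can be illustrated with a hypothetical drive geometry (4 disks, 100'000 cylinders, 1'000 sectors per cylinder, 512 bytes per sector; none of these numbers come from the lecture). The mapping of an address to a single byte number is one possible ordering, chosen only for illustration; real drives may order blocks differently.

```java
public class DiskGeometry {
    static final int SURFACES_PER_DISK = 2; // top and bottom surface of each disk

    final int disks, cylinders, sectors, bytesPerSector;

    DiskGeometry(int disks, int cylinders, int sectors, int bytesPerSector) {
        this.disks = disks;
        this.cylinders = cylinders;
        this.sectors = sectors;
        this.bytesPerSector = bytesPerSector;
    }

    /** #disks · 2 · #cylinders · #sectors · bytes-per-sector (computed in long to avoid overflow). */
    long capacity() {
        return (long) disks * SURFACES_PER_DISK * cylinders * sectors * bytesPerSector;
    }

    /**
     * Maps [disk | surface | cylinder | sector | byte] to one byte number by reading the
     * address as a mixed-radix number (a hypothetical ordering, for illustration only).
     */
    long linearAddress(int disk, int surface, int cylinder, int sector, int b) {
        long a = disk;
        a = a * SURFACES_PER_DISK + surface;
        a = a * cylinders + cylinder;
        a = a * sectors + sector;
        return a * bytesPerSector + b;
    }

    public static void main(String[] args) {
        DiskGeometry g = new DiskGeometry(4, 100_000, 1_000, 512);
        System.out.println(g.capacity());                            // 409600000000 (about 410 GB)
        System.out.println(g.linearAddress(3, 1, 99_999, 999, 511)); // 409599999999, the last byte
    }
}
```

The highest possible address maps to capacity − 1, which confirms that the formula counts every addressable byte exactly once.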
Reading/Writing a Sector

• Which activities are needed to read data from the hard disk or to write data
to the hard disk?
1. Position the actuator with the read/write heads on top of the cylinder
2. Wait until the disk rotation has brought the start of the sector to be
accessed underneath the read/write head
3. Transmit the information from the hard disk to main memory
(or vice versa)

• But:
– It does not make sense (and it is also technically infeasible) to read or
write single bytes. Instead, at least a whole sector must be read / written.
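Because only whole sectors can be transferred, a request for a few bytes is widened to the enclosing sector boundaries. The sketch below assumes 512-byte sectors and numbers bytes linearly from 0; the class and method names are hypothetical.

```java
public class SectorAlignedRead {
    static final int SECTOR_SIZE = 512; // assumption: 512-byte sectors

    /** Index of the sector that contains the given byte offset. */
    static long sectorOf(long byteOffset) {
        return byteOffset / SECTOR_SIZE;
    }

    /**
     * Byte range [start, end) that actually has to be transferred in order to
     * read 'length' bytes (length >= 1) starting at 'offset'.
     */
    static long[] sectorRange(long offset, long length) {
        long first = sectorOf(offset);
        long last = sectorOf(offset + length - 1);
        return new long[] { first * SECTOR_SIZE, (last + 1) * SECTOR_SIZE };
    }

    public static void main(String[] args) {
        long[] r = sectorRange(1000, 1);   // a single byte
        System.out.println("Read bytes " + r[0] + " to " + (r[1] - 1)); // Read bytes 512 to 1023
        r = sectorRange(1000, 100);        // 100 bytes crossing a sector boundary
        System.out.println("Read bytes " + r[0] + " to " + (r[1] - 1)); // Read bytes 512 to 1535
    }
}
```

Reading one byte at offset 1000 still costs a full 512-byte sector, and a small request that crosses a boundary costs two sectors.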

1.2 Data Management in Files –
an Alternative to Databases?

• Addressing data should be transparent to users: they should not have to care
about technical details like disk no., surface no., etc.
• Working with files significantly facilitates storing / reading data:
– File names simplify addressing data
– Directory hierarchies support the grouping of files
– Memory cells of a file are byte-wise enumerated, starting with address 0
– Input from / output to files is buffered, so that a programmer does
not need to read / write complete sectors

Example: File Access in Java …

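import java.io.IOException;
import java.io.RandomAccessFile;

public class FileAccess {
    public static void main(String[] args) {
        // open file "test.file" for reading and writing; f1 is the file handle,
        // try-with-resources closes it automatically (replaces f1.close())
        try (RandomAccessFile f1 = new RandomAccessFile("test.file", "rw")) {
            int c = f1.read();               // read a single byte (-1 at end of file)
            long new_position = 0;           // placeholder: the new position is left open ("....")
            f1.seek(new_position);           // at new position …
            if (c != -1) {
                f1.write(c);                 // … write a byte
            }
        } catch (IOException e) {            // failure handling
            System.out.println("Error: " + e);
        }
    }
}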

… Example: Data Access in Java

• In Java, there are several classes and methods for input/output to/from files

• When objects are written to a file by an application, the file format needs
to be specified by the programmer:

Name (10 characters)  FirstName (8 chars.)  Year (4 chars.)
|F|e|d|e|r|e|r| | | |R|o|g|e|r| | | |1|9|8|1|
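A minimal sketch of such a programmer-defined format, assuming the field widths shown above (Name 10, FirstName 8, Year 4 characters) and blank padding on the right; the class and method names are hypothetical. The point is that the format exists only in this code: any program that wants to read the file must repeat exactly these widths.

```java
public class FixedWidthRecord {
    // File format chosen by the programmer: Name 10, FirstName 8, Year 4 characters
    static final int NAME_LEN = 10, FIRST_LEN = 8, YEAR_LEN = 4;
    static final int RECORD_LEN = NAME_LEN + FIRST_LEN + YEAR_LEN; // 22

    /** Encodes one record; shorter values are padded with blanks on the right. */
    static String encode(String name, String firstName, int year) {
        if (name.length() > NAME_LEN || firstName.length() > FIRST_LEN || year < 0 || year > 9999) {
            throw new IllegalArgumentException("value does not fit the file format");
        }
        return String.format("%-10s%-8s%04d", name, firstName, year);
    }

    /** Decodes the record that starts at position 'start' of 'data'. */
    static String[] decode(String data, int start) {
        String name = data.substring(start, start + NAME_LEN).trim();
        String first = data.substring(start + NAME_LEN, start + NAME_LEN + FIRST_LEN).trim();
        String year = data.substring(start + NAME_LEN + FIRST_LEN, start + RECORD_LEN);
        return new String[] { name, first, year };
    }

    public static void main(String[] args) {
        String rec = encode("Federer", "Roger", 1981);
        System.out.println("[" + rec + "]");                     // [Federer   Roger   1981]
        System.out.println(String.join(", ", decode(rec, 0)));   // Federer, Roger, 1981
    }
}
```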

File Format

• In files, it is not mandatory to explicitly model the data format


– The schema is implicitly contained in the procedures to input/output data
– This format is part of the documentation of a program …
– … or it needs to be extracted out of the program code (reverse engineering).

• Failures in the input/output procedures can make the entire data set
unusable (here, all following records are shifted by one character):

|F|e|d|e|r|e|r| | | |R|o|g|e|r| | | |1|9|8|1|W|
|a|w|r|i|n|k|a| | |S|t|a|n| | | | |1|9|8|5|.|.|
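The effect of such a failure can be reproduced with a small sketch, assuming the 22-character format from the previous example and a reader that starts the second record one character too late (offset 23 instead of 22). The data is still there, but every field is decoded from the wrong positions. Names are hypothetical.

```java
public class MisalignedRead {
    static final int NAME_LEN = 10, FIRST_LEN = 8, YEAR_LEN = 4;
    static final int RECORD_LEN = NAME_LEN + FIRST_LEN + YEAR_LEN; // 22

    /** Two records in the fixed-width format (Name 10, FirstName 8, Year 4). */
    static String sampleData() {
        return String.format("%-10s%-8s%s", "Federer", "Roger", "1981")
             + String.format("%-10s%-8s%s", "Wawrinka", "Stan", "1985");
    }

    /** Reads the Name field of the record assumed to start at 'start'. */
    static String nameAt(String data, int start) {
        return data.substring(start, start + NAME_LEN).trim();
    }

    /** Reads the FirstName field of the record assumed to start at 'start'. */
    static String firstNameAt(String data, int start) {
        return data.substring(start + NAME_LEN, start + NAME_LEN + FIRST_LEN).trim();
    }

    public static void main(String[] args) {
        String data = sampleData();
        int correct = RECORD_LEN;     // second record really starts at 22
        int shifted = correct + 1;    // off by one character
        System.out.println(nameAt(data, correct) + " / " + firstNameAt(data, correct));
        System.out.println(nameAt(data, shifted) + " / " + firstNameAt(data, shifted));
    }
}
```

With the correct offset, the second record reads as "Wawrinka / Stan"; with the shifted offset, the name becomes "awrinka  S" and the first name "tan    1", and nothing in the file itself reveals which reading is right.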

Consequences

• In many applications, a revision of the file format is occasionally necessary
(e.g., additional attributes of an object in a new version of the program):
– In this case, old files cannot be used anymore or have to be converted by
dedicated programs (to make sure they are compliant with the new format)
– These changes then have to be propagated to all other programs that use
these data, even though these programs might not be logically affected by
the change in file format.

➢ This leads to a logical data dependence

Physical Data Dependence

• Usually, data records are addressed and separated from each other on
the basis of their position:
– For example, if it is known that each record has 22 characters,
then …
– … the first record starts at address 0
– … the second record starts at address 22, etc.

• Searching for a particular attribute value (e.g., the name of a customer)
has to be coded in the program.
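A minimal sketch of such a hand-coded search, assuming the 22-character records from the earlier example with the name in the first 10 characters. Both facts are hard-wired into the program: if the record layout changes, this search has to be rewritten. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionalSearch {
    static final int RECORD_LEN = 22; // each record has exactly 22 characters
    static final int NAME_LEN = 10;   // the Name field occupies the first 10 characters

    /** Two records in the fixed-width format (Name 10, FirstName 8, Year 4). */
    static String sampleData() {
        return String.format("%-10s%-8s%s", "Federer", "Roger", "1981")
             + String.format("%-10s%-8s%s", "Wawrinka", "Stan", "1985");
    }

    /** Returns the start addresses of all records whose Name field equals 'name'. */
    static List<Integer> findByName(String data, String name) {
        List<Integer> hits = new ArrayList<>();
        // Record i starts at address i * RECORD_LEN: 0, 22, 44, ...
        for (int start = 0; start + RECORD_LEN <= data.length(); start += RECORD_LEN) {
            String field = data.substring(start, start + NAME_LEN).trim();
            if (field.equals(name)) {
                hits.add(start);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findByName(sampleData(), "Wawrinka")); // [22]
    }
}
```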

Information Systems

• Information systems are usually large software systems that consist of
several different programs
• These programs work partly with shared data and partly with different data
• Examples for such programs:
– Accounting: information on products and addresses
– Warehouse management: products and orders
– Order management: orders, products, addresses
– CAD system: products, technical data, modules
– Production, order entry, calculation: ...

– Patient management: addresses, treatments, etc.
– Warehouse management: products and requests
– Resource management: rooms, devices, staff
– Accounting: products, addresses
– …

Redundancy

• The use of files often implies that data are stored several times
(in separate files)
[Figure: each application keeps its own files: Accounting (Products, Addresses),
Warehouse Management (Products, Orders), Order Management (Orders, Products, Addresses)]
• Consequences:
– Redundancy: the same information is available several times at different
locations
– Update anomalies: when some information is changed, several other files
have to be searched for the same entries (which have to be updated as well)
– Inconsistency: if not all copies are changed, the entire system
becomes inconsistent (i.e., it contains conflicting data)
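The chain from redundancy to inconsistency can be shown in a few lines. The sketch below models two applications that each keep their own copy of customer addresses (maps stand in for files); the customer ID "C42" and the cities are hypothetical sample data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class UpdateAnomaly {
    /** Returns true if both copies of the customer's address still agree after the scenario. */
    static boolean runScenario() {
        // Hypothetical: two applications keep their own "file" (here: map) of customer addresses
        Map<String, String> accountingAddresses = new HashMap<>();
        Map<String, String> orderMgmtAddresses = new HashMap<>();
        accountingAddresses.put("C42", "Basel");
        orderMgmtAddresses.put("C42", "Basel"); // redundancy: the same information is stored twice

        // The customer moves, but only the accounting copy is updated (update anomaly)
        accountingAddresses.put("C42", "Zurich");

        return Objects.equals(accountingAddresses.get("C42"), orderMgmtAddresses.get("C42"));
    }

    public static void main(String[] args) {
        System.out.println("Copies consistent? " + runScenario()); // Copies consistent? false
    }
}
```

Nothing prevents the second copy from being forgotten; keeping the copies in sync is left entirely to each application.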
Interface Problems

• Alternative implementation: access to several files from an application program
[Figure: Accounting, Warehouse Management, and Order Management access the shared
files File 1 (Products), File 2 (Addresses), and File 3 (Orders)]
• Drawbacks:
– This alternative is very confusing, especially when data in the different
files are structured differently
– In case of logical and/or physical changes of the file schema, many
programs have to be updated
Additional “Problems” when using Files

• In large information systems, several users work concurrently with the same
data
– File systems provide only limited support to synchronize these accesses
– Users that work concurrently with the same data must not introduce
inconsistencies
• File systems do not provide full protection against data loss in case of system
crashes and malfunctioning hardware
– Failures of the system and/or the application programs need to be handled
properly
• File systems provide only inflexible access control (data protection)
– If users are only allowed to see parts of the data within a file,
this can hardly be enforced
– At the same time, guaranteeing the integrity of data (i.e., whether the data
comply with certain rules) is difficult as well
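A classic example of an inconsistency caused by unsynchronized concurrent access is the lost update. The sketch below does not use real threads; it replays, deterministically, one problematic interleaving of two users who each add an amount to a shared balance (the method names and values are hypothetical).

```java
public class LostUpdate {
    /** Two users each add 'amount' to 'balance', interleaved as read A, read B, write A, write B. */
    static int interleaved(int balance, int amount) {
        int readByA = balance;        // user A reads the balance
        int readByB = balance;        // user B reads the same (old) value
        balance = readByA + amount;   // A writes its result
        balance = readByB + amount;   // B overwrites A's result: A's update is lost
        return balance;
    }

    /** The same two updates executed one after the other (serially). */
    static int serial(int balance, int amount) {
        balance = balance + amount;   // user A
        balance = balance + amount;   // user B
        return balance;
    }

    public static void main(String[] args) {
        System.out.println("interleaved: " + interleaved(100, 50)); // interleaved: 150
        System.out.println("serial:      " + serial(100, 50));      // serial:      200
    }
}
```

Both users see their own write succeed, yet one of the two deposits has vanished. Preventing such interleavings is the synchronization task that a DBMS takes over.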

1.3 Databases

• All these problems can be addressed with a consistent approach using
databases
[Figure: with a file system, Application1 and Application2 access File1, File2,
and File3 directly; with a database system, Application1 and Application2 both
access a single Database System]

Components

• Terminology
– Database applications (e.g., App1, App2): communicate with the DBMS
– DBS: database system (DB + DBMS)
– DBMS: database management system (software for managing data)
– DB: database (actual data collection)

Examples of Databases/Information Systems (DB/IS)
University IS, Airline IS, Geographical IS, Personal IS, Library IS, Banking IS,
Social Media IS, Healthcare IS, Web Shop IS, Railway IS
Tasks of a DBS

• The primary task of a DBS is ...
– … to structure and describe
– … to store and maintain
– … and to retrieve from (i.e., to answer queries on)
considerably large data sets that are used (persistently) by
different application programs

Requirements on a DBS …

List of “9 commandments” (according to Edgar F. Codd, 1982)


• Integration:
uniform management of all data needed by applications.
Storage of the entire data collection free of redundancy.
• Operations:
the DBS provides operations for storing, retrieving, and
manipulating data
• Data Dictionary:
a catalog supports access to the description of data
• Individual User Views:
for different applications, different views to the entire data collection
can be provided
• Monitoring Consistency:
the DBMS monitors and enforces the correctness of data in case of changes

… Requirements on a DBS

• Access Control:
avoid unauthorized accesses
• Transactions:
a sequence of update operations is combined into a unit whose effect is
stored permanently in the DB (if executed without failure)
• Synchronization:
in case several users work concurrently with the database, then the DBMS
guarantees that no unintentional interferences occur
• Data Protection:
the DBS supports the recovery of data after system failures (i.e., system
crashes) or media failures (e.g., a defective hard disk) – in contrast to a file
backup, the state after the last successful transaction has to be restored
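The all-or-nothing idea behind transactions can be sketched in memory. The toy class below (hypothetical, not how a real DBMS works internally) applies a sequence of updates to a private working copy and makes the copy the new state only if every update succeeds; persistence to disk and recovery after a crash are deliberately not modeled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class MiniTransaction {
    private Map<String, Integer> state = new HashMap<>();

    /**
     * Applies all updates to a private working copy. The copy replaces the current
     * state only if every update succeeds (commit); otherwise it is discarded (abort).
     */
    boolean execute(List<Consumer<Map<String, Integer>>> updates) {
        Map<String, Integer> workingCopy = new HashMap<>(state);
        try {
            for (Consumer<Map<String, Integer>> update : updates) {
                update.accept(workingCopy);
            }
        } catch (RuntimeException e) {
            return false;       // abort: the old state remains unchanged
        }
        state = workingCopy;    // commit: all effects become visible together
        return true;
    }

    Integer get(String key) {
        return state.get(key);
    }

    public static void main(String[] args) {
        MiniTransaction db = new MiniTransaction();
        db.execute(List.of(m -> m.put("A", 100), m -> m.put("B", 0)));

        // Transfer 30 from A to B; the second step fails (simulated failure)
        boolean committed = db.execute(List.of(
                m -> m.merge("A", -30, Integer::sum),
                m -> { throw new IllegalStateException("simulated failure"); }));
        System.out.println(committed + " " + db.get("A") + " " + db.get("B")); // false 100 0
    }
}
```

The failed transfer leaves no trace: A still holds 100, although its first step had already been applied to the working copy.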

What is stored in a Database? …

There is a distinction of two layers:


• Intensional layer: the database schema
– Describes the possible content of the DB
– Structure of the data, data types (meta data)
– The type of description is defined by the data model
– Changes are possible in principle, but only occur infrequently
(called schema evolution)
• Extensional layer: the extent of the database
– The actual content (database state)
– information on objects, attribute values, etc.
– Structure is defined by the database schema
– Changes occur very frequently
(e.g., flight booking system → 10'000 transactions/min)

… What is stored in a Database?

A simple example
• Schema (intensional):
Person
Name (10 characters)  FirstName (8 chars.)  Year (4 chars.)

• DB state (extensional):
Person
|F|e|d|e|r|e|r| | | |R|o|g|e|r| | | |1|9|8|1|
|W|a|w|r|i|n|k|a| | |S|t|a|n| | | | |1|9|8|5|
• A database system stores not only the database state but also the database
schema.
• This allows the DBS to enforce that only “correct” data (w.r.t. the schema)
is stored in the database
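Because the schema is stored alongside the data, the system can check every new tuple against it. The sketch below is a hypothetical toy, not a DBMS: the schema "Person" is kept as explicit metadata (attribute names and maximum lengths taken from the example above), and an insert is accepted only if the tuple complies with it.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaCheck {
    // Intensional layer (schema "Person"): attribute names and maximum lengths
    static final String[] ATTRIBUTES = {"Name", "FirstName", "Year"};
    static final int[] MAX_LENGTH = {10, 8, 4};

    // Extensional layer (database state): the stored tuples
    static final List<String[]> person = new ArrayList<>();

    /** Stores the tuple only if it complies with the schema; returns whether it was stored. */
    static boolean insert(String... values) {
        if (values.length != ATTRIBUTES.length) {
            return false;                 // wrong number of attributes
        }
        for (int i = 0; i < values.length; i++) {
            if (values[i].length() > MAX_LENGTH[i]) {
                return false;             // value does not fit attribute ATTRIBUTES[i]
            }
        }
        person.add(values.clone());
        return true;
    }

    public static void main(String[] args) {
        System.out.println(insert("Federer", "Roger", "1981"));         // true
        System.out.println(insert("Wawrinka", "Stan", "1985"));         // true
        System.out.println(insert("Federer", "Roger"));                 // false: Year missing
        System.out.println(insert("Schwarzenegger", "Arnold", "1947")); // false: Name too long
        System.out.println(person.size());                              // 2
    }
}
```

In contrast to the file-based version, an invalid tuple is rejected before it can corrupt the stored data.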

Schema in Databases

• The schema is …
– … explicitly modeled (in the form of a text document, graphically, …)
– … stored in the database

• Users can query information on the schema from the database
– using the so-called Data Dictionary (meta data)

• The DBMS monitors (and enforces) the compliance of the database state
with the database schema

• Changing the schema is supported by the DBMS (schema evolution, migration)

Products

• Relational database systems (see Chapters 3 and 4):


– Oracle
– IBM DB2
– Microsoft SQL Server
– MySQL
– PostgreSQL ...
• Non-relational database systems:
– UDS: network database systems (Siemens)
– IMS: hierarchical database systems (IBM)
• NoSQL database systems (some will be briefly introduced in Chapter 6)
– BigTable: column store (Google)
– MongoDB: document store
– Redis: key/value store
– Neo4j: graph database
– …
Relational and Non-Relational Databases

Use of a DBS

[Figure: using the DBS]
– Application program (written by a software developer), e.g.:

public class EmpMgmt {
    public static void main(String[] args) {
        ...
        insert(Employee, "Smith");
        ...
    }
}

– Interactive front-end for ad-hoc queries (window “DBAdmin”):

select * from Employee ;

EmpNo | Name   | FirstName
1     | Smith  | Eric
2     | Miller | Judy
3     | Baker  | Charles

– Use of pre-defined applications (e.g., a dialog “Which Backup?” with the
options “Full Daily” and “Partial Weekly” and the buttons OK / Cancel)

• From a technical perspective, the interactive front-end is also an application
program that uses the DBS
Database Languages

• Data Definition Language (DDL)


– Declarations to describe the schema of a database
– In the case of relational databases:
create and delete tables, specify integrity constraints, etc.

• Data Manipulation Language (DML)


– Commands to work with the content (data) of a database
– Can be further subdivided into commands
• to access the database state, i.e., read from the DB (query language)
• to manipulate the database state (insert, update, delete)

Abstract Architecture of a DBS

• The abstract architecture of a DBS (according to ANSI/SPARC) considers
three layers to guarantee
– physical data independence and
– logical data independence

A1 A2 A3 A4 A5 Groups of applications

Ext. Schema1 Ext. Schema2 Ext. Schema3 External Layer

Logical Data Independence

Logical Schema Conceptual (Logical) Layer

Physical Data Independence

Internal Schema Internal (Physical) Layer

Conceptual Layer

• The conceptual (logical) layer provides a holistic view of all data
from a logical perspective
• This holistic view is …
– independent of the different applications
– materialized in the conceptual (logical) schema
– result of the (conceptual) database design (more details: see Chapter 2)

• … and contains
– the description of all object types and their relationships
– but no details on how these are physically stored

• The schema is expressed in the data model of the database system and is
specified using the DBS's data definition language (DDL)

Internal Layer

• The internal schema at the physical layer describes the system-specific
implementation of the database objects (physical storage), for instance
– Structure of the data records
– Index structures such as search trees
• The internal schema is essential for the run-time characteristics of
the entire DBS (this will be addressed in the Database Systems course)
• Applications are not affected by changes of the internal schema
(physical data independence)
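Physical data independence can be mimicked in plain Java with an interface: the application is written against a logical interface, and the storage structure behind it can be swapped. In this hypothetical sketch, a hash map stands in for an index structure (the lecture mentions search trees; the hash map is a substitute chosen for brevity). The application method is identical for both internal schemas and returns the same result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhysicalIndependence {
    /** Logical interface that applications use (hypothetical). */
    interface PersonStore {
        void add(String name, int year);
        Integer yearOf(String name); // null if unknown
    }

    /** Internal schema 1: records in a list, searched sequentially. */
    static class ScanStore implements PersonStore {
        private final List<String> names = new ArrayList<>();
        private final List<Integer> years = new ArrayList<>();
        public void add(String name, int year) { names.add(name); years.add(year); }
        public Integer yearOf(String name) {
            for (int i = 0; i < names.size(); i++) {
                if (names.get(i).equals(name)) return years.get(i);
            }
            return null;
        }
    }

    /** Internal schema 2: a hash map keyed by name (standing in for an index). */
    static class IndexedStore implements PersonStore {
        private final Map<String, Integer> index = new HashMap<>();
        public void add(String name, int year) { index.putIfAbsent(name, year); } // keep first match, like the scan
        public Integer yearOf(String name) { return index.get(name); }
    }

    /** The application: written once against PersonStore, unaware of the storage structure. */
    static Integer application(PersonStore store) {
        store.add("Federer", 1981);
        store.add("Wawrinka", 1985);
        return store.yearOf("Wawrinka");
    }

    public static void main(String[] args) {
        System.out.println(application(new ScanStore()));    // 1985
        System.out.println(application(new IndexedStore())); // 1985: same result, different internal schema
    }
}
```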

External Layer

• The external layer contains the collection of the individual views of all user
and application groups in (potentially several) external schemas
• A user is not supposed to see data that (s)he does not want to see (clarity)
or is not allowed to see (privacy, data protection)
– Example: in a hospital, the nursing staff needs different data on a patient
than the accounting department
• This allows the database to be decoupled from changes and extensions in the
interfaces to users/applications (logical data independence)

Physical and Logical Data Independence

• Physical Data Independence


– A modification of the physical storage architecture (e.g., adding dedicated
data structures like search trees to better support queries) does not have
any effects on the logical layer (it is left invariant), hence does not influence
the database schema

• Logical Data Independence


– Changes of the logical schema (e.g., by adding a new attribute to an object)
can be hidden from the application by introducing an appropriate view
– However, this independence might not be guaranteed for all types of
modifications of the conceptual schema
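The role of a view can be sketched in plain Java. In this hypothetical example, the logical schema of Person has evolved by adding an attribute email; a "view" function derives the old shape (Name, FirstName, Year), so the existing application code keeps working unchanged. The classes, the method names, and the example e-mail address are illustrative assumptions only.

```java
import java.util.ArrayList;
import java.util.List;

public class LogicalIndependence {
    /** Logical schema after schema evolution: attribute 'email' has been added. */
    static final class Person {
        final String name, firstName, email;
        final int year;

        Person(String name, String firstName, int year, String email) {
            this.name = name;
            this.firstName = firstName;
            this.year = year;
            this.email = email;
        }
    }

    /** External schema of an existing application: (Name, FirstName, Year) only. */
    static final class PersonView {
        final String name, firstName;
        final int year;

        PersonView(String name, String firstName, int year) {
            this.name = name;
            this.firstName = firstName;
            this.year = year;
        }
    }

    /** The view: derives the old shape from the evolved logical schema. */
    static List<PersonView> personView(List<Person> table) {
        List<PersonView> result = new ArrayList<>();
        for (Person p : table) {
            result.add(new PersonView(p.name, p.firstName, p.year)); // email is hidden
        }
        return result;
    }

    /** Existing application code: unchanged, since it only knows PersonView. */
    static String describe(PersonView v) {
        return v.firstName + " " + v.name + " (" + v.year + ")";
    }

    public static void main(String[] args) {
        List<Person> table = List.of(new Person("Federer", "Roger", 1981, "roger@example.org"));
        System.out.println(describe(personView(table).get(0))); // Roger Federer (1981)
    }
}
```

As the slide notes, this only works when the old shape can still be derived; removing an attribute that the view needs could not be hidden this way.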

Recap: Questions

• What are the reasons for using databases (and not relying on main memory
and/or the file system) for storing data?
• What is the difference between the intensional and the extensional layer of
a database?
• What is the difference between a Data Definition Language (DDL) and
a Data Manipulation Language (DML)?
• Why are logical and physical data independence needed?
