Lesson 1 - Database System Overview
Outline
1. Database-System Applications
2. Purpose of Database Systems
3. View of Data
4. Database Languages
5. Database Design
6. Database Engine
7. Database Architecture
8. Database Users and Administrators
Database-System Applications
• Enterprise Information
o Sales: Managing customer information, products, and purchases.
o Accounting: Handling payments, receipts, and assets.
o Human Resources: Storing and managing information about employees, salaries, and payroll
taxes.
• Manufacturing
o Managing production processes, inventory, orders, and supply chains.
• Universities
o Managing student registration and grades.
• Airlines
o Managing reservations and flight schedules.
• Telecommunication
o Recording calls, texts, and data usage.
o Generating monthly bills and maintaining balances on prepaid calling cards.
• Web-Based Services
o Online Retailers: Tracking orders and providing customized recommendations.
o Online Advertisements: Managing ad placements and targeting.
Purpose of Database Systems
In the early days, database applications were built directly on top of file systems, which led to several problems:
• Data Isolation
o Data was scattered across multiple files and formats, making it difficult to combine and use
efficiently.
• Integrity Problems
o Integrity Constraints: For example, rules like "account balance > 0" were often buried in
program code rather than being explicitly stated.
o Difficulty in Modifications: Adding new constraints or modifying existing ones was challenging
because they were hard-coded into applications.
• Atomicity of Updates
o Failures may leave the database in an inconsistent state with only partial updates carried out.
Example: A fund transfer from one account to another should either be fully completed or not
occur at all to maintain consistency.
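• Concurrent Access by Multiple Users
o Uncontrolled concurrent access can leave the data inconsistent.
Example: Two people read a balance of 100 and each withdraws 50 at the same time. If both updates are computed from the same balance of 100, the final balance is 50 instead of 0.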
• Security Problems
o It is difficult to provide user access to only specific parts of data, leading to potential security
vulnerabilities.
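• Solution
o Database systems offer solutions to all of these problems: they integrate data in one system, enforce declared integrity constraints, make updates atomic, control concurrent access, and restrict each user to the data they are authorized to see.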
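The example of two people each withdrawing 50 from a balance of 100 can be replayed in a few lines. This is a minimal sketch, not DBMS code: the transactions T1 and T2 and the helper `run_schedule` are hypothetical names, and the interleaving is simulated step by step instead of with real threads. Each withdrawal reads the balance, then writes back the value it read minus 50.

```python
# Simulates the lost-update anomaly: each withdrawal reads the balance,
# then writes back (value read - amount). No real threads are used;
# a schedule lists the order in which the steps run.

def run_schedule(schedule, start=100, amount=50):
    """Replay (transaction, action) steps; action is "read" or "write"."""
    balance = start
    seen = {}  # value each transaction read, keyed by transaction name
    for txn, action in schedule:
        if action == "read":
            seen[txn] = balance
        else:  # "write": based on this transaction's own (possibly stale) read
            balance = seen[txn] - amount
    return balance

# Uncontrolled: both transactions read 100 before either one writes.
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]
# Serial: T1 finishes before T2 starts.
serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]

print(run_schedule(interleaved))  # 50: one withdrawal is lost
print(run_schedule(serial))       # 0: both withdrawals take effect
```

A concurrency-control mechanism (locking is one common choice) prevents the first outcome by making concurrent transactions produce the same result as some serial order.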
View of Data
• A database system is a collection of interrelated data and a set of programs that allow users to access
and modify these data.
• The primary purpose of a database system is to provide users with an abstract view of the data,
simplifying interaction with complex data structures.
• Components
o Data Models - A collection of conceptual tools for describing data, data relationships, data
semantics, and consistency constraints.
o Data Abstraction - Hides the complexity of data structures by representing data in the database
through several levels of abstraction, making it easier for users to interact with the data without
needing to understand the underlying complexities.
Data Models
Data models are essential tools for describing and managing data within a database system. Commonly used data models include:
o Relational Model - Organizes data as a collection of tables (relations). Each table has named columns (attributes) and stores records as rows; a table may represent an entity or a relationship among entities.
o Entity-Relationship Data Model - Used primarily for database design; this model focuses on entities and the relationships among them.
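A relation can be seen concretely with Python's built-in `sqlite3` module. This sketch assumes a hypothetical `instructor` table with two invented rows; it reads back the attribute names and the rows (tuples) of the relation.

```python
import sqlite3

# A relation is a table: named columns (attributes) and rows (tuples).
# The instructor table and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [("10101", "Srinivasan", "Comp. Sci.", 65000),
     ("12121", "Wu", "Finance", 90000)],
)

cur = conn.execute("SELECT * FROM instructor ORDER BY ID")
columns = [d[0] for d in cur.description]  # attribute names
rows = cur.fetchall()                      # one tuple per record

print(columns)  # ['ID', 'name', 'dept_name', 'salary']
print(rows)     # [('10101', 'Srinivasan', 'Comp. Sci.', 65000), ('12121', 'Wu', 'Finance', 90000)]
```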
In a database system, there are three levels of abstraction that help manage the complexity of data:
1. Physical Level
o Describes how a record (e.g., an instructor) is stored in the database.
2. Logical Level
o Describes the data stored in the database and the relationships among the data.
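Example: An instructor record is described by its fields (ID, name, dept_name, and salary) and their data types, without regard to how the record is stored on disk.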
3. View Level
o Application programs hide details of data types from the user.
o Views can also hide specific information (such as an employee’s salary) for security purposes.
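The view level can be illustrated with an SQL view that omits the salary column. In this sketch the `instructor` table, the view name `instructor_public`, and the row are hypothetical; it only shows that a query against the view cannot see salary.

```python
import sqlite3

# Hypothetical base table with a sensitive salary column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.execute("INSERT INTO instructor VALUES ('10101', 'Srinivasan', 'Comp. Sci.', 65000)")

# The view exposes every column except salary.
conn.execute(
    "CREATE VIEW instructor_public AS SELECT ID, name, dept_name FROM instructor"
)

cur = conn.execute("SELECT * FROM instructor_public")
view_columns = [d[0] for d in cur.description]
view_rows = cur.fetchall()

print(view_columns)  # ['ID', 'name', 'dept_name']
print(view_rows)     # [('10101', 'Srinivasan', 'Comp. Sci.')]
```

SQLite has no user accounts or GRANT statement, so this sketch shows only the hiding; in a multi-user DBMS, users would be granted access to the view rather than to the base table.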
• Logical Schema - The overall logical structure of the database, like a blueprint.
o Example - Consider a banking database where the logical schema defines entities like
"Customers" and "Accounts," and the relationships between them, such as which customers
hold which accounts.
• Instance - The specific content of the database at a particular point in time, akin to the current value
of a variable.
o Example - At a given moment, the instance could include all customer details and their
corresponding account balances as they exist right now.
• Physical data independence is the ability to modify the physical schema (how data is stored) without
affecting the logical schema (how data is structured logically).
• Key Points
o Applications depend on the logical schema
▪ This means that even if the physical storage of data changes (like moving from one
type of storage system to another), the applications that use the database won’t need
to change as long as the logical schema remains the same.
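Physical data independence can be demonstrated by changing only the physical layer, here by adding an index, and checking that the application's query text and result are unchanged. The table, rows, and index name `idx_instructor_dept` are hypothetical.

```python
import sqlite3

# Hypothetical instructor data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [("10101", "Srinivasan", "Comp. Sci.", 65000),
     ("12121", "Wu", "Finance", 90000),
     ("45565", "Katz", "Comp. Sci.", 75000)],
)

# The application's query depends only on the logical schema.
QUERY = "SELECT name FROM instructor WHERE dept_name = ? ORDER BY name"

before = conn.execute(QUERY, ("Comp. Sci.",)).fetchall()
# Physical change only: an index alters how rows can be located, not the schema.
conn.execute("CREATE INDEX idx_instructor_dept ON instructor (dept_name)")
after = conn.execute(QUERY, ("Comp. Sci.",)).fetchall()

print(before)           # [('Katz',), ('Srinivasan',)]
print(before == after)  # True: the application code did not have to change
```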
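Database Languages
• Data Definition Language (DDL) - A notation for defining the database schema, such as the tables and their columns.
• DDL Compiler - The DDL compiler processes these definitions and stores the table templates in a data dictionary.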
• Data Dictionary
o Contains metadata—data about the data, such as:
▪ Database Schema - The structure of the database.
▪ Integrity Constraints - Rules to ensure data accuracy, such as a primary key that
uniquely identifies each instructor.
▪ Authorization - Information about who has access to what data.
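The DDL-to-data-dictionary path can be observed in SQLite, which records each schema definition in its `sqlite_master` catalog table (other DBMSs expose their catalogs differently, e.g. through `information_schema`). This sketch, on a hypothetical `instructor` table, reads the definition back from the catalog and shows the declared primary key rejecting a duplicate ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: the definition, including the primary-key constraint, goes into the catalog.
conn.execute("""
    CREATE TABLE instructor (
        ID        TEXT PRIMARY KEY,   -- uniquely identifies each instructor
        name      TEXT NOT NULL,
        dept_name TEXT,
        salary    NUMERIC
    )
""")

# Metadata read back from SQLite's data dictionary.
table_name, table_sql = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchone()
print(table_name)  # instructor

conn.execute("INSERT INTO instructor VALUES ('10101', 'Srinivasan', 'Comp. Sci.', 65000)")
try:
    # Same ID again: violates the primary-key integrity constraint.
    conn.execute("INSERT INTO instructor VALUES ('10101', 'Wu', 'Finance', 90000)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```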
• Data Manipulation Language (DML) is used for accessing and updating the data organized by the appropriate data model. It is also commonly referred to as a query language.
• Types of DML
o Procedural DML - Requires the user to specify what data is needed and how to get that data.
o Declarative DML - Requires the user to specify what data is needed without needing to specify
how to retrieve it.
• Key Points
o Declarative DMLs are usually easier to learn and use compared to procedural DMLs.
o Declarative DMLs are also known as non-procedural DMLs.
o The portion of a DML that involves information retrieval is specifically called a query language.
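The difference between the two DML styles can be sketched side by side. Here a Python loop stands in for a procedural DML (the program itself decides how to scan and test records), while the SQL `WHERE` clause is declarative. The data is hypothetical.

```python
import sqlite3

# Hypothetical instructor data shared by both styles below.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [("10101", "Srinivasan", "Comp. Sci.", 65000),
     ("12121", "Wu", "Finance", 90000),
     ("45565", "Katz", "Comp. Sci.", 75000)],
)

# Procedural style (simulated in Python): the program spells out HOW,
# fetching every record and testing each one itself.
procedural = []
for _id, name, dept_name, _salary in conn.execute("SELECT * FROM instructor"):
    if dept_name == "Comp. Sci.":
        procedural.append(name)

# Declarative style: the query states only WHAT is wanted;
# the DBMS chooses how to find it.
declarative = [name for (name,) in conn.execute(
    "SELECT name FROM instructor WHERE dept_name = 'Comp. Sci.'"
)]

print(sorted(procedural) == sorted(declarative))  # True
```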
• SQL is a nonprocedural query language. A query in SQL takes one or more tables as input and always
returns a single table.
Example:
SELECT name
FROM instructor
WHERE dept_name = 'Comp. Sci.';
This query retrieves the names of all instructors in the Computer Science department.
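The claim that a query may take several tables as input yet always returns one table can be checked with `sqlite3`. The sketch runs the example query above on hypothetical data, then a second query that joins `instructor` with a hypothetical `department` table (its `building` and `budget` values are invented).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.execute(
    "CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT, budget INTEGER)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [("10101", "Srinivasan", "Comp. Sci.", 65000),
     ("12121", "Wu", "Finance", 90000),
     ("45565", "Katz", "Comp. Sci.", 75000)],
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?, ?)",
    [("Comp. Sci.", "Taylor", 100000), ("Finance", "Painter", 120000)],
)

# The example query as written: one table in, one table out.
cs_names = sorted(conn.execute(
    "SELECT name FROM instructor WHERE dept_name = 'Comp. Sci.'"
).fetchall())
print(cs_names)  # [('Katz',), ('Srinivasan',)]

# Two tables in, still a single table out.
cur = conn.execute(
    "SELECT name, building FROM instructor JOIN department USING (dept_name) "
    "WHERE instructor.dept_name = 'Comp. Sci.' ORDER BY name"
)
joined_columns = [d[0] for d in cur.description]
joined_rows = cur.fetchall()
print(joined_columns)  # ['name', 'building']
print(joined_rows)     # [('Katz', 'Taylor'), ('Srinivasan', 'Taylor')]
```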
• Key Points
o SQL is NOT a Turing-machine-equivalent language - it cannot perform every computation by itself.
o To perform more complex computations, SQL is usually embedded in a general-purpose (host) programming language.
o Application programs typically access databases using -
▪ Language extensions - Allow embedding SQL directly within another programming
language.
▪ Application Program Interfaces (APIs) - Such as ODBC or JDBC, which allow SQL
queries to be sent to the database.
• Application Programs
o Programs that access the database through language extensions or APIs are known as application programs. Because non-procedural query languages like SQL are not as powerful as a universal Turing machine, application programs combine the strengths of a host language and SQL to provide the full range of functionality.
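An application program in this style can be sketched with Python as the host language and its DB-API (`sqlite3`) playing the role that ODBC or JDBC play for other languages. The function `department_median_salary` and the data are hypothetical: SQL retrieves the salaries, with a `?` placeholder passing a host-language value into the query, and Python computes the median afterwards.

```python
import sqlite3
import statistics

def department_median_salary(conn, dept):
    """Hypothetical helper: SQL fetches the rows, Python does the arithmetic."""
    # The ? placeholder passes the host-language value into the SQL statement.
    rows = conn.execute(
        "SELECT salary FROM instructor WHERE dept_name = ?", (dept,)
    ).fetchall()
    # Remaining computation happens in the host language.
    return statistics.median(salary for (salary,) in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [("10101", "Srinivasan", "Comp. Sci.", 65000),
     ("12121", "Wu", "Finance", 90000),
     ("45565", "Katz", "Comp. Sci.", 75000),
     ("83821", "Brandt", "Comp. Sci.", 92000)],
)

print(department_median_salary(conn, "Comp. Sci."))  # 75000
```

Passing values through placeholders, rather than pasting them into the SQL string, also keeps user input from being interpreted as SQL.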
Database Design
The process of designing the general structure of the database involves two key aspects:
• Logical Design
o Deciding on the database schema, which requires identifying a "good" collection of relation
schemas.
▪ Business Decision - Determining what attributes (pieces of data) should be recorded in
the database.
▪ Computer Science Decision - Deciding what relation schemas should be created and
how the attributes should be distributed among them.
• Physical Design
o Deciding on the physical layout of the database, including how the data will be stored on disk
and how it will be accessed efficiently.
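The "how should attributes be distributed among relation schemas" decision can be illustrated with plain Python data. In this sketch the department attributes `building` and `budget` and all values are hypothetical: keeping department facts in every instructor row repeats them, while splitting into `instructor(ID, name, dept_name)` and `department(dept_name, building, budget)` stores each department's facts once.

```python
# One wide relation: (ID, name, dept_name, building, budget).
# Each row repeats its department's building and budget.
combined = [
    ("10101", "Srinivasan", "Comp. Sci.", "Taylor", 100000),
    ("45565", "Katz", "Comp. Sci.", "Taylor", 100000),
    ("12121", "Wu", "Finance", "Painter", 120000),
]

# Decomposition into two relation schemas.
instructor = [(i, n, d) for (i, n, d, _b, _g) in combined]
department = sorted({(d, b, g) for (_i, _n, d, b, g) in combined})

print(len(combined))  # 3: the Comp. Sci. budget is stored twice
print(department)     # [('Comp. Sci.', 'Taylor', 100000), ('Finance', 'Painter', 120000)]
```

With the wide relation, changing the Comp. Sci. budget means updating every Comp. Sci. instructor row; with the decomposition it is a single-row update.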
Database Engine
• A database system is divided into modules, each responsible for a specific part of the overall system.
Functional Components
1. Storage Manager - Handles the storage, retrieval, and update of data in the database.
2. Query Processor Component - Interprets and executes database queries, optimizing them for
efficiency.
3. Transaction Management Component - Ensures that all database transactions are processed
reliably and that the database remains in a consistent state, even in the case of failures.
Storage Manager
• The storage manager is a program module that acts as the interface between the low-level data
stored in the database and the application programs and queries submitted to the system.
• Responsibilities
o Interaction with the OS File Manager - Coordinates with the operating system's file
manager to handle data storage and retrieval.
o Efficient Storing, Retrieving, and Updating of Data - Ensures that data operations are
performed efficiently.
Query Processor
The query processor is responsible for interpreting and executing database queries. It includes the
following components:
• DDL Interpreter
o Interprets Data Definition Language (DDL) statements and records the definitions in the
data dictionary.
• DML Compiler
o Translates Data Manipulation Language (DML) statements into an evaluation plan. This
plan consists of low-level instructions that the query evaluation engine can execute.
o Query Optimization - The DML compiler performs query optimization by selecting the
most cost-effective evaluation plan from various alternatives.
These components work together to process and optimize queries, ensuring efficient retrieval and
manipulation of data.
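The evaluation plan chosen by a DML compiler can be inspected in SQLite with `EXPLAIN QUERY PLAN`, which reports the plan instead of running the query. The table and the index name `idx_instructor_dept` are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.execute("CREATE INDEX idx_instructor_dept ON instructor (dept_name)")

# Ask for the chosen evaluation plan rather than the query result.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT name FROM instructor WHERE dept_name = 'Comp. Sci.'"
).fetchall()

for row in plan:
    # The last column holds a human-readable description of each plan step,
    # e.g. a SEARCH that uses idx_instructor_dept instead of a full scan.
    print(row[-1])
```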
Query Processing
1. Parsing and Translation
o The parser checks the query's syntax and translates it into a relational-algebra expression.
2. Optimization
o The relational-algebra expression is optimized by selecting the most efficient execution plan. The optimizer uses statistics about the data to choose the best approach.
3. Evaluation
o The evaluation engine executes the query according to the optimized execution plan and produces the query output.
This process ensures that database queries are executed efficiently, minimizing resource use and
speeding up response times.
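For the earlier example query (SELECT name FROM instructor WHERE dept_name = 'Comp. Sci.'), the relational-algebra expression that translation produces, and one equivalent form an optimizer could also consider, can be written as follows. The second form projects early onto the two attributes still needed; it is equivalent because the selection uses only dept_name, which that projection keeps.

```latex
% Translation of the example query
\Pi_{name}\bigl(\sigma_{dept\_name = \text{'Comp. Sci.'}}(instructor)\bigr)

% An equivalent expression: project onto (name, dept_name) first
\Pi_{name}\bigl(\sigma_{dept\_name = \text{'Comp. Sci.'}}(\Pi_{name,\, dept\_name}(instructor))\bigr)
```

Both expressions return the same result; the optimizer's job is to estimate which evaluation plan is cheaper.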
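• Transaction Management
o A transaction is a collection of operations that performs a single logical function in a database application, such as a fund transfer.
o The transaction management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures, operating system crashes) and transaction failures.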
• Concurrency Control
o The concurrency-control manager oversees the interaction among concurrent transactions
to ensure the consistency of the database, preventing issues such as data conflicts.
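Atomicity for the fund-transfer example can be sketched with Python's `sqlite3`: when a connection is used as a context manager, it commits if the block completes and rolls back if the block raises. The `account` table, the `transfer` helper, and the simulated crash are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Hypothetical helper: both updates happen, or neither does."""
    with conn:  # commit on success, rollback on exception
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))

def balances():
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()

try:
    transfer(conn, "A", "B", 30, fail_midway=True)
except RuntimeError:
    pass
after_fail = balances()
print(after_fail)  # [('A', 100), ('B', 50)]: the partial debit was rolled back

transfer(conn, "A", "B", 30)
after_ok = balances()
print(after_ok)    # [('A', 70), ('B', 80)]
```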
Database Architecture
• Centralized Databases
o Typically run on one to a few cores with shared memory. All data is stored and managed in a
single location.
• Client-Server Architecture
o Involves one server machine that executes tasks on behalf of multiple client machines. This
setup allows for efficient resource management and centralized control.
• Parallel Databases
o Use many processors (cores) working in parallel to handle large-scale data processing.
o Types:
▪ Shared Memory - Processors share a common main memory.
▪ Shared Disk - Processors have their own memory but share a common set of disks.
▪ Shared Nothing - Each processor has its own memory and disk, so processors do not contend for shared resources.
• Distributed Databases
o Geographical Distribution - Data is distributed across different geographical locations.
o Schema/Data Heterogeneity - Different sites may use different schemas and data formats,
requiring integration.
In a centralized or shared-memory database architecture, all components of the database system are
closely integrated and share the same memory space. Here’s how it is structured -
• Query Processor
o Compiler and Linker - Compiles and links application program object code.
o DML Compiler and Organizer - Translates and organizes DML (Data Manipulation Language)
queries for execution.
o DDL Interpreter - Interprets DDL (Data Definition Language) statements, recording them in the
data dictionary.
o Query Evaluation Engine - Executes the optimized query plan.
• Storage Manager
o Buffer Manager - Manages the memory buffer, optimizing data access.
o File Manager - Manages the storage of data on disk.
o Authorization and Integrity Manager - Controls access to data and ensures data integrity.
o Transaction Manager - Ensures that transactions are processed reliably and maintains the
consistency of the database.
• Disk Storage
o Data - The actual data stored in the database.
o Indices - Structures that improve the speed of data retrieval.
o Data Dictionary - Metadata that describes the structure of the database, including tables,
columns, and constraints.
o Statistical Data - Information used by the optimizer to choose the best execution plan.
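The buffer manager's role can be illustrated with a toy buffer pool that keeps a few disk blocks in memory. This is a minimal sketch under stated assumptions: the class `BufferManager`, its `get` method, and the least-recently-used (LRU) replacement policy are chosen for illustration (the notes do not name a policy), and a real buffer manager also pins blocks and writes modified blocks back to disk, which this sketch omits.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer pool: caches up to `capacity` blocks, evicting the
    least recently used block when full (LRU is an assumed policy)."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block  # function that fetches a block from "disk"
        self.pool = OrderedDict()     # block_id -> contents, least recent first
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.pool:
            self.pool.move_to_end(block_id)  # hit: mark as most recently used
            return self.pool[block_id]
        self.disk_reads += 1                 # miss: go to disk
        data = self.read_block(block_id)
        self.pool[block_id] = data
        if len(self.pool) > self.capacity:
            self.pool.popitem(last=False)    # evict least recently used block
        return data

bm = BufferManager(2, lambda b: f"block-{b}")
for b in [1, 2, 1, 3, 1, 2]:
    bm.get(b)
print(bm.disk_reads)  # 4: the second and third requests for block 1 are hits
```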