Lecture1 - Intro. Database systems
Data
Data refers to raw facts, figures, or observations that have not been processed or
analyzed. Data alone has little value without context or organization. For example,
numbers, dates, or strings like "John", "25", "2022" are data points.
Information
Information is data that has been processed, organized, or structured in a way that
provides meaning or context. It answers questions such as "who", "what", "when", and
"how".
o Example: A report showing "John's age is 25 years" is information derived from
raw data ("John", "25").
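The step from data to information can be sketched in a few lines of Python. This is only an illustration: the labels `name`, `age`, and `year` and the helper `describe` are assumptions, since the notes give the raw values without saying what each one means.

```python
# Raw data: isolated values with no context (taken from the example above).
raw = ["John", "25", "2022"]

# Attaching labels and types gives the values context.
# The labels (name, age, year) are illustrative assumptions.
record = {"name": raw[0], "age": int(raw[1]), "year": int(raw[2])}


def describe(rec):
    """Turn a labelled record into a statement that answers 'who' and 'what'."""
    return f"{rec['name']}'s age is {rec['age']} years"


info = describe(record)
print(info)
```

The list `raw` is data; the sentence produced by `describe` is information, because it relates the values to each other.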
Database Management System (DBMS)
A Database Management System (DBMS) is software that enables users to create, manage,
and interact with databases. It provides tools for storing data, retrieving data efficiently, updating
data, and ensuring data integrity and security.
A DBMS provides a systematic and efficient way of handling large amounts of data and ensures
that data remains consistent, secure, and accessible. It provides a framework for managing
complex relationships between data and supports multiple users accessing the database
simultaneously.
Functions of a DBMS
Data Storage Management: Efficiently stores data in the database and manages access
to it.
Data Retrieval: Allows users to query and retrieve data quickly through query languages
like SQL.
Data Manipulation: Supports operations like adding, updating, and deleting data in the
database.
Concurrency Control: Manages simultaneous access by multiple users, ensuring data
integrity and preventing conflicts.
Backup and Recovery: Protects against data loss through regular backups and supports data
recovery in case of failures.
Security Management: Ensures that only authorized users can access or modify data
through authentication and access control mechanisms.
Data Integrity: Enforces rules to ensure the accuracy and consistency of data.
Transaction Management: Ensures that database transactions are processed reliably,
preserving properties like atomicity, consistency, isolation, and durability (ACID
properties).
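Several of the functions listed above can be seen in a small sketch using SQLite through Python's built-in `sqlite3` module. SQLite is chosen here only because it needs no server. The `account` table, its columns, and the helpers `balances` and `transfer` are hypothetical names made up for this example.

```python
import sqlite3

# Throwaway in-memory database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")

# Data integrity: NOT NULL and CHECK rules are enforced by the DBMS itself.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)

# Data manipulation: add rows. 'with conn' commits on success.
with conn:
    conn.executemany(
        "INSERT INTO account (owner, balance) VALUES (?, ?)",
        [("John", 100), ("Mary", 50)],
    )


def balances():
    """Data retrieval: query the current balance of every account."""
    rows = conn.execute("SELECT owner, balance FROM account ORDER BY owner")
    return dict(rows)


def transfer(src, dst, amount):
    """Transaction management: both updates succeed together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE owner = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE owner = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok_small = transfer("John", "Mary", 30)   # valid transfer
after_small = balances()
ok_large = transfer("John", "Mary", 500)  # would make John's balance negative
after_large = balances()
print(ok_small, after_small)
print(ok_large, after_large)
```

The second transfer violates the `CHECK` rule on its second update, so the whole transaction is rolled back and the credit already applied to the other account is undone. This is the atomicity part of ACID.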
Components of a DBMS
A DBMS is made up of several components that work together to manage the database:
1. Database Engine: The core component responsible for managing database operations
such as data storage, retrieval, and manipulation.
2. Database Schema: A blueprint or structure that defines the organization of data in the
database, including tables, relationships, constraints, and indexes.
3. Query Processor: Interprets and executes user queries, converting high-level commands
(SQL) into low-level instructions that interact with the database engine.
4. Transaction Manager: Ensures that database transactions follow the ACID properties
and handles concurrency control, ensuring that transactions do not interfere with each
other.
5. Database Security Manager: Manages security policies, user authentication, and
permissions, ensuring that only authorized users have access to the database.
6. Backup and Recovery Manager: Manages the process of taking database backups and
restoring data in case of system failures.
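As a rough illustration of the schema component, the sketch below defines tables, a relationship, constraints, and an index in SQLite via Python's `sqlite3` module. The `department` and `employee` tables and the index name are assumptions invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# The schema: tables, a relationship, constraints, and an index.
conn.executescript("""
CREATE TABLE department (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(id)
);
CREATE INDEX idx_employee_dept ON employee(dept_id);
""")

conn.execute("INSERT INTO department (id, name) VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee (name, dept_id) VALUES ('John', 1)")

# A row that points at a department that does not exist breaks the relationship.
try:
    conn.execute("INSERT INTO employee (name, dept_id) VALUES ('Mary', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, employee_count)
```

The schema is declared once, and the engine then rejects any row that does not fit it, so applications do not have to repeat these checks themselves.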
Database System
A Database System is an integrated system for managing databases. It includes the Database
Management System (DBMS), the data itself, user interfaces, and the application programs that
interact with the database. The primary goal of a database system is to efficiently store, manage,
and retrieve data while ensuring data consistency, security, and reliability.
1. Database Management System (DBMS)
The DBMS is the software that controls how data is stored, retrieved, and manipulated. It
provides the framework for managing databases, ensuring data integrity, security, and
concurrency control.
o Examples of DBMS: MySQL, Oracle, SQL Server, PostgreSQL, MongoDB.
2. Database
A Database is the organized collection of data that the DBMS manages. It includes
tables, relationships, views, and other data structures that store information.
o Data in the database: Can be structured (tables with rows and columns),
semi-structured (XML, JSON), or unstructured (text, images).
3. Data Dictionary
The Data Dictionary or System Catalog contains metadata about the structure of the
database, such as tables, views, columns, data types, constraints, and relationships. It is
maintained by the DBMS.
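Most systems let users query the catalog itself. The sketch below uses SQLite, where the catalog is exposed as the `sqlite_master` table and the `PRAGMA table_info` command. Other DBMSs use different catalog names. The `person` table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
)

# sqlite_master is SQLite's catalog: one row per table, index, view, or trigger.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns (cid, name, type, notnull, default, pk) per column.
columns = [
    (row[1], row[2], bool(row[3]))
    for row in conn.execute("PRAGMA table_info(person)")
]

print(tables)
for name, col_type, not_null in columns:
    print(name, col_type, "NOT NULL" if not_null else "nullable")
```

Nothing was written to the catalog by hand here. The DBMS recorded the metadata when the `CREATE TABLE` statement ran.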
4. Query Processor
The Query Processor interprets and executes SQL (or other query languages)
commands. It translates high-level queries into low-level instructions that the DBMS can
use to retrieve and manipulate data.
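One way to observe the query processor at work is to ask it for its plan instead of the result. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module. The `person` table and index name are assumptions for illustration, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_person_age ON person(age)")

# The high-level query states WHAT is wanted, not HOW to find it.
query = "SELECT name FROM person WHERE age = 25"

# EXPLAIN QUERY PLAN asks the query processor how it intends to run the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# The last column of each plan row is a human-readable description of one step.
details = [row[-1] for row in plan]
for step in details:
    print(step)
```

With the index in place, the plan should describe a search using `idx_person_age` rather than a scan of the whole table, which shows the translation from a declarative query to a concrete access strategy.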
5. Database Engine
The Database Engine is the core part of the DBMS responsible for storing, retrieving,
and modifying data. It manages how data is physically stored on disk and handles the
input/output operations.
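6. Transaction Manager
The Transaction Manager ensures that database transactions follow the ACID properties
and handles concurrency control so that transactions do not interfere with each other.
7. Backup and Recovery Manager
The Backup and Recovery Manager ensures that data is regularly backed up and can be
restored in the event of failure or corruption.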
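A backup-and-restore cycle can be sketched with the `Connection.backup` method of Python's `sqlite3` module (available from Python 3.7). Two in-memory databases stand in for the live database and the backup copy. A real setup would write the backup to separate storage, and the `person` table is a made-up example.

```python
import sqlite3

# The "live" database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE person (name TEXT, age INTEGER)")
source.execute("INSERT INTO person VALUES ('John', 25)")
source.commit()

# Take a backup: copy the whole database into a second connection.
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Simulate a failure that destroys the live data.
source.execute("DELETE FROM person")
source.commit()
lost = source.execute("SELECT COUNT(*) FROM person").fetchone()[0]

# Recovery: copy the backup over the damaged database.
backup.backup(source)
restored = source.execute("SELECT name, age FROM person").fetchall()

print(lost, restored)
```

Anything written after the backup was taken would be lost on restore, which is why production systems usually combine periodic backups with a log of later changes.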
Traditional databases and e-databases represent two different approaches to data management
and storage. While traditional databases have been the backbone of enterprise systems for
decades, e-databases are newer, often designed for modern, internet-based applications and
distributed systems.
Traditional databases typically run on database management systems (DBMS) that are installed,
hosted, and maintained on local servers or in data centers. These databases are managed
on-premises by the organization and provide centralized data storage and management.
Advantages of Traditional Databases:
Control and Customization: Full control over the hardware and software configuration.
Reliability: Known, stable, and mature technology. Offers predictable performance in
controlled environments.
Security: Greater control over access, data protection, and privacy due to the local
deployment.
Performance: Can be optimized for performance based on the specific hardware and
environment.
Disadvantages of Traditional Databases:
Cost: High upfront costs for purchasing hardware, software licenses, and maintaining
infrastructure.
Limited Scalability: Scaling may require significant investment in hardware and
infrastructure, and horizontal scaling (adding more servers) can be complex.
Maintenance Overhead: Requires dedicated IT staff for database maintenance, backup,
and recovery.
Geographical Constraints: Data is often restricted to a specific location and can be
challenging to access globally without a strong network infrastructure.
E-databases, also known as Cloud Databases, refer to databases that are hosted on remote
servers provided by cloud service providers (e.g., Amazon Web Services, Microsoft Azure,
Google Cloud). These databases can be accessed over the internet and are typically managed,
maintained, and optimized by the cloud provider.
Characteristics of E-databases:
Deployment: Hosted on cloud infrastructure, eliminating the need for the organization to
own physical hardware and manage it on-premises.
Access: Accessible from anywhere with an internet connection, supporting both local and
remote users.
Data Storage: Data can be stored in various formats, including structured (relational),
semi-structured (JSON), and unstructured (binary large objects - BLOBs). E-databases
span multiple database types (e.g., NoSQL databases like MongoDB, and relational SQL
engines offered through managed services like Amazon RDS).
Scalability: Cloud databases are highly scalable, with elastic provisioning of resources.
They can scale horizontally by adding more nodes (e.g., virtual machines), or
vertically by increasing the capacity (CPU, memory, storage) of existing nodes.
Transaction Handling: Many cloud databases support ACID transactions, although
some NoSQL databases relax consistency in favor of availability when network
partitions occur (a trade-off described by the CAP theorem).
Security: Cloud providers secure the underlying infrastructure and often provide features
like encryption, access control, and automated backups. However, users still need to
configure and manage permissions.
Backup and Recovery: Backup and recovery services are integrated into cloud database
offerings, providing automated and seamless data protection with redundancy across
multiple data centers.
Advantages of E-databases:
Cost-Effective: Pay-as-you-go pricing model, which reduces capital expenses and can
lower operational costs.
Scalability: Elastic scaling capabilities make it easy to scale up or down based on
demand, without significant upfront investments.
Availability and Reliability: Cloud providers offer high uptime guarantees, redundancy,
and failover mechanisms, ensuring minimal downtime.
Global Access: E-databases are accessible from anywhere with internet access, making
them ideal for distributed and global teams.
Automatic Maintenance: Cloud providers typically handle software updates, security
patches, and performance tuning, reducing the workload for organizations.
Backup and Recovery: Cloud providers typically offer automated backup and disaster
recovery solutions, which makes it easier to restore data in case of failure.
Disadvantages of E-databases:
Data Privacy and Security Concerns: Storing sensitive data off-premises with a
third-party cloud provider raises concerns regarding data privacy, compliance, and
control over security.
Dependence on Internet: Access to the database requires a reliable internet connection.
Any internet outage or disruption can affect database accessibility.
Ongoing Costs: While the pay-as-you-go model can be cost-effective, the costs can
accumulate over time, especially with high-demand workloads or large-scale data
storage.
Vendor Lock-in: Organizations may become dependent on the cloud provider's
infrastructure, making it difficult to migrate data or applications to another provider.
Differences Between Traditional Databases and E-databases
Use Cases for E-databases:
Startups and SMBs (Small and Medium Businesses): Cloud databases are cost-effective,
reducing the need for large initial investments in hardware and IT staff.
Global Applications: E-databases are ideal for distributed applications with users across
the globe, ensuring data accessibility and high availability.
Big Data: Cloud databases, especially NoSQL databases, are well-suited for handling
large volumes of unstructured or semi-structured data (e.g., social media data, IoT data).
Web and Mobile Applications: For applications that need to scale quickly based on
usage, such as e-commerce platforms, cloud databases offer flexibility.