Chapter 1
(e.g., 201, Shounak, 21) On their own these values have no meaning, but if we organize them in the following way
• A file-based system (FBS) is the traditional method of organizing and storing data on a
computer. It is essentially a collection of individual files, each containing a specific set of
data (e.g., a set of text documents, spreadsheets, images, or videos).
DISADVANTAGES OF FBS:
• Data redundancy: Duplicate data may be stored across files, wasting storage and
increasing the risk of data inconsistencies.
• Data isolation: Data is often stored in separate files, making it difficult to integrate and
analyze information across different sources.
• Limited data sharing: Sharing data between different applications or users can be
complex and often requires manual intervention.
• Integrity problems: It is difficult to enforce and update consistency constraints.
• Atomicity problems: Ensuring that an operation either completes fully or does not execute
at all during a failure is challenging.
• Security problems: It is hard to restrict unauthorized access without centralized control.
• Concurrent access issues: Simultaneous updates can cause data inconsistency.
ADVANTAGES OF DBMS OVER FBS:
• Reduces data redundancy: A DBMS stores data in a single place, avoiding duplication
across multiple files.
• Ensures data consistency: An update in the database is reflected everywhere, avoiding
mismatched data.
• Easy retrieval: Data can be retrieved with queries, without writing complex programs.
• Better security: Centralized access control restricts unauthorized access.
• Supports data integrity (e.g., no negative balance allowed) and atomicity (an operation
either completes fully or does not execute at all, even during system failures).
• Provides backup and recovery mechanisms that keep the data safe in case of failure.
• Allows concurrent access to data while preventing the inconsistencies that uncontrolled
simultaneous updates would cause.
• Handles large and growing amounts of data, providing scalability.
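The integrity and atomicity points above can be illustrated with a minimal sketch. It assumes SQLite (through Python's standard `sqlite3` module) as the DBMS; the `account` table and the `transfer` and `balances` helpers are hypothetical names chosen for this example.

```python
import sqlite3

# Hypothetical schema: the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back.
        return False


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account"))
```

Calling `transfer(conn, 1, 2, 500)` fails on the debit, and the credit that was already applied inside the same transaction is undone, so no partial update survives.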
It is a method of managing data in which all information is stored in a centralized database
system, and multiple users or applications can access and manipulate this data efficiently.
1. Centralized Storage: All data is stored in a single, organized database rather than scattered
across files.
2. Data Sharing: Allows multiple users and applications to access the same data concurrently.
3. Data Independence: Changes to the database structure don’t affect applications that use the
data.
4. Minimized Redundancy: Reduces duplicate data to save storage and maintain consistency.
7. Efficient Retrieval: Queries can quickly retrieve specific data without complex programming.
1-tier: The user interacts directly with the DBMS, and any changes are made directly to the
database. It is mainly used for developing local applications, allowing programmers to
communicate with the database for quick responses.
• Client: The user interface where users interact with the application.
• Server: The database layer where data is stored and managed.
The client sends requests to the server, and the server processes these requests and sends back
the required data. This architecture is commonly used in applications where the user interacts
with the database via an application rather than directly.
Advantages: better and simpler user interaction, and faster communication between server and
client.
Example: A desktop application like MS Access, where the user interacts with the database
through a software interface.
3-tier: Here there is no direct communication between the client and the database server.
A 3-tier architecture divides the system into three layers to enhance modularity and
scalability:
1. Presentation Layer (Client): The user interface (UI) where users interact with the system,
such as web browsers or desktop applications.
2. Application Layer (Middle Tier): Processes the business logic and acts as a bridge
between the client and the database.
How it works :
Disadvantages
• Complexity: More layers make the system harder to develop and manage.
• Performance Overhead: Communication between layers can slow down the system.
• Cost: Requires more resources and infrastructure, increasing costs.
• Network Dependency: Issues in the network can disrupt communication between layers.
• Development Effort: More time and effort needed to design and implement the system.
Example:
A typical e-commerce website:
• Client: User's browser.
• Application Layer: The server that handles the website logic (e.g., Java Spring, Django).
• Database Layer: Stores product details, user data, and orders.
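The layering in this example can be sketched in a few lines. This is only an illustration under stated assumptions: all three layers run in one Python process, SQLite stands in for the database server, and `make_database`, `CatalogService`, and `render_product` are hypothetical names. In a real deployment each layer would usually run on a separate machine or process.

```python
import sqlite3


# --- Database layer: stores product data (hypothetical schema) ---
def make_database():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product ("
        " id INTEGER PRIMARY KEY, name TEXT, price_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [(1, "Pen", 150), (2, "Notebook", 400)],
    )
    conn.commit()
    return conn


# --- Application layer: business logic; the only code that issues SQL ---
class CatalogService:
    def __init__(self, conn):
        self._conn = conn

    def price_with_tax(self, product_id, tax_percent=10):
        row = self._conn.execute(
            "SELECT name, price_cents FROM product WHERE id = ?",
            (product_id,),
        ).fetchone()
        if row is None:
            return None
        name, price = row
        return name, price + price * tax_percent // 100


# --- Presentation layer: formats results for the user; never sees SQL ---
def render_product(service, product_id):
    result = service.price_with_tax(product_id)
    if result is None:
        return "Product not found"
    name, cents = result
    return f"{name}: {cents // 100}.{cents % 100:02d}"
```

The presentation code only calls `CatalogService`, so the database schema could change without touching `render_product`.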
NEED FOR 3 level architecture
• Separation of Tasks: Divides the system into different layers (UI, business logic, data),
making it easier to manage and update each part.
• Data Independence: Changes in one layer (like the database) don't affect other layers,
making the system more flexible.
• Scalability: Each layer can be scaled separately to handle more users or data, improving
performance.
• Security: The database is protected from direct user access, keeping sensitive data safe.
• Easier Maintenance: Each layer can be updated independently, making it easier to
maintain the system.
PHYSICAL DBMS ARCHITECTURE
Physical DBMS Architecture refers to how data is stored and managed on physical devices like
hard drives. It focuses on how the database organizes and retrieves data efficiently.
1. Storage Manager: Manages how data is saved and accessed from the storage.
o Buffer Manager: Manages memory used to store data temporarily for faster access.
2. Data Access Methods: These are ways the system finds data efficiently, such as using
sorting or indexing.
3. Disk Storage: Refers to how data is saved on storage devices (e.g., hard drives or SSDs)
and organized for easy access.
4. Transaction Management: Makes sure that database changes are completed correctly, or
if there’s a problem, everything is rolled back to the correct state.
• Efficient Storage: Organizes data for better storage and faster access.
1. Database design and planning: The DBA designs the structure of the database to ensure
it is well-organized, efficient, and scalable.
2. Data security: Controls access to the database and enforces security measures.
3. Backup and recovery: Plans and implements data backups and recovery techniques.
4. Performance monitoring: Monitors and optimizes the database for better performance.
5. Database maintenance: Regularly updates and maintains the database.
6. Data Integrity: Ensures data accuracy and consistency through constraints.
7. Troubleshooting: Resolves issues like slow queries or database crashes.
1. Data Files:
• Definition: Data files store the actual data in a database. Each data file consists of records
organized in a structured format, such as rows in a table.
• Role: They store all the information needed for operations like retrieval, insertion, and
updates.
• Types:
o Primary Data Files: Store the actual data.
o Secondary Data Files: Used for specific functions like indexing or for organizing
data across multiple locations.
2. Indices:
• Definition: An index is a data structure that improves the speed of data retrieval
operations on a database table.
• Purpose: It provides quick access to rows based on values in specific columns.
• Types:
o Primary Index: Built on the primary key of the table.
o Secondary Index: Built on non-primary key columns.
• Benefits:
o Speeds up query performance.
o Reduces data retrieval time by providing fast lookup of records.
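As a rough illustration of how a secondary index changes data access, the sketch below asks SQLite for its query plan before and after creating an index. The `student` table, the index name `idx_student_name`, and the `query_plan` helper are assumptions made for this example; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT)"
)
conn.executemany(
    "INSERT INTO student (name, dept_name) VALUES (?, ?)",
    [(f"student{i}", f"dept{i % 5}") for i in range(1000)],
)


def query_plan(conn, sql):
    """Return SQLite's plan description for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column holds the detail text


QUERY = "SELECT * FROM student WHERE name = 'student42'"

plan_before = query_plan(conn, QUERY)  # no index on name yet: full table scan

# Secondary index: built on a non-primary-key column.
conn.execute("CREATE INDEX idx_student_name ON student (name)")

plan_after = query_plan(conn, QUERY)   # the plan now names idx_student_name
```

Printing `plan_before` and `plan_after` shows the switch from scanning the whole table to searching through the index.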
3. Data Dictionary:
A Data Dictionary is like a reference guide that stores information about the database, such as
how the tables are connected and what data they hold.
It helps keep the data organized and prevents duplication.
For example, a data dictionary could describe a table that holds employee details, explaining
what each column in the table means (like name, address, etc.). It is an important part of a
database because it helps manage data in a clear and structured way.
• Role: It helps DBAs and developers understand the structure and relationships of data
within the database.
• Contents:
o Table names, column names, data types, and sizes.
o Relationships (foreign keys, etc.).
o Constraints (primary keys, unique keys, etc.).
• Benefits:
1. Better Understanding: It provides important information about the database, such as
entities, relationships, and attributes, which the data model alone doesn’t give.
2. Reduces Redundancy: It helps avoid repeating data and ensures consistency when
different team members use the data.
3. Structured Design: It supports designing and analyzing data by following data standards,
which are rules for how data should be collected and presented.
4. Naming Conventions: It helps define the rules for naming things in the database, ensuring
consistency.
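Most DBMSs expose their data dictionary (catalog) through queries. The sketch below uses SQLite's catalog as one example: the `sqlite_master` table and `PRAGMA table_info` are real SQLite features, while the `employee` table and the helper names `list_tables` and `describe_table` are hypothetical. Other systems use different catalog tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " address TEXT)"
)


def list_tables(conn):
    """Table names recorded in SQLite's catalog table, sqlite_master."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def describe_table(conn, table):
    """Column metadata for `table` (the name is assumed to be trusted)."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    return [
        {
            "column": name,
            "type": col_type,
            "not_null": bool(notnull),
            "primary_key": bool(pk),
        }
        for _cid, name, col_type, notnull, _default, pk in conn.execute(
            f"PRAGMA table_info({table})"
        )
    ]
```

`describe_table(conn, "employee")` returns one dictionary per column, which is the kind of information (names, types, constraints) listed under Contents above.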
DATA ABSTRACTION
Data abstraction is the process of hiding the details of how data is stored and maintained while
showing only the relevant information to the user. It allows users to interact with the data without
needing to understand its internal complexities.
• Definition: This is the lowest level of abstraction that describes how data is actually stored
on disk.
• Details: It involves technical specifics, such as data structures, indexing, and file
organization.
• Definition: This level describes what data is stored in the database and the relationships
among those data. DBAs operate at this level.
ID : char(10);
name : char(30);
dept_name : char(20);
total_credits : numeric(3);
end;
• Definition: The highest level of abstraction shows only specific parts of the database to
users, tailored to their needs.
• Details:
o It hides details of the logical level and enforces data security by restricting access
to sensitive parts of the database.
• Example: A university registrar’s office clerk sees only student records, not instructor
salaries.
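One common way to implement the view level is an SQL view that exposes only selected columns. The sketch below is an assumption-laden illustration: the `instructor` table, its single row, and the view name `instructor_public` are made up for this example. SQLite has no user accounts, so it shows only the column hiding; in a multi-user DBMS, access privileges would additionally limit a user to the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor ("
    " id TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.execute("INSERT INTO instructor VALUES ('I1', 'Rao', 'Physics', 90000)")

# The view exposes everything except the sensitive salary column.
conn.execute(
    "CREATE VIEW instructor_public AS "
    "SELECT id, name, dept_name FROM instructor"
)

cursor = conn.execute("SELECT * FROM instructor_public")
view_columns = [d[0] for d in cursor.description]  # column names of the view
view_rows = cursor.fetchall()
```

A user who queries `instructor_public` works with the same underlying data but cannot see `salary`.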
Schema: The overall structure of the database (e.g., a blueprint showing how tables like student
and course are related).
Instance: The actual data stored in the database at a specific time (e.g., a table filled with rows
of student data).
3. Flexibility: Allows the logical structure to change without affecting user interaction (data
independence).
Types of Databases
Databases are classified based on their data organization, usage, and access methods.
1. Relational Database
• Definition: Data is organized in tables (relations) with rows (records) and columns
(attributes).
2. Hierarchical Database
• Definition: Data is organized in a tree-like structure where each child has a single parent.
3. Network Database
4. Object-Oriented Database
6. Key-Value Database
7. Columnar Database
Relational Model
• The relational model organizes data into tables, also called relations.
• Most modern database systems, like MySQL and PostgreSQL, use the relational model
because it is simple and efficient.
• Definition: A conceptual design model that describes data using entities, attributes, and
relationships. It is used for database design.
• Components:
3. Relationships: Associations between entities (e.g., "Enrolls" relationship between
Student and Course).
• ER Diagram Symbols:
o Rectangle: Entity.
o Ellipse: Attribute.
o Diamond: Relationship.
• Example: A university database might have entities like Student and Course, with a
relationship Enrolls linking them.
A database model is a theoretical blueprint or framework that defines how data is organized and
structured within a database system. It provides a conceptual representation of the data and its
relationships.
The relational model is a specific type of database model that organizes data into tables, with
rows representing individual records and columns representing attributes. It's characterized by
the use of primary keys and foreign keys to establish relationships between tables.
DATABASE TERMS
Domains
• It specifies the data type and any constraints on the values, such as ranges, allowed
characters, or valid values.
Example:
• Age: Domain: Integer, Range: 0-120
• Gender: Domain: {'Male', 'Female', 'Other'}
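In SQL, a domain like the two above is commonly approximated with a column type plus a CHECK constraint. The following is a minimal sketch assuming SQLite; the `person` table and the `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (age BETWEEN 0 AND 120),"
    " gender TEXT CHECK (gender IN ('Male', 'Female', 'Other')))"
)


def try_insert(conn, name, age, gender):
    """Insert a row; return False if a domain (CHECK) rule rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO person VALUES (?, ?, ?)", (name, age, gender)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

For instance, `try_insert(conn, "Bob", 150, "Male")` is rejected because 150 lies outside the Age domain.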
• Relation: A table in a relational database. It consists of a set of tuples, each having the
same number of attributes.
3. Keys
• Super Key: A set of one or more attributes that uniquely identifies each tuple in a relation.
• Candidate Key: A minimal super key, i.e., a set of attributes that uniquely identifies a
tuple and from which no attribute can be removed without losing uniqueness.
o {EmployeeID} and {Email} are candidate keys (both can uniquely identify each
row).
o In the "Employees" table, {EmployeeID} could be chosen as the primary key since it
uniquely identifies every employee
• Foreign Key: An attribute or set of attributes in one relation that refers to the primary key
of another relation. It establishes a link between tables.
Example:
In a "Departments" table:
DeptID    DeptName
D1        HR
D2        IT
• In an "Employees" table (DeptID is a foreign key referencing "Departments"):
EmployeeID    Name     DeptID
101           Alice    D1
102           Bob      D2
o If "EmployeeID" is the primary key in an "Employees" table, "Email" could be an
alternate key if it is guaranteed to be unique for each employee.
• Unique Key: Similar to a primary key in that it enforces uniqueness within a column, but
unlike a primary key it may contain null values. How many nulls are permitted depends on the
DBMS (some allow only one, others allow several).
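The difference between a primary key and a unique key can be tried out directly. The sketch below assumes SQLite, which accepts several NULLs in a UNIQUE column (other systems may behave differently); the `employees` schema, the sample rows, and the `insert_employee` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INTEGER PRIMARY KEY,"  # primary key
    " name TEXT NOT NULL,"
    " email TEXT UNIQUE)"                # unique (alternate) key
)


def insert_employee(conn, employee_id, name, email):
    """Insert a row; report whether a key constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO employees VALUES (?, ?, ?)",
                (employee_id, name, email),
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    insert_employee(conn, 101, "Alice", "alice@example.com"),
    insert_employee(conn, 101, "Bob", "bob@example.com"),    # duplicate primary key
    insert_employee(conn, 102, "Bob", "alice@example.com"),  # duplicate unique key
    insert_employee(conn, 103, "Chen", None),                # NULL email
    insert_employee(conn, 104, "Dana", None),                # second NULL email
]
```

In this SQLite-based sketch the first, fourth, and fifth inserts succeed, while the two duplicates are rejected.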
RELATIONAL CONSTRAINTS
Relational constraints are rules that must be enforced on the data within a relational database to
maintain data integrity. They ensure that the data is accurate, consistent, and meets the specific
requirements of the application.
Domain Constraints
• Domain constraints restrict the values that can be stored in an attribute to a specific set of
values ensuring data quality by preventing invalid values from being entered
Key Constraints
• Key constraints ensure the uniqueness and referential integrity of data within and between
tables.
• Uniqueness: The primary key must be unique. No two rows can have the same primary
key.
• Referential Integrity: A foreign key in one table must either match a primary key in
another table or be left empty (null).
o Example: If a "Course" table has a foreign key that references the "Student" table,
every Student_ID in the "Course" table should match an existing Student_ID in the
"Student" table, or it can be empty (null) if no student is assigned.
• Update Operations: Actions that modify data in the database, such as:
o Delete: Removing existing tuples from a relation.
o Reject the operation: Prevent the update from occurring if it would violate a
constraint.
1. Integrity Constraints
• Definition: Integrity constraints are rules that must be enforced on the data within a
relational database to maintain data accuracy, consistency, and adherence to the
application's specific requirements. They act as safeguards to prevent invalid or
inconsistent data from being entered or modified.
• Types:
▪ Unique Key: Ensures uniqueness within a column; null values are permitted, and how many depends on the DBMS.
▪ Foreign Key: Links tables by referencing the primary key of another table.
o Entity Integrity: Ensures that every table has a primary key and that no primary
key value is null.
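The foreign key rule (a foreign key value must match an existing primary key or be null) can be checked with a small sketch. It assumes SQLite, where foreign key enforcement must be switched on per connection; the `student` and `course` tables mirror the earlier Course/Student example, and the `run` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE course ("
    " course_id TEXT PRIMARY KEY,"
    " student_id INTEGER REFERENCES student (student_id))"
)
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.commit()


def run(conn, sql, params=()):
    """Execute one statement; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this setup, inserting a course for student 1 or with a NULL student succeeds, inserting a course for the non-existent student 99 is rejected, and deleting student 1 while a course still references that row is rejected as well.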
2. Update Operations
• Reject the Operation: The database system prevents the update from occurring. This is the
most common and strict approach.
• Trigger Actions: Execute specific actions (e.g., send notifications, log the violation) when a
constraint is violated.
4. Relational Operations
Relational operations manipulate relations (tables) to retrieve and modify data. Common
operations include:
• Join: Combines tuples from two or more relations based on a matching condition.
• Union: Combines two relations into a single relation containing all unique tuples.
• Intersection: Creates a relation containing only tuples that exist in both input relations.
• Difference: Creates a relation containing tuples that exist in the first relation but not in the
second.
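These four operations can be sketched by modelling a relation as a Python set of tuples. The function names, the position-based join, and the sample relations are assumptions of this sketch; a real DBMS works with named attributes and requires union-compatible relations for union, intersection, and difference.

```python
def union(r, s):
    """All tuples in r or s; duplicates disappear because relations are sets."""
    return r | s


def intersection(r, s):
    """Tuples present in both r and s."""
    return r & s


def difference(r, s):
    """Tuples in r that are not in s."""
    return r - s


def join(r, s, r_idx, s_idx):
    """Equi-join: pair tuples whose values at r_idx and s_idx match.

    The matching column from s is dropped so it is not repeated.
    """
    return {
        tr + ts[:s_idx] + ts[s_idx + 1:]
        for tr in r
        for ts in s
        if tr[r_idx] == ts[s_idx]
    }


# Sample relations (hypothetical data).
cs_students = {(1, "Asha"), (2, "Bob")}
math_students = {(2, "Bob"), (3, "Chen")}
employees = {(101, "Alice", "D1"), (102, "Bob", "D2")}
departments = {("D1", "HR"), ("D2", "IT")}
```

For example, `join(employees, departments, 2, 0)` matches each employee's DeptID with the department's DeptID and appends the department name.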
Key Points:
• Integrity constraints are essential for maintaining data accuracy and consistency.
• Update operations can potentially violate constraints, requiring careful handling to ensure
data integrity.
• Relational operations provide the foundation for querying and manipulating data in
relational databases.
ER MODELLING
1) What is an Entity?
• Definition: An entity is any object or thing in the real world that can be distinctly identified.
It can be a person, place, object, or event.
o In a school database, Student and Teacher are entities. Each student or teacher
can be uniquely identified by their ID.
2) What is an Attribute?
• Example: For a Student entity, attributes could be Name, Age, Grade, Address.
3) What is a Relationship?
• A relationship describes how two or more entities are related to each other.
• Example: A Student is enrolled in a Course. This is a relationship between the Student and
Course entities.
More about Entities and Relationships
4. Types of Entities
• Strong Entity: An entity that can exist independently, without relying on other entities.
o Example: A Student entity can exist on its own, having its own unique student ID.
• Weak Entity: An entity that depends on a strong entity to exist. It cannot be uniquely
identified by its own attributes.
o Example: A Course Enrollment entity might depend on both the Student and
Course entities for its identity.
5. Types of Relationships
• One-to-One (1:1): One entity in a table is related to one entity in another table.
• One-to-Many (1:M): One entity in a table can be related to multiple entities in another
table.
o Example: A Teacher can teach many Courses, but each Course is taught by only
one Teacher.
• Many-to-Many (M:M): Multiple entities in one table are related to multiple entities in
another table.
o Example: Students can enroll in many Courses, and each Course can have many
Students.
• Step 1: Convert entities to tables: Each entity becomes a table in the database.
• Step 2: Convert attributes to columns: Each attribute of the entity becomes a column in
the table.
o Example: The Student table will have columns like StudentID, Name, and Age.
• Step 3: Convert relationships to foreign keys: For relationships, we add foreign keys in
the related tables to show how they are connected.
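The three steps can be sketched for the Student and Course entities with the many-to-many Enrolls relationship. One assumption goes beyond the steps as written: a many-to-many relationship is usually mapped to its own table holding two foreign keys, which is what the hypothetical `enrolls` table does here. The schema, sample rows, and `courses_of` helper are illustrative only, with SQLite as the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement
conn.executescript(
    """
    -- Steps 1 and 2: each entity becomes a table, each attribute a column.
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        age        INTEGER
    );
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        title     TEXT NOT NULL
    );
    -- Step 3: the M:M relationship becomes a table of foreign keys.
    CREATE TABLE enrolls (
        student_id INTEGER NOT NULL REFERENCES student (student_id),
        course_id  TEXT    NOT NULL REFERENCES course (course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Asha', 21), (2, 'Bob', 22);
    INSERT INTO course VALUES ('DB', 'Databases'), ('OS', 'Operating Systems');
    INSERT INTO enrolls VALUES (1, 'DB'), (1, 'OS'), (2, 'DB');
    """
)


def courses_of(conn, student_name):
    """Titles of the courses a student is enrolled in, sorted by title."""
    rows = conn.execute(
        "SELECT c.title FROM student s"
        " JOIN enrolls e ON e.student_id = s.student_id"
        " JOIN course c ON c.course_id = e.course_id"
        " WHERE s.name = ? ORDER BY c.title",
        (student_name,),
    )
    return [title for (title,) in rows]
```

For a one-to-many relationship, such as Teacher and Course, a single foreign key column on the "many" side (the course table) would be enough.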
The Entity-Relationship (ER) Model is used to visually represent and design databases. It helps
in organizing data, identifying how different pieces of information are related, and creating a
structure for the database. This makes it easier to understand, manage, and work with complex
data systems.
1. Database Design: ER modeling helps in designing databases by clearly defining the
entities, their attributes, and the relationships between them.
2. Simplifies Complex Data: It breaks down complex data into smaller, manageable parts,
making it easier to understand the structure.
4. Data Integrity: It helps to define rules for data consistency, ensuring the database
functions correctly.
BENEFITS :
Clear Structure: Provides a visual map of data, showing entities and their relationships, making it
easier to understand and design the database.
Improves Data Integrity: Ensures rules like uniqueness and consistency are enforced, keeping
the data accurate and reliable.
Reduces Redundancy: Helps avoid duplicate data by organizing it into related entities and
defining relationships.
Simplifies Database Design: Clearly defines what data to store, how to structure it, and how
tables are related, leading to better database organization.
Easier Modifications: Allows easy addition of new features or changes without disrupting the
existing structure.
Faster Development: Speeds up database creation by providing a clear guide for developers.
Improved Query Design: Makes it easier to design efficient queries by showing how data is
related across tables.