Basic Database Concepts
What is a Database?
A database is a structured collection of data that allows for easy storage, retrieval, and
management.
Key Components of a Database
• Tables: Organize data into rows and columns.
• Records: Each row in a table, representing a single item.
• Fields: Each column in a table, representing a property of the item.
Database Approach vs. File-Based Systems
Database Approach
• Centralized Management: Data is stored in a central location.
• Data Integrity: Ensures data is accurate and consistent.
• Multi-user Access: Multiple users can access and manipulate data simultaneously.
• Relationships: Allows linking of data across different tables.
File-Based Systems
• Separate Files: Data is stored in individual files.
• Limited Data Sharing: Difficult for multiple users to access data at the same time.
• Redundancy: Data may be duplicated across files, leading to inconsistency.
• No Relationships: Harder to link data between files.
Database Architecture
What is Database Architecture?
Database architecture defines how data is stored, organized, and accessed in a database system.
It helps in designing the structure for managing data efficiently.
Three-Level Schema Architecture
1. Internal Schema
• Definition: The physical storage structure of the database.
• Details: Describes how data is stored on hardware, including file formats and access methods.
• Purpose: Optimizes performance and storage efficiency.
2. Conceptual Schema
• Definition: The logical structure of the entire database.
• Details: Represents the relationships among data entities and constraints without concern for
how data is stored.
• Purpose: Provides a unified view of the data, allowing for easy understanding and
management.
3. External Schema
• Definition: The user view of the database.
• Details: Includes various user interfaces and customized views tailored to different user needs.
• Purpose: Simplifies data interaction for users, allowing them to access only relevant data.
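One way to picture an external schema is as a SQL view defined over the conceptual-level tables. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The Students table, the StudentDirectory view, and the sample row are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full logical table (hypothetical columns).
conn.execute(
    "CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Email TEXT)"
)
conn.execute("INSERT INTO Students VALUES (1, 'Alice', 20, 'alice@example.com')")

# External level: a view that exposes only the columns one user group needs.
conn.execute("CREATE VIEW StudentDirectory AS SELECT StudentID, Name FROM Students")

directory_rows = conn.execute("SELECT * FROM StudentDirectory").fetchall()
print(directory_rows)  # [(1, 'Alice')]
```

Users of the view never see Age or Email, which is the "only relevant data" idea from the external schema.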
Data Independence
• Definition: The ability to change the schema at one level of the architecture without having to
change the schema at the next higher level.
• Types:
o Logical Independence: Changes to the conceptual schema (like adding a field or a table)
don’t require changes to external schemas or application programs.
o Physical Independence: Changes to the internal schema (like new file organizations,
indexes, or storage devices) don’t require changes to the conceptual schema.
Relational Data Model
• Definition: A way to organize data into tables (relations) that can relate to each other.
• Structure: Data is stored in rows and columns, making it easy to understand and manipulate.
Attributes
• Definition: The columns in a table.
• Example: Attributes can be things like "Name," "Age," or "Address."
Schemas
• Definition: The blueprint of a database.
• Purpose: Defines the structure, including tables, attributes, and their relationships.
Tuples
• Definition: A single row in a table.
• Example: Represents one complete record, like all information for one student.
Domains
• Definition: The set of possible values for an attribute.
• Example: The domain for the "Age" attribute might be whole numbers from 0 to 120.
Relation Instances
• Definition: A specific set of data in a relation (table) at a given time.
• Example: If you have a table of students, the relation instance would be the actual records of
students currently stored.
Keys of Relations
• Definition: Attributes (or combinations of attributes) that uniquely identify each tuple (row) in
a relation.
• Types:
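o Primary Key: The key chosen to uniquely identify each record; its values must be unique and
not null.
o Foreign Key: An attribute (or set of attributes) in one table that refers to the primary key of
another table, creating a link between the two.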
Integrity Constraints
• Definition: Rules that ensure data accuracy and consistency in the database.
• Types:
o Domain Constraints: Ensure that each value in a column belongs to that attribute's domain
(its data type and permitted values).
o Entity Integrity: Ensures that primary keys are unique and not null.
o Referential Integrity: Ensures that foreign keys match primary keys in related tables.
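The three constraint types can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not a full schema: the table layouts and sample values are assumptions, and SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""
    CREATE TABLE Students (
        StudentID INTEGER PRIMARY KEY,                   -- entity integrity
        Name      TEXT NOT NULL,
        Age       INTEGER CHECK (Age BETWEEN 0 AND 120)  -- domain constraint
    )""")
conn.execute("""
    CREATE TABLE Enrollments (
        StudentID INTEGER NOT NULL REFERENCES Students(StudentID),  -- referential integrity
        CourseID  INTEGER NOT NULL
    )""")
conn.execute("INSERT INTO Students VALUES (1, 'Alice', 20)")


def is_rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


violations = {
    "domain": is_rejected("INSERT INTO Students VALUES (2, 'Bob', 200)"),       # age outside domain
    "entity": is_rejected("INSERT INTO Students VALUES (1, 'Carol', 30)"),      # duplicate primary key
    "referential": is_rejected("INSERT INTO Enrollments VALUES (99, 1)"),       # no such student
}
print(violations)
```

Each of the three inserts is refused by the database itself, so the rules hold no matter which application writes the data.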
Relational Algebra
• Definition: A formal system for manipulating relations (tables) using operations.
• Operations: Include selection, projection, union, intersection, and join.
Selection
• Definition: An operation in relational algebra that retrieves rows from a relation that meet
specific criteria.
• Example: Selecting all students with an age greater than 18.
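Selection and projection can be sketched in a few lines of Python by treating a relation as a list of dictionaries. The function names and sample data below are illustrative assumptions, not part of any library.

```python
def select(relation, predicate):
    """Selection: keep the tuples (rows) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes, dropping duplicate rows."""
    seen, result = set(), []
    for row in relation:
        reduced = tuple((a, row[a]) for a in attributes)
        if reduced not in seen:
            seen.add(reduced)
            result.append(dict(reduced))
    return result


students = [
    {"StudentID": 1, "Name": "Alice", "Age": 20},
    {"StudentID": 2, "Name": "Bob", "Age": 17},
    {"StudentID": 3, "Name": "Carol", "Age": 20},
]

adults = select(students, lambda row: row["Age"] > 18)  # rows for Alice and Carol
ages = project(students, ["Age"])                       # [{'Age': 20}, {'Age': 17}]
```

Selection reduces the number of rows, while projection reduces the number of columns and removes the duplicates that result.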
Normalization
• Definition: The process of organizing a database to reduce redundancy and improve data
integrity.
• Purpose: Helps eliminate data anomalies and ensures that the database structure is efficient.
Functional Dependencies
• Definition: A constraint between two sets of attributes in a relation, where the value of one set
(the determinant) determines the value of the other. It is written X → Y.
• Example: If "StudentID" determines "StudentName," then knowing the StudentID allows you
to find the corresponding StudentName.
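A functional dependency can be checked against sample data with a short Python function. The sketch below only tests whether a dependency holds in the given rows. A real dependency is a rule about every possible instance, so passing this check does not prove it. The function name and the rows are assumptions for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows agreeing on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"StudentID": 1, "StudentName": "Alice", "CourseID": 10},
    {"StudentID": 1, "StudentName": "Alice", "CourseID": 11},
    {"StudentID": 2, "StudentName": "Bob", "CourseID": 10},
]

id_determines_name = fd_holds(rows, ["StudentID"], ["StudentName"])  # True
id_determines_course = fd_holds(rows, ["StudentID"], ["CourseID"])   # False
```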
Normal Forms
• Definition: Guidelines for structuring a relational database to reduce redundancy.
• Types:
o First Normal Form (1NF): Ensures that all entries in a table are atomic (indivisible), with no
repeating groups.
o Second Normal Form (2NF): Requires 1NF, and that every non-key attribute depends on the
whole primary key rather than on part of a composite key (no partial dependencies).
o Third Normal Form (3NF): Requires 2NF, and that no non-key attribute depends on another
non-key attribute (no transitive dependencies).
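As a rough illustration of moving towards 3NF, the sketch below splits a table in which DeptName depends on DeptID (a non-key attribute) into two relations, then checks that joining them gives back the original rows. The table contents and the helper name are assumptions.

```python
flat = [
    {"StudentID": 1, "Name": "Alice", "DeptID": 10, "DeptName": "Physics"},
    {"StudentID": 2, "Name": "Bob", "DeptID": 10, "DeptName": "Physics"},
    {"StudentID": 3, "Name": "Carol", "DeptID": 20, "DeptName": "History"},
]


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicates while preserving order."""
    unique = dict.fromkeys(tuple(row[a] for a in attrs) for row in rows)
    return [dict(zip(attrs, values)) for values in unique]


# Decomposition: DeptName moves to its own relation keyed by DeptID.
students = project(flat, ["StudentID", "Name", "DeptID"])
departments = project(flat, ["DeptID", "DeptName"])

# Joining the two relations on DeptID reconstructs the original rows.
dept_names = {d["DeptID"]: d["DeptName"] for d in departments}
rejoined = [{**s, "DeptName": dept_names[s["DeptID"]]} for s in students]
```

After the split, "Physics" is stored once instead of once per student, which is the redundancy that normalization aims to remove.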
Entity-Relationship Model
• Definition: A conceptual framework used to model the data and relationships within a
database. It helps visualize the structure and organization of data.
Entity Sets
• Definition: A collection of similar types of entities. Each entity is an object or thing in the real
world that can be distinctly identified.
• Example: An "Employee" entity set might include all employees in a company.
Attributes
• Definition: Properties or characteristics of an entity. Each attribute provides specific
information about the entity.
• Example: An "Employee" entity might have attributes like "EmployeeID," "Name," and
"DateOfBirth."
Relationships
• Definition: Associations between two or more entity sets. They describe how entities interact
with each other.
• Example: An "Employee" might have a relationship with a "Department" entity, indicating
which department the employee belongs to.
Entity-Relationship Diagrams (ER Diagrams)
• Definition: A visual representation of the entities, attributes, and relationships within a
database.
• Components:
o Entities: Represented by rectangles.
o Attributes: Represented by ovals connected to their entity.
o Relationships: Represented by diamonds connecting related entities.
• Purpose: ER diagrams help design and communicate the structure of a database clearly.
What is SQL?
• Definition: SQL (Structured Query Language) is the standard language used to define, query,
and manipulate data in relational databases.
• Purpose: It allows users to create, read, update, and delete (CRUD) data in a database.
Basic SQL Commands
1. SELECT: Used to retrieve data from a database.
o Example: SELECT * FROM Students; (retrieves all records from the Students
table)
2. INSERT: Used to add new records to a table.
o Example: INSERT INTO Students (Name, Age) VALUES ('Alice', 20);
3. UPDATE: Used to modify existing records in a table.
o Example: UPDATE Students SET Age = 21 WHERE Name = 'Alice';
4. DELETE: Used to remove records from a table.
o Example: DELETE FROM Students WHERE Name = 'Alice';
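The four commands can be run end to end with Python's built-in sqlite3 module. This is a sketch against an in-memory database. The table definition is an assumption, and the ? placeholders are sqlite3's way of passing values separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER)")

# INSERT: parameter placeholders (?) keep values separate from the SQL text.
conn.execute("INSERT INTO Students (Name, Age) VALUES (?, ?)", ("Alice", 20))
after_insert = conn.execute("SELECT Name, Age FROM Students").fetchall()

# UPDATE: the WHERE clause limits the change to matching rows.
conn.execute("UPDATE Students SET Age = ? WHERE Name = ?", (21, "Alice"))
after_update = conn.execute("SELECT Name, Age FROM Students").fetchall()

# DELETE: without a WHERE clause every row would be removed.
conn.execute("DELETE FROM Students WHERE Name = ?", ("Alice",))
after_delete = conn.execute("SELECT Name, Age FROM Students").fetchall()

print(after_insert, after_update, after_delete)
```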
Filtering Data
• WHERE Clause: Used to specify conditions for filtering results.
o Example: SELECT * FROM Students WHERE Age > 18;
Sorting Data
• ORDER BY: Used to sort the results in ascending or descending order.
o Example: SELECT * FROM Students ORDER BY Age DESC;
Grouping Data
• GROUP BY: Used to group rows that have the same values in specified columns.
o Example: SELECT Age, COUNT(*) FROM Students GROUP BY Age;
Joining Tables
• JOIN: Combines rows from two or more tables based on a related column.
o Example:
sql
SELECT Students.Name, Courses.CourseName
FROM Students
JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
JOIN Courses ON Enrollments.CourseID = Courses.CourseID;
Creating and Modifying Tables
• CREATE TABLE: Used to create a new table.
o Example:
sql
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
Name VARCHAR(50),
Age INT
);
• ALTER TABLE: Used to modify an existing table.
o Example: ALTER TABLE Students ADD COLUMN Email VARCHAR(100);
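The effect of ALTER TABLE on existing rows can be observed with Python's built-in sqlite3 module. In this sketch the sample row is an assumption, and PRAGMA table_info is SQLite's own way of listing a table's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (StudentID INT PRIMARY KEY, Name VARCHAR(50), Age INT)")
conn.execute("INSERT INTO Students VALUES (1, 'Alice', 20)")

# Add a column to a table that already holds data.
conn.execute("ALTER TABLE Students ADD COLUMN Email VARCHAR(100)")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Students)")]
existing_row = conn.execute("SELECT * FROM Students").fetchone()

print(columns)       # ['StudentID', 'Name', 'Age', 'Email']
print(existing_row)  # (1, 'Alice', 20, None)
```

Rows that existed before the change get NULL (None in Python) in the new column.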
Joining Tables
Joins are used to combine rows from two or more tables based on a related column.
Types of Joins
1. INNER JOIN
o Definition: Returns only the rows that have matching values in both tables.
o Example:
sql
SELECT Students.Name, Courses.CourseName
FROM Students
INNER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
INNER JOIN Courses ON Enrollments.CourseID = Courses.CourseID;
2. LEFT JOIN (or LEFT OUTER JOIN)
o Definition: Returns all rows from the left table and matched rows from the right table.
If no match, NULL values are shown.
o Example:
sql
SELECT Students.Name, Enrollments.CourseID
FROM Students
LEFT JOIN Enrollments ON Students.StudentID = Enrollments.StudentID;
3. RIGHT JOIN (or RIGHT OUTER JOIN)
o Definition: Returns all rows from the right table and matched rows from the left table.
If no match, NULL values are shown.
o Example:
sql
SELECT Students.Name, Enrollments.CourseID
FROM Students
RIGHT JOIN Enrollments ON Students.StudentID = Enrollments.StudentID;
4. FULL JOIN (or FULL OUTER JOIN)
o Definition: Returns all rows from both tables. Matching rows are combined, and rows without
a match in the other table show NULLs in that table's columns.
o Example:
sql
SELECT Students.Name, Enrollments.CourseID
FROM Students
FULL OUTER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID;
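The difference between join types is easiest to see on a tiny data set where one student has no enrollment. The sketch below uses Python's built-in sqlite3 module, and the sample rows are assumptions. Only INNER and LEFT joins are shown because RIGHT and FULL OUTER JOIN need SQLite 3.39 or later, which may not be the version bundled with a given Python installation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Enrollments (StudentID INTEGER, CourseID INTEGER);
    INSERT INTO Students VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Enrollments VALUES (1, 101);  -- Bob has no enrollment
""")

# Same query text for both joins; only the join type changes.
query = """
    SELECT Students.Name, Enrollments.CourseID
    FROM Students
    {join_type} JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
    ORDER BY Students.StudentID
"""

inner_rows = conn.execute(query.format(join_type="INNER")).fetchall()
left_rows = conn.execute(query.format(join_type="LEFT")).fetchall()

print(inner_rows)  # [('Alice', 101)]
print(left_rows)   # [('Alice', 101), ('Bob', None)]
```

The inner join drops Bob because he has no matching enrollment, while the left join keeps him and fills the missing CourseID with NULL.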
Subqueries
Subqueries are queries nested inside another SQL query. They can be used to provide data for
the outer query.
Types of Subqueries
1. Single-Row Subquery
o Definition: Returns a single value (one row, one column) and can be used with comparison
operators such as =, <, or >.
o Example:
sql
SELECT Name
FROM Students
WHERE Age = (SELECT MAX(Age) FROM Students);
2. Multiple-Row Subquery
o Definition: Returns multiple rows and is used with operators like IN, ANY, or ALL.
o Example:
sql
SELECT Name
FROM Students
WHERE StudentID IN (SELECT StudentID FROM Enrollments WHERE
CourseID = 1);
3. Correlated Subquery
o Definition: Refers to columns in the outer query and is conceptually evaluated once for each
row of the outer query.
o Example:
sql
SELECT Name
FROM Students s
WHERE EXISTS (SELECT * FROM Enrollments e WHERE e.StudentID =
s.StudentID);
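The three subquery types above can be compared side by side on a small data set. This sketch uses Python's built-in sqlite3 module. The sample rows and the helper function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    CREATE TABLE Enrollments (StudentID INTEGER, CourseID INTEGER);
    INSERT INTO Students VALUES (1, 'Alice', 20), (2, 'Bob', 22), (3, 'Carol', 19);
    INSERT INTO Enrollments VALUES (1, 1), (3, 2);
""")


def names(sql):
    """Run a query and return the first column of each row, sorted."""
    return sorted(row[0] for row in conn.execute(sql))


# Single-row subquery: the inner query yields one value (the maximum age).
oldest = names("SELECT Name FROM Students WHERE Age = (SELECT MAX(Age) FROM Students)")

# Multiple-row subquery: the inner query yields a list of StudentIDs.
in_course_1 = names(
    "SELECT Name FROM Students "
    "WHERE StudentID IN (SELECT StudentID FROM Enrollments WHERE CourseID = 1)"
)

# Correlated subquery: the inner query refers to the outer row through alias s.
enrolled = names(
    "SELECT Name FROM Students s "
    "WHERE EXISTS (SELECT 1 FROM Enrollments e WHERE e.StudentID = s.StudentID)"
)

print(oldest, in_course_1, enrolled)  # ['Bob'] ['Alice'] ['Alice', 'Carol']
```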
Grouping
• Definition: Grouping is used to combine rows that have the same values in specified columns.
• Command: GROUP BY
• Example:
sql
SELECT Age, COUNT(*)
FROM Students
GROUP BY Age;
o This counts how many students there are for each age.
Aggregation
• Definition: Aggregation involves performing calculations on groups of data.
• Common Functions:
o COUNT(): Counts the number of rows.
o SUM(): Adds up values in a column.
o AVG(): Calculates the average of values.
o MIN(): Finds the smallest value.
o MAX(): Finds the largest value.
Example of Aggregation
• Using GROUP BY with aggregation:
sql
SELECT Age, AVG(Grade)
FROM Students
GROUP BY Age;
o This calculates the average grade for students of each age, assuming the Students table has a
Grade column.
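Several aggregate functions can be combined in one grouped query. The sketch below uses Python's built-in sqlite3 module. The Grade column and the sample rows are assumptions, since the earlier Students definition has no such column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Name TEXT, Age INTEGER, Grade REAL)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("Alice", 20, 80.0), ("Bob", 20, 90.0), ("Carol", 21, 70.0)],
)

# One output row per distinct Age, with each aggregate computed inside its group.
summary = conn.execute("""
    SELECT Age, COUNT(*), AVG(Grade), MIN(Grade), MAX(Grade)
    FROM Students
    GROUP BY Age
    ORDER BY Age
""").fetchall()

print(summary)  # [(20, 2, 85.0, 80.0, 90.0), (21, 1, 70.0, 70.0, 70.0)]
```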
What is Concurrency Control?
• Definition: A method to manage simultaneous database transactions to ensure data integrity
and consistency.
Importance
• Prevents issues like lost updates and dirty reads when multiple users access the database at the
same time.
Key Concepts
1. Transactions: A sequence of operations treated as a single unit, adhering to the ACID
properties (atomicity, consistency, isolation, durability).
2. Locking:
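o Exclusive Lock: Held by one transaction at a time in order to write; no other transaction
can acquire a shared or exclusive lock on the data until it is released.
o Shared Lock: Multiple transactions can hold it at once to read the data, but none can write
until all shared locks are released.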
3. Optimistic Concurrency Control: Allows transactions to proceed without locks,
checking for conflicts before committing.
4. Pessimistic Concurrency Control: Locks data for the duration of a transaction to prevent
other transactions from making conflicting accesses to it.
5. Timestamping: Uses timestamps to order transactions and resolve conflicts.
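Optimistic concurrency control is often implemented with a version number on each row. The following is a minimal single-connection sketch of that idea using Python's built-in sqlite3 module. The Accounts table, the Version column, and the function names are assumptions, and the two "transactions" are simulated one after the other rather than run in parallel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY, Balance INTEGER, Version INTEGER)"
)
conn.execute("INSERT INTO Accounts VALUES (1, 100, 0)")


def read_account(account_id):
    """Read the balance together with the version it was read at."""
    return conn.execute(
        "SELECT Balance, Version FROM Accounts WHERE AccountID = ?", (account_id,)
    ).fetchone()


def try_commit(account_id, new_balance, version_read):
    """Write only if nobody changed the row since version_read; report success."""
    cursor = conn.execute(
        "UPDATE Accounts SET Balance = ?, Version = Version + 1 "
        "WHERE AccountID = ? AND Version = ?",
        (new_balance, account_id, version_read),
    )
    return cursor.rowcount == 1


# Two transactions read the same version, then both try to write.
balance_a, version_a = read_account(1)
balance_b, version_b = read_account(1)
first_ok = try_commit(1, balance_a + 50, version_a)   # succeeds, version becomes 1
second_ok = try_commit(1, balance_b - 30, version_b)  # conflict detected, must re-read and retry
final_balance = read_account(1)[0]

print(first_ok, second_ok, final_balance)  # True False 150
```

Without the version check the second write would overwrite the first, which is the lost update problem mentioned under Importance.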
Database Backup
• Definition: A backup is a copy of the database that can be used to restore data in case of loss or
corruption.
• Types:
1. Full Backup: A complete copy of the entire database.
2. Incremental Backup: Only the data that has changed since the last backup of any type is saved.
3. Differential Backup: All changes made since the last full backup are saved.
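The practical difference between incremental and differential backups shows up at restore time. The sketch below computes which backups are needed to restore the latest state. The function name and schedules are assumptions, and it assumes a schedule combines full backups with only one other type.

```python
def backups_needed(schedule):
    """Return the indices of the backups required to restore the latest state.

    schedule is a chronological list such as ["full", "incremental", ...].
    """
    # Position of the most recent full backup.
    last_full = len(schedule) - 1 - schedule[::-1].index("full")
    later = list(range(last_full + 1, len(schedule)))
    if later and schedule[later[-1]] == "differential":
        # A differential already holds every change since the last full backup.
        return [last_full, later[-1]]
    # Incrementals each hold one step, so all of them are replayed in order.
    return [last_full] + later


incremental_week = ["full", "incremental", "incremental", "incremental"]
differential_week = ["full", "differential", "differential", "differential"]

incremental_chain = backups_needed(incremental_week)    # [0, 1, 2, 3]
differential_chain = backups_needed(differential_week)  # [0, 3]
```

Incremental backups are smaller to take but need the whole chain to restore, while a differential restore needs only the full backup plus the latest differential.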
Importance of Backups
• Protects against data loss from hardware failures, accidental deletions, or disasters.
• Ensures business continuity by allowing restoration of data.
Database Recovery
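• Definition: The process of restoring the database to a correct, consistent state after a failure,
using backups and transaction logs.
• Methods:
1. Point-in-Time Recovery: Restores the database to a specific moment before an error
occurred.
2. Crash Recovery: Automatically restores the database to the last consistent state after a
failure, typically by redoing committed changes and undoing uncommitted ones from the
transaction log.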
Recovery Strategies
• Regular Backup Schedule: Frequent backups to minimize data loss.
• Testing Recovery Procedures: Regularly test recovery plans to ensure they work effectively.
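A recovery test can be rehearsed on a small scale with the backup method of Python's built-in sqlite3 connections (available since Python 3.7). This is a sketch under simplifying assumptions: both databases live in memory, and the table and the simulated accidental deletion are invented for illustration.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT)")
live.execute("INSERT INTO Students VALUES (1, 'Alice')")
live.commit()

# Full backup: copy every page of the live database into a second database.
backup_copy = sqlite3.connect(":memory:")
live.backup(backup_copy)

# Simulate an accidental deletion on the live database.
live.execute("DELETE FROM Students")
live.commit()
rows_after_loss = live.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

# Recovery test: restore from the backup and check that the data is back.
backup_copy.backup(live)
restored = live.execute("SELECT StudentID, Name FROM Students").fetchall()

print(rows_after_loss, restored)
```

In practice the backup would be written to separate storage, but the check is the same: restore, then verify that the expected data is present.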
Indexes
• Definition: An index is a data structure that improves the speed of data retrieval
operations on a database table.
• Purpose: Makes searching for specific rows faster, similar to an index in a book.
• Types:
1. Single-column Index: Created on a single column for faster searches.
2. Composite Index: Created on multiple columns to improve query performance
involving those columns.
3. Unique Index: Ensures that all values in the indexed column(s) are unique.
• Benefits:
o Faster query performance.
o Efficient data retrieval.
• Drawbacks:
o Increased storage space.
o Slower write operations, as the index must be updated with each insert, update, or
delete.
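Whether a query actually uses an index can be checked by asking the database for its query plan. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement. The table, the index name idx_students_age, and the generated rows are assumptions, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO Students (Name, Age) VALUES (?, ?)",
    [(f"Student{i}", 18 + i % 10) for i in range(1000)],
)


def plan(sql):
    """Return SQLite's query-plan description for a statement as one string."""
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail text.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT Name FROM Students WHERE Age = 20"

plan_before = plan(query)  # no index yet, so the table is scanned
conn.execute("CREATE INDEX idx_students_age ON Students (Age)")  # single-column index
plan_after = plan(query)   # the plan now mentions the index

print(plan_before)
print(plan_after)
```

The query text is unchanged; only the access path differs, which also illustrates physical data independence.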
NoSQL Systems
• Definition: NoSQL (Not Only SQL) systems are databases designed for unstructured or
semi-structured data, allowing for flexible data models.
• Types:
1. Document Stores: Store data as documents (e.g., JSON), like MongoDB.
2. Key-Value Stores: Store data as key-value pairs, like Redis.
3. Column-family Stores: Group related columns into column families that are stored together,
rather than storing whole rows together, like Cassandra.
4. Graph Databases: Use graph structures for data representation, like Neo4j.
• Benefits:
o Scalability: Can handle large volumes of data and traffic.
o Flexibility: Allows for varied data types and structures.
o High performance: Often optimized for simple, high-volume read and write operations.
• Use Cases:
o Big data applications.
o Real-time web applications.
o Content management systems.
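The document and key-value models can be imitated with plain Python data structures. This is only a sketch of the data shapes, not of any real product's API, and the field names and key format are assumptions.

```python
import json

# Document store: one self-contained, nested record (hypothetical fields).
student_document = {
    "_id": 1,
    "name": "Alice",
    "courses": [{"code": "DB101", "grade": 85}, {"code": "MA201", "grade": 78}],
}
serialized = json.dumps(student_document)

# Key-value store: an opaque value looked up by a single key.
key_value_store = {"student:1": serialized}

# Reading back: fetch by key, then decode the document.
loaded = json.loads(key_value_store["student:1"])
course_codes = [course["code"] for course in loaded["courses"]]

print(course_codes)  # ['DB101', 'MA201']
```

In a relational design the courses would sit in a separate table linked by a foreign key. Here they are nested inside the student record, so one lookup returns everything.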
PAST PAPERS
Question 2
a) Binary Relationship: A relationship between two entity sets (a relationship of degree two),
such as an Employee working in a Department.
b) Command Driven Interface: A user interface where users interact with the system by
entering commands.
c) Data Administration: The process of managing data in an organization, including planning,
organizing, and controlling data resources.
Question 3
A relational system is a database management system (DBMS) that stores data in tables related
to each other through common fields.
Relational Systems:
• Use tables to store data.
• Use relationships to connect tables.
• Use keys, integrity constraints, and normalization (normal forms) to ensure data integrity.
Non-Relational Systems:
• Store data in key-value pairs, documents, or graphs.
• More flexible than relational systems.
• Suitable for large-scale, unstructured data.
Question 4
Major functions performed by a DBA:
• Database design and development.
• Database administration and maintenance.
• Security and access control.
• Performance tuning.
• Data backup and recovery.
Question 5
Anomalies in RDBMS:
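• Insertion anomalies: A fact cannot be inserted because other, unrelated data required by the
table is missing.
• Deletion anomalies: Deleting a record unintentionally removes other facts that were stored
only in that record.
• Update anomalies: Redundant copies of the same data are updated inconsistently.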
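Update and deletion anomalies can be reproduced with a small denormalized table held in a Python list. The table, the instructor names, and the helper function are assumptions for illustration.

```python
# One denormalized table: each row repeats the course's instructor.
enrollments = [
    {"StudentID": 1, "CourseID": "DB101", "Instructor": "Dr. Lee"},
    {"StudentID": 2, "CourseID": "DB101", "Instructor": "Dr. Lee"},
    {"StudentID": 2, "CourseID": "MA201", "Instructor": "Dr. Kim"},
]


def instructors_of(rows, course_id):
    """Return the set of instructors recorded for a course."""
    return {row["Instructor"] for row in rows if row["CourseID"] == course_id}


# Update anomaly: changing only one of the duplicated values leaves two answers.
enrollments[0]["Instructor"] = "Dr. Park"
conflicting = instructors_of(enrollments, "DB101")

# Deletion anomaly: removing the last MA201 enrollment also erases who teaches MA201.
remaining = [row for row in enrollments if row["CourseID"] != "MA201"]
lost = instructors_of(remaining, "MA201")

print(conflicting)  # two different instructors for DB101
print(lost)         # set()
```

Storing the instructor once per course in a separate table would remove both problems, which is what normalization does.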