Database Notes
A database is a collection of related data. By data, we mean known facts that can be recorded and
that have implicit meaning. For example, consider the names, telephone numbers, and addresses
of the people you know. Databases are made up of tables, each of which stores many instances
of an entity. Each entity usually has its own row in a table, and the fields of that row hold the
entity’s attributes.
Relational Database
A relational database is a database which recognises the differences between entities by creating
a different table for each entity. For example:
Employee table
EmployeeID  Name           DepartmentID
1           John Doe       101
2           Jane Smith     102
3           Alice Johnson  101
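As a minimal sketch, the Employee table above could be built with Python's built-in sqlite3 module. The Department table and its DepartmentName values are assumptions added to show a second entity held in its own table; the notes only list the DepartmentID values.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")

# Each entity gets its own table. Department and its names are hypothetical.
conn.execute(
    "CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT)"
)
conn.execute(
    "CREATE TABLE Employee ("
    "EmployeeID INTEGER PRIMARY KEY, Name TEXT, DepartmentID INTEGER)"
)
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [(101, "Sales"), (102, "Marketing")],
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "John Doe", 101), (2, "Jane Smith", 102), (3, "Alice Johnson", 101)],
)

# Find every employee in department 101.
employees_in_101 = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM Employee WHERE DepartmentID = 101 ORDER BY EmployeeID"
    )
]
```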
Flat File
A flat file is a database that consists of a single file, typically a simple text file. A flat file
will most likely be based around a single entity and its attributes. For example, an employee file:
EmployeeID, Name, Department, Salary
1, John Doe, Sales, 5000
2, Jane Smith, Marketing, 6000
3, Alice Johnson, Sales, 5500
Car
CarID  Age      Price
Car1   5 years  $1,500
Car2   2 years  $2,400
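The employee flat file shown earlier in this section can be read with Python's built-in csv module. This is only a sketch: the file contents are held in a string so the example runs without touching the disk, and the variable names are chosen for illustration.

```python
import csv
import io

# The flat-file contents from the example above.
flat_file = """EmployeeID, Name, Department, Salary
1, John Doe, Sales, 5000
2, Jane Smith, Marketing, 6000
3, Alice Johnson, Sales, 5500
"""

# skipinitialspace drops the space that follows each comma.
reader = csv.DictReader(io.StringIO(flat_file), skipinitialspace=True)
employees = list(reader)

# Every value is read as text, so Salary must be converted before summing.
sales_names = [row["Name"] for row in employees if row["Department"] == "Sales"]
total_salary = sum(int(row["Salary"]) for row in employees)
```

Note that the department name is repeated on every row. A flat file has no second table to hold it once.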
Entity identifiers
An entity identifier is an attribute whose value is unique for each entity within a table, so it
can be used to tell one row from another. It is also known as a primary key.
Entity description
An example of an entity description is shown below:
Customer (CustomerID, CustomerName, CustomerAddress, CustomerEmail)
The name of the table is written outside the brackets, which contain each of the entity’s
attributes separated by commas.
If it is not possible to form a primary key from just one attribute, it is possible to combine
attributes to form what is called a composite primary key.
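A composite primary key can be sketched with Python's built-in sqlite3 module. The Enrolment table, its columns, and its values are hypothetical, chosen because neither StudentID nor CourseID is unique on its own while the pair is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The primary key is the combination (StudentID, CourseID).
conn.execute(
    "CREATE TABLE Enrolment ("
    "StudentID INTEGER, CourseID TEXT, Grade TEXT, "
    "PRIMARY KEY (StudentID, CourseID))"
)
conn.execute("INSERT INTO Enrolment VALUES (101, 'C001', 'A')")
# Same student, different course: the pair is still unique, so this is allowed.
conn.execute("INSERT INTO Enrolment VALUES (101, 'C002', 'B')")

# Repeating an existing (StudentID, CourseID) pair violates the key.
try:
    conn.execute("INSERT INTO Enrolment VALUES (101, 'C001', 'C')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Enrolment").fetchone()[0]
```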
For example, take two tables, Pilots and Flights. The primary key of Pilots is PilotNo and the
primary key of Flights is FlightNo. The tables are linked by the shared attribute PilotNo, which
makes PilotNo a foreign key in Flights.
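A minimal sketch of the Pilots and Flights link, using Python's built-in sqlite3 module. Only PilotNo and FlightNo come from the notes; the PilotName column and all sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE Pilots (PilotNo INTEGER PRIMARY KEY, PilotName TEXT)")
conn.execute(
    "CREATE TABLE Flights ("
    "FlightNo TEXT PRIMARY KEY, "
    "PilotNo INTEGER REFERENCES Pilots(PilotNo))"  # foreign key
)
conn.execute("INSERT INTO Pilots VALUES (1, 'A. Pilot')")
conn.execute("INSERT INTO Flights VALUES ('F100', 1)")

# A flight that refers to a pilot who does not exist is rejected.
try:
    conn.execute("INSERT INTO Flights VALUES ('F200', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key lets the two tables be joined.
pilot_for_f100 = conn.execute(
    "SELECT PilotName FROM Flights JOIN Pilots USING (PilotNo) "
    "WHERE FlightNo = 'F100'"
).fetchone()[0]
```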
Secondary Key
A secondary key allows a database to be searched quickly. For example, a patient is unlikely to
remember their patientID but will know their surname. Therefore, a secondary index (secondary
key) is set up on the surname attribute. This makes it possible to order and search by surname
which makes it easier to find specific patients in the database.
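The patient example can be sketched with Python's built-in sqlite3 module. The table layout, the index name idx_patient_surname, and the sample patients are assumptions. The last step asks SQLite for its query plan to check that the surname search uses the secondary index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Patient (PatientID INTEGER PRIMARY KEY, Forename TEXT, Surname TEXT)"
)
conn.executemany(
    "INSERT INTO Patient VALUES (?, ?, ?)",
    [(1, "Alice", "Brown"), (2, "Bob", "Green"), (3, "Carol", "Brown")],
)

# The secondary index on Surname; PatientID remains the primary key.
conn.execute("CREATE INDEX idx_patient_surname ON Patient (Surname)")

# Surnames need not be unique, so a search can return several patients.
matches = conn.execute(
    "SELECT PatientID, Forename FROM Patient WHERE Surname = ? ORDER BY PatientID",
    ("Brown",),
).fetchall()

# The last column of each plan row is a text description of the step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Patient WHERE Surname = ?", ("Brown",)
).fetchall()
uses_index = any("idx_patient_surname" in row[-1] for row in plan)
```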
Relational Databases
Tables can be related to each other. There are three possible degrees of relationship between
tables in a database: one-to-one, many-to-many and one-to-many.
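The three degrees of relationship can be sketched as table definitions with Python's built-in sqlite3 module. All tables and values here are hypothetical. One common approach, assumed here, is to resolve a many-to-many relationship with a link table that holds two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, Name TEXT);

-- One-to-many: one department has many employees.
CREATE TABLE Employee (
    EmployeeID INTEGER PRIMARY KEY,
    Name TEXT,
    DepartmentID INTEGER REFERENCES Department(DepartmentID)
);

-- One-to-one: UNIQUE stops two lockers pointing at the same employee.
CREATE TABLE Locker (
    LockerID INTEGER PRIMARY KEY,
    EmployeeID INTEGER UNIQUE REFERENCES Employee(EmployeeID)
);

-- Many-to-many: a link table pairs projects with employees.
CREATE TABLE Project (ProjectID TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE ProjectEmployee (
    ProjectID TEXT REFERENCES Project(ProjectID),
    EmployeeID INTEGER REFERENCES Employee(EmployeeID),
    PRIMARY KEY (ProjectID, EmployeeID)
);

INSERT INTO Department VALUES (101, 'Sales');
INSERT INTO Employee VALUES (1, 'John Doe', 101), (3, 'Alice Johnson', 101);
INSERT INTO Project VALUES ('P001', 'Alpha'), ('P002', 'Beta');
INSERT INTO ProjectEmployee VALUES ('P001', 1), ('P001', 3), ('P002', 1);
""")

def count(sql):
    """Run a COUNT(*) query and return the single number it produces."""
    return conn.execute(sql).fetchone()[0]

staff_in_sales = count("SELECT COUNT(*) FROM Employee WHERE DepartmentID = 101")
projects_for_john = count("SELECT COUNT(*) FROM ProjectEmployee WHERE EmployeeID = 1")
staff_on_alpha = count("SELECT COUNT(*) FROM ProjectEmployee WHERE ProjectID = 'P001'")
```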
Benefits of Normalization
Minimizes data redundancy (duplicate data).
Minimizes null values.
Results in a more compact database.
Minimizes/avoids data modification issues.
Simplifies queries.
The database structure is cleaner and easier to understand.
You can extend the database without necessarily impacting the existing data.
Searching, sorting, and creating indexes can be faster, since tables are narrower, and more
rows fit on a data page.
An unnormalized database has the following shortcomings:
1. Update Anomaly
This occurs when data redundancy leads to inconsistency after updates.
When the same piece of information is stored in multiple rows of a table, updating it in one place
but not the others causes inconsistency.
Employee Table:
Employee_ID  Name        Department  Department_Location
1            John Smith  IT          New York
2            Jane Doe    IT          New York
3            Mike Ross   HR          Los Angeles
If the location of the IT department changes to "San Francisco," we must update all rows where
Department = IT.
If we forget to update one row (e.g., for Jane Doe), the table will have inconsistent data, showing
both "New York" and "San Francisco" for the IT department.
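The update anomaly described above can be reproduced with Python's built-in sqlite3 module. This is a sketch: the table matches the Employee Table in the notes, and the incomplete update is deliberate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    "Employee_ID INTEGER PRIMARY KEY, Name TEXT, "
    "Department TEXT, Department_Location TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (1, "John Smith", "IT", "New York"),
        (2, "Jane Doe", "IT", "New York"),
        (3, "Mike Ross", "HR", "Los Angeles"),
    ],
)

# A careless update that changes only one of the two IT rows.
conn.execute(
    "UPDATE Employee SET Department_Location = 'San Francisco' WHERE Employee_ID = 1"
)

# The IT department now appears to be in two places at once.
it_locations = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Department_Location FROM Employee WHERE Department = 'IT'"
    )
)
```

Storing each department's location once, in a separate department table, would mean only one row ever needs to change.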
2. Insertion Anomaly
This occurs when we cannot insert data into a table because certain fields are dependent on
others, leading to incomplete or irrelevant data.
This happens when a table requires the presence of data that may not yet exist.
Student_Course Table:
Student_ID  Student_Name  Course_ID  Course_Name
101         Alice Brown   C001       Math
102         Bob Green     C002       Physics
If we want to insert a new course "Chemistry" without enrolling any students, the table design
requires both Student_ID and Student_Name. This means we cannot add the course unless we
associate it with a student, leading to an insertion anomaly.
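A sketch of the insertion anomaly using Python's built-in sqlite3 module. The NOT NULL constraints and the composite key are assumptions that model "the table design requires both Student_ID and Student_Name". The course ID C003 and the separate Course table are also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student_Course ("
    "Student_ID INTEGER NOT NULL, Student_Name TEXT NOT NULL, "
    "Course_ID TEXT NOT NULL, Course_Name TEXT NOT NULL, "
    "PRIMARY KEY (Student_ID, Course_ID))"
)
conn.executemany(
    "INSERT INTO Student_Course VALUES (?, ?, ?, ?)",
    [(101, "Alice Brown", "C001", "Math"), (102, "Bob Green", "C002", "Physics")],
)

# Trying to add Chemistry with no student attached fails.
try:
    conn.execute(
        "INSERT INTO Student_Course (Course_ID, Course_Name) "
        "VALUES ('C003', 'Chemistry')"
    )
    course_added = True
except sqlite3.IntegrityError:
    course_added = False

# With courses in a table of their own, the same course can be added freely.
conn.execute("CREATE TABLE Course (Course_ID TEXT PRIMARY KEY, Course_Name TEXT NOT NULL)")
conn.execute("INSERT INTO Course VALUES ('C003', 'Chemistry')")
course_count = conn.execute("SELECT COUNT(*) FROM Course").fetchone()[0]
```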
3. Deletion Anomaly
This occurs when deleting data unintentionally results in the loss of important information.
When a table stores multiple pieces of information in a single structure, deleting a row may
remove necessary data that has no independent representation.
Project_Employee Table:
Project_ID  Project_Name  Employee_ID  Employee_Name
P001        Alpha         1            John Smith
P002        Beta          2            Jane Doe
If the last employee working on a project (e.g., John Smith) leaves the project, deleting their
record will also delete the project details (P001, Alpha). This is problematic if we still need the
project information even after all employees are removed.
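The deletion anomaly can be reproduced with Python's built-in sqlite3 module. This is a sketch: the table matches the Project_Employee Table in the notes, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Project_Employee ("
    "Project_ID TEXT, Project_Name TEXT, Employee_ID INTEGER, Employee_Name TEXT)"
)
conn.executemany(
    "INSERT INTO Project_Employee VALUES (?, ?, ?, ?)",
    [("P001", "Alpha", 1, "John Smith"), ("P002", "Beta", 2, "Jane Doe")],
)

# John Smith leaves project Alpha, so his row is deleted.
conn.execute("DELETE FROM Project_Employee WHERE Employee_ID = 1")

# Project P001 has vanished along with him.
remaining_projects = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Project_ID FROM Project_Employee ORDER BY Project_ID"
    )
]
```

Keeping project details in a table of their own would let the project survive when its last employee is removed.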