introduction to DB
LEVEL 1
1. INTRODUCTION
In the modern digital landscape, data is the lifeblood of virtually every organization, big or
small. This data, ranging from customer information to operational metrics, needs a structured,
secure, and efficient storage system. This is where databases come into play. A database is not
just a collection of data; it is a dynamic and foundational component of modern computing,
crucial for the storage, retrieval, and management of data.
One-to-one relationship: a relationship between two tables in which each record in the first
table has only one corresponding record in the related table.
Primary Key: a field in a table whose value uniquely identifies each record in the table.
Query: a request for a particular collection of data in a database.
Data validation
Data validation is the process of ensuring that a program operates on clean, correct
and useful data. In other words, data validation is the practice of checking the integrity,
accuracy and structure of data before it is used for a business operation.
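A minimal sketch of data validation in application code, written in Python. The record layout and the rules (a non-empty name, an age between 0 and 150, a roughly well-formed email) are hypothetical examples, not rules from the text; a real system would take them from its business requirements.

```python
import re

# Hypothetical pattern: something@something.something, with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_customer(record):
    """Return a list of problems; an empty list means the record passed every check."""
    errors = []
    name = record.get("name", "")
    if not isinstance(name, str) or not name.strip():
        errors.append("name is required")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 150:
        errors.append("age must be an integer between 0 and 150")
    email = record.get("email", "")
    if not isinstance(email, str) or not EMAIL_PATTERN.match(email):
        errors.append("email is not well formed")
    return errors

good = {"name": "Alice", "age": 30, "email": "alice@example.com"}
bad = {"name": "", "age": -5, "email": "not-an-email"}
print(validate_customer(good))  # []
print(validate_customer(bad))   # three error messages
```

Only records that come back with an empty error list would be passed on to the business operation.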
Metadata
Metadata is literally "data about data." The term refers to information about the data
itself, for example the origin, size, format, or other characteristics of a data item.
Data Dictionary
A data dictionary is a collection of metadata that provides information about data stored
in a computer system. When developing programs that use the data model, a data
dictionary can be consulted to understand where a data item fits in the structure, what
values it may contain, and what the data item means in real-world terms.
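SQLite, the small embedded database that ships with Python's `sqlite3` module, keeps its own data dictionary. The sketch below, assuming a hypothetical `student` table, reads that metadata back with SQLite's `PRAGMA table_info`, which reports each column's name, declared type, whether it is NOT NULL, and whether it is part of the primary key. Other DBMSs expose similar catalogs (for example the SQL-standard `information_schema`), but the exact commands differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " date_of_birth TEXT)"
)

# PRAGMA table_info returns one metadata row per column:
# (cid, name, type, notnull, dflt_value, pk)
for cid, name, col_type, notnull, default, pk in conn.execute("PRAGMA table_info(student)"):
    print(f"{name:<14} type={col_type:<8} required={bool(notnull)} primary_key={bool(pk)}")
```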
In the structure of a database, the smallest component into which data is entered is the field.
Each field in a table has a unique name. Fields appear as columns in a table and
as cells in a form.
Several data fields make up a record, several records make up a file, and several files
make up a database.
Data can be stored using a variety of methods, such as paper, index cards, or a software
program. A dictionary, a phone book, a collection of recipes and a TV guide are all common
examples of non-computerized databases.
A computerized database is a technological tool that stores organized and structured data in
an electronic format. Computerized databases are widely used for data management in
various fields such as healthcare, finance, and education among others.
Improved data security: Computerized databases improve data security by storing data in a
centralized location that is accessible only to authorized users. Data can also be encrypted so
that it is unreadable to unauthorized users. The use of computerized databases reduces the risk
of data loss or unauthorized use of data.
Initial cost and training: Setting up and implementing a computerized database can be
costly, particularly for small businesses or organizations. Staff members also need training
to operate the system effectively, which adds to the cost.
1.3. Differences between computerized and non-computerized databases
On the other hand, a software program that manages databases is called a database management
system (DBMS). A database is a set of linked data that can be efficiently used for data retrieval,
insertion, and deletion. Among other things, a DBMS organizes data into tables, schemas, views,
and reports. For instance, DBMSs such as MySQL and Oracle are widely used in many applications.
A DBMS offers an interface for operations including, but not limited to, creating a database,
saving data, updating the database, and creating tables in the database. It helps protect the
security and safety of the database, and it keeps data consistent when there are many users.
Databases come in different forms to accommodate diverse data needs. Below are some types of
databases.
Relational database
This is the most common type of database, organized around tables that contain rows and
columns. Each row represents a record, while each column represents a field of data. Examples
of relational databases include PostgreSQL and Microsoft SQL Server.
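To make rows and columns concrete, here is a minimal sketch using Python's built-in `sqlite3` module (SQLite is a relational database, though not one of the products named above). The `student` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student (student_id, name, age) VALUES (?, ?, ?)",
    [(1, "Alice", 19), (2, "Bob", 21)],
)

cursor = conn.execute("SELECT student_id, name, age FROM student ORDER BY student_id")
columns = [d[0] for d in cursor.description]  # the fields (columns)
rows = cursor.fetchall()                      # the records (rows)
print(columns)  # ['student_id', 'name', 'age']
print(rows)     # [(1, 'Alice', 19), (2, 'Bob', 21)]
```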
NoSQL Databases
These databases cater to non-tabular and unstructured data. They are often highly scalable and
can handle vast amounts of data in different formats. Examples include MongoDB (document-based),
Cassandra (column-based), and Redis (key-value store).
Object-Oriented Databases
These databases store objects, which can be anything from text and numbers to images and
audio. They are suited for complex data structures and are used in applications involving
multimedia or engineering. Examples include db4o, ObjectStore, Versant, Zope Object Database
(ZODB), GemStone/S, ObjectDB.
Assignment
Data organization: Databases offer systematic structures to store and arrange data effectively.
This arrangement not only upholds data integrity but also reduces duplication.
Data Retrieval: Databases excel at retrieving information quickly. With suitable indexing, they
can run complex queries over large datasets in seconds, which is indispensable for managing and
extracting insights from large volumes of information.
Data Integrity: Databases ensure data accuracy and consistency through thoughtful design and
well-defined constraints. Validation rules play a crucial role by preventing the addition of
erroneous or incompatible data and upholding the integrity of the stored information.
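Validation rules can be declared on the table itself so that the DBMS rejects bad data no matter which program inserts it. A minimal SQLite sketch, assuming a hypothetical `result` table whose scores must lie between 0 and 100:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rules live in the schema: both columns are required, and score has a range check.
conn.execute(
    "CREATE TABLE result ("
    " student_id INTEGER NOT NULL,"
    " score INTEGER NOT NULL CHECK (score BETWEEN 0 AND 100))"
)

conn.execute("INSERT INTO result VALUES (1, 85)")  # accepted

try:
    conn.execute("INSERT INTO result VALUES (2, 140)")  # violates the CHECK rule
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(conn.execute("SELECT COUNT(*) FROM result").fetchone()[0])  # 1
```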
Scalability: Databases can scale to accommodate growing data volumes and user demands. This
can involve upgrading hardware (scaling vertically) or adding more servers (scaling horizontally).
Scalability lets the system manage its resources efficiently as its requirements evolve.
CHARACTERISTICS OF DBMS
Real-world entity: A modern DBMS is more realistic and uses real-world entities, together with
their behavior and attributes, to design its architecture. For example, a school database may use
students as an entity and their age as an attribute.
Relation-based tables: DBMS allows entities and relations among them to form tables. A user
can understand the architecture of a database just by looking at the table names.
Isolation of data and application: A database system is entirely different from its data. The
database system is an active entity, whereas the data is passive; the database system works on
and organizes the data. A DBMS also stores metadata, which is data about data, to ease its own
processes.
Less redundancy: A DBMS follows the rules of normalization, which splits a relation when any
of its attributes has redundant values. Normalization is a formal, mathematically grounded
process that reduces data redundancy.
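The splitting step can be sketched in plain Python. The order data below is hypothetical: in the flat list, a customer's name and city repeat on every order, so splitting it into a customer relation and an order relation stores each customer's details once. This illustrates the idea only; it is not a full normalization algorithm.

```python
# Hypothetical unnormalized rows: (order_id, customer_id, name, city, item).
orders_flat = [
    (1, "C1", "Alice", "Paris", "Pen"),
    (2, "C1", "Alice", "Paris", "Book"),
    (3, "C2", "Bob", "Lyon", "Ruler"),
]

def normalize(rows):
    """Split the flat rows into a customer relation and an order relation."""
    customers = {}  # customer_id -> (name, city), stored once per customer
    orders = []     # (order_id, customer_id, item), referring to customers by key
    for order_id, cust_id, name, city, item in rows:
        # Assumes each customer's details are identical on every row.
        customers[cust_id] = (name, city)
        orders.append((order_id, cust_id, item))
    return customers, orders

customers, orders = normalize(orders_flat)
print(customers)  # {'C1': ('Alice', 'Paris'), 'C2': ('Bob', 'Lyon')}
print(orders)     # [(1, 'C1', 'Pen'), (2, 'C1', 'Book'), (3, 'C2', 'Ruler')]
```

After the split, changing Alice's city means updating one entry instead of every order she placed.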
Consistency: Consistency is a state in which every relation in a database remains consistent.
There exist methods and techniques that can detect attempts to leave the database in an
inconsistent state. A DBMS can provide greater consistency than earlier forms of data storage
applications such as file-processing systems.
Query Language: A DBMS is equipped with a query language, which makes it more efficient to
retrieve and manipulate data. A user can apply as many different filtering options as required
to retrieve a set of data. This was not possible with traditional file-processing systems.
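As an illustration of combining several filters in one request, here is a sketch using SQL through Python's `sqlite3` module; the `product` table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, category TEXT, price REAL, stock INTEGER)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [
        ("Pen", "Stationery", 1.5, 120),
        ("Notebook", "Stationery", 3.0, 0),
        ("Stapler", "Stationery", 7.5, 15),
        ("Mouse", "Electronics", 12.0, 40),
    ],
)

# Three filters (category, in stock, under a price) and a sort order in one declarative query.
query = """
    SELECT name, price
    FROM product
    WHERE category = ? AND stock > 0 AND price < ?
    ORDER BY price DESC
"""
result = conn.execute(query, ("Stationery", 10.0)).fetchall()
print(result)  # [('Stapler', 7.5), ('Pen', 1.5)]
```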
Multiuser and Concurrent Access: A DBMS supports a multi-user environment and allows users to
access and manipulate data in parallel. Although there are restrictions on transactions when
users attempt to handle the same data item, users are usually unaware of them.
Multiple views: A DBMS offers multiple views for different users. A user in the Sales
department will have a different view of the database than a person working in the Production
department. This feature gives users a focused view of the database according to their
requirements.
Security: Features like multiple views offer security to some extent where users are unable to
access data of other users and departments. DBMS offers methods to impose constraints while
entering data into the database and retrieving the same at a later stage. DBMS offers many
different levels of security features, which enables multiple users to have different views with
different features. For example, a user in the Sales department cannot see the data that belongs to
the Purchase department. The DBMS can also control how much of the Sales department's data is
displayed to that user.
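Views are one way a DBMS gives each department its own picture of the data. A minimal SQLite sketch with a hypothetical `employee` table: the `sales_staff` view shows only Sales employees and leaves out salaries. SQLite has no user accounts, so the access-control half is not shown; in server DBMSs such as PostgreSQL, an administrator would typically grant a user SELECT on the view without granting access to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT,"
    " department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Ann", "Sales", 500), (2, "Ben", "Production", 650), (3, "Cara", "Sales", 700)],
)

# The view exposes only Sales rows and omits the salary column.
conn.execute(
    "CREATE VIEW sales_staff AS"
    " SELECT emp_id, name FROM employee WHERE department = 'Sales'"
)
sales_view = conn.execute("SELECT * FROM sales_staff ORDER BY emp_id").fetchall()
print(sales_view)  # [(1, 'Ann'), (3, 'Cara')]
```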
ACID Properties: DBMS follows the concepts of Atomicity, Consistency, Isolation, and
Durability (normally shortened as ACID). These concepts are applied on transactions, which
manipulate data in a database. ACID properties help the database stay healthy in multi-
transactional environments and in case of failure.
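Explanation
A – Atomicity
Atomicity ensures that a transaction is treated as a single unit: either all of its operations are
completed, or none of them are.
For example, transferring money from one bank account to another involves multiple steps:
debiting the first account and crediting the second.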
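The bank transfer can be sketched with Python's `sqlite3` module. The `account` table, the `transfer` helper and the simulated failure are hypothetical. Using the connection as a context manager (`with conn:`) commits if the block succeeds and rolls back if an exception escapes, so a failure between the debit and the credit leaves both balances unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 500)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commit on success, roll back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            if fail_midway:
                raise RuntimeError("simulated crash between the two steps")
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
    except RuntimeError as exc:
        print("transfer aborted:", exc)

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

transfer(conn, "A", "B", 200, fail_midway=True)
print(balances(conn))  # {'A': 1000, 'B': 500}: the debit was rolled back
transfer(conn, "A", "B", 200)
print(balances(conn))  # {'A': 800, 'B': 700}
```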
C – Consistency
The consistency property ensures that only valid data is written to the database. Before
committing a transaction, consistency checks are performed to maintain database constraints and
business rules.
For example, a transaction debiting 5000 from a bank account with a current balance of 3000 is
invalid if the account has an overdraft limit of 1000, because the balance would fall to -2000.
The transaction violates consistency by exceeding the permissible account limit. Hence, it is
blocked and aborted.
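The overdraft rule can be expressed as a `CHECK` constraint so that the DBMS itself refuses the inconsistent state. Note that it is a debit (withdrawal) of 5000 that would take a 3000 balance to -2000, past the 1000 overdraft limit. A minimal SQLite sketch with a hypothetical `account` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Overdraft limit of 1000: the balance may never go below -1000.
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= -1000))"
)
conn.execute("INSERT INTO account VALUES ('A', 3000)")
conn.commit()

try:
    with conn:  # the transaction is rolled back if the UPDATE raises
        # 3000 - 5000 = -2000, which breaks the CHECK constraint.
        conn.execute("UPDATE account SET balance = balance - 5000 WHERE id = 'A'")
except sqlite3.IntegrityError:
    print("debit blocked: it would exceed the overdraft limit")

final_balance = conn.execute("SELECT balance FROM account WHERE id = 'A'").fetchone()[0]
print(final_balance)  # 3000
```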
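I – Isolation
Isolation ensures that concurrent transactions do not interfere with each other. For instance, in a
lock-based DBMS, if Transaction T1 updates a row, Transaction T2 must wait until T1 commits or
rolls back. Isolation prevents T2 from reading data updated by T1 but not yet committed (a
"dirty read").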
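A sketch of the no-dirty-read guarantee using two SQLite connections to the same hypothetical database file, one acting as T1 and one as T2. SQLite in WAL mode gives readers a snapshot of the last committed data instead of making them wait, so T2 does not block here as it might in a lock-based DBMS; what the sketch shows is that T2 never sees T1's uncommitted update.

```python
import os
import sqlite3
import tempfile

# Two connections to the same file play the roles of T1 and T2.
path = os.path.join(tempfile.mkdtemp(), "bank.db")
t1 = sqlite3.connect(path)
t1.execute("PRAGMA journal_mode=WAL")  # readers can run while a writer is active
t1.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
t1.execute("INSERT INTO account VALUES ('A', 1000)")
t1.commit()

t2 = sqlite3.connect(path)

def read_balance(conn):
    # fetchall() runs the statement to completion so the read ends immediately.
    return conn.execute("SELECT balance FROM account WHERE id = 'A'").fetchall()[0][0]

t1.execute("UPDATE account SET balance = 0 WHERE id = 'A'")  # T1: not committed yet
before = read_balance(t2)  # T2 still sees the last committed value
t1.commit()
after = read_balance(t2)   # now T2 sees T1's committed change

print(before, after)  # 1000 0
t1.close()
t2.close()
```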
D – Durability
Durability provides persistence guarantees for committed transactions. The system upholds the
changes once a transaction is committed even if it crashes later. Durability is achieved with
database backups, transaction logs, and disk storage.
For example, if a transaction updates a customer's address, durability ensures the updated address
is not lost due to a hard disk failure or power outage. The change will persist with the help of
storage devices, backups, and logs.
In short, once a transaction is committed, failed hardware, power loss, or even a database
crash cannot undo it.
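A small sketch of the committed-versus-uncommitted distinction behind durability, using a hypothetical `customer` table in an SQLite file. It does not simulate a real power failure (SQLite relies on its journal and on flushing to disk for that); it only shows that after the connection is closed and reopened, the committed change is still there and the uncommitted one is gone.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "customers.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Old Street 1')")
conn.commit()

conn.execute("UPDATE customer SET address = 'New Avenue 9' WHERE id = 1")
conn.commit()   # committed: the change is written to the database file
conn.execute("UPDATE customer SET address = 'Never Saved 0' WHERE id = 1")
conn.close()    # closing without commit() discards this pending change

# Reopening, as after a restart, finds only the committed change.
conn = sqlite3.connect(path)
address = conn.execute("SELECT address FROM customer WHERE id = 1").fetchone()[0]
print(address)  # New Avenue 9
```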
A typical DBMS has users with different rights and permissions who use it for different
purposes. Some users retrieve data and some back it up. The users of a DBMS can be
broadly categorized as follows:
Administrators: Administrators maintain the DBMS and are responsible for administering the
database. They monitor how it is used and decide who should use it. They create access profiles
for users and apply limitations to maintain isolation and enforce security. Administrators also
look after DBMS resources such as system licenses, required tools, and other software- and
hardware-related maintenance.
Designers: Designers are the group of people who actually work on the designing part of
the database. They keep a close watch on what data should be kept and in what format.
They identify and design the whole set of entities, relations, constraints, and views.
End Users: End users are those who actually reap the benefits of having a DBMS. End
users can range from simple viewers who pay attention to the logs or market rates to
sophisticated users such as business analysts.
An Entity Relationship model (ER model) describes the structure of a database with the help of a
diagram known as an Entity Relationship Diagram (ER diagram). An ER model is a design or
blueprint of a database that can later be implemented as a database.
An Entity Relationship (ER) Diagram is a type of flowchart that illustrates how “entities” such as
people, objects or concepts relate to each other within a system. ER Diagrams are most often
used to design or debug relational databases in the fields of software engineering, business
information systems, education and research. Also known as ERDs or ER Models, they use a
defined set of symbols such as rectangles, diamonds, ovals and connecting lines to depict the
interconnectedness of entities, relationships and their attributes. They mirror grammatical
structure, with entities as
nouns and relationships as verbs.
ER Diagrams are composed of entities, relationships and attributes. They also depict cardinality,
which defines relationships in terms of numbers. The main components are as follows:
Entity
A definable thing such as a person, object, concept or event that can have data stored about it.
Think of entities as nouns. In an E-R diagram, an entity is represented by a rectangle, with the
name of the entity written inside it.
Examples: STUDENT, EMPLOYEE, ACCOUNT etc.
A weak entity is an entity that depends on another entity. A weak entity does not have a key
attribute of its own. A double rectangle represents a weak entity.
Key Attribute:
A key attribute is the attribute whose value uniquely identifies each instance of an entity; it
represents the primary key. A key attribute is drawn as an ellipse with its name underlined.
Composite Attribute:
An attribute that has its own component attributes is known as a composite attribute.
Composite attributes can be divided into subparts. For example, an attribute name could be
structured as a composite attribute consisting of first-name, middle-name and last-name.
Multivalued Attributes:
An attribute that can hold multiple values is known as a multivalued attribute. It is represented
by a double ellipse in an E-R diagram. For example, a person can have more than one phone
number, so the phone number attribute is multivalued.
Derived Attribute: A derived attribute is one whose value is dynamic and derived from
another attribute. It is represented by a dashed ellipse in an E-R diagram. For example, a
person's age is a derived attribute, as it changes over time and can be derived from another
attribute (date of birth).
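Because a derived attribute is computed rather than stored, it is usually calculated on demand. A minimal Python sketch deriving age from date of birth; the function name `age_on` is hypothetical, and passing `today` explicitly keeps the result reproducible.

```python
from datetime import date

def age_on(date_of_birth, today):
    """Derive age in whole years from the stored date of birth."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

dob = date(2005, 8, 15)
print(age_on(dob, date(2024, 8, 14)))  # 18 (birthday not reached yet)
print(age_on(dob, date(2024, 8, 15)))  # 19
```

Only the date of birth needs to be stored; the age is recomputed whenever it is needed, so it never goes stale.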
Relationship
A relationship is an association (connection) among (between) two or more entities.
Relationships are typically shown as diamonds or labels directly on the connecting lines.
Types of relationship
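1. One-to-One Relationship: When one instance of an entity is associated with only one instance
of another entity, the relationship is known as a one-to-one relationship. For example, a person
has only one passport, and a passport is issued to only one person.
2. One-to-Many Relationship: When one instance of an entity can be associated with many
instances of another entity, the relationship is known as a one-to-many relationship. For
example, one college can have many students, and each student studies in one college.
3. Many-to-Many Relationship:
[Figure: Student (M) Assigned (N) Project, a many-to-many relationship]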
Note: E-R diagrams can be drawn using Chen notation or crow’s foot notation.
Ordinality, in contrast to cardinality (which gives the maximum number of associations), is the
minimum number of times an instance in one entity can be associated with an instance in the
related entity.
Steps to Create an ERD (E-R Diagram)
Example 1
In a university, a Student enrolls in Courses. A student must be assigned to at least
one Course. Each course is taught by a single Professor. To
maintain instruction quality, a Professor can deliver only one course.
Establish the E-R diagram
Step 5. Create the ERD
A more modern representation of the ER diagram.
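One way to turn Example 1 into tables, sketched with SQLite through Python's `sqlite3` module. The table and column names are hypothetical. The student-course relationship is many-to-many, so it becomes a separate `enrollment` table; the course-professor relationship is one-to-one, so `course.professor_id` is both NOT NULL and UNIQUE. The rule that every student takes at least one course is not enforced by this schema and would need application logic or triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
    CREATE TABLE professor (
        professor_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Each course has exactly one professor; UNIQUE stops a professor
    -- from being assigned a second course (one-to-one).
    CREATE TABLE course (
        course_id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        professor_id INTEGER NOT NULL UNIQUE REFERENCES professor(professor_id)
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Many-to-many: a student takes many courses, a course has many students.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
""")

conn.execute("INSERT INTO professor VALUES (1, 'Dr. Smith')")
conn.execute("INSERT INTO course VALUES (10, 'Databases', 1)")
try:
    conn.execute("INSERT INTO course VALUES (11, 'Networks', 1)")  # same professor again
except sqlite3.IntegrityError:
    print("a professor can deliver only one course")
```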
Keys are columns, or sets of columns, used to identify rows uniquely. They help establish
connections between different tables, enforce integrity, and identify the relationships between
tables.
a) Super Key
A Super key is any combination of fields within a table that uniquely identifies each
record within that table. Consider the employee table below.
Emp_ID Emp_name Emp_spcode Emp_sal Emp_contact Emp_email
c) Primary Key:
A field or a set of fields that uniquely identifies each record in a table is known as a
primary key. It is one of the candidate keys. This implies that no two records in the
relation can have the same value for the primary key. For example, an employee number
uniquely identifies a member of staff within a company.
An IP address uniquely addresses a PC on the internet. A primary key is mandatory. That is,
each entity occurrence must have a value for its primary key.
d) Foreign Key:
A field of a table that references the primary key of another table is referred to as a foreign
key. The figure below illustrates how a foreign key constraint is related to a primary key
constraint. Here, the field Item_Code in the PURCHASE table references the field
Item_Code in the ITEM relation. Thus, the attribute Item_Code in the PURCHASE relation
is the foreign key.
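The ITEM/PURCHASE example can be sketched in SQLite as follows. Column names other than `Item_Code` are hypothetical. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`; many other DBMSs enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # turn on foreign key enforcement in SQLite
conn.execute("CREATE TABLE ITEM (Item_Code TEXT PRIMARY KEY, description TEXT)")
conn.execute(
    "CREATE TABLE PURCHASE ("
    " purchase_id INTEGER PRIMARY KEY,"
    " Item_Code TEXT NOT NULL REFERENCES ITEM(Item_Code),"
    " quantity INTEGER)"
)
conn.execute("INSERT INTO ITEM VALUES ('I001', 'Printer paper')")
conn.execute("INSERT INTO PURCHASE VALUES (1, 'I001', 10)")  # refers to an existing item

try:
    conn.execute("INSERT INTO PURCHASE VALUES (2, 'I999', 5)")  # no such item
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```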
e) Compound/composite Key
A compound key consists of more than one attribute to uniquely identify an entity
occurrence. Each attribute, which makes up the key, is also a simple key in its own right. For
example, we have an entity named enrolment, which holds the courses on which a student is
enrolled. In this scenario a student is allowed to enroll on more than one course. This has a
compound key of both student number and course number, which is required to uniquely identify
a student on a particular course.
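The enrolment example as a table with a compound primary key, sketched in SQLite with hypothetical student and course numbers. A student may appear several times and so may a course, but the same (student, course) pair is rejected the second time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Neither column alone is unique; together they identify one enrolment.
conn.execute(
    "CREATE TABLE enrolment ("
    " student_no TEXT NOT NULL,"
    " course_no TEXT NOT NULL,"
    " PRIMARY KEY (student_no, course_no))"
)
conn.executemany(
    "INSERT INTO enrolment VALUES (?, ?)",
    [("S1", "C101"), ("S1", "C102"), ("S2", "C101")],  # S1 is on two courses
)

try:
    conn.execute("INSERT INTO enrolment VALUES ('S1', 'C101')")  # same pair again
except sqlite3.IntegrityError:
    print("duplicate enrolment rejected")

print(conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0])  # 3
```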
f) Alternate key
All candidate keys other than the one chosen as the primary key.
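The relationship between super keys, candidate keys and alternate keys can be sketched in Python. The helper functions and the three employee rows are hypothetical, and only some of the columns from the employee table above are used. A super key is checked by projecting the rows onto the chosen columns and testing for duplicates; candidate keys are the minimal super keys; alternate keys are the candidates left over after choosing `Emp_ID` as the primary key.

```python
from itertools import combinations

def is_super_key(rows, columns):
    """True if no two rows share the same values in the given columns."""
    projected = [tuple(row[c] for c in columns) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows, all_columns):
    """Minimal super keys: super keys that contain no smaller super key."""
    found = []
    for size in range(1, len(all_columns) + 1):
        for combo in combinations(all_columns, size):
            if is_super_key(rows, combo) and not any(set(k) <= set(combo) for k in found):
                found.append(combo)
    return found

# Hypothetical sample rows using some of the employee table's columns.
employees = [
    {"Emp_ID": 1, "Emp_name": "Ann", "Emp_sal": 500, "Emp_email": "ann@example.com"},
    {"Emp_ID": 2, "Emp_name": "Ben", "Emp_sal": 500, "Emp_email": "ben@example.com"},
    {"Emp_ID": 3, "Emp_name": "Ann", "Emp_sal": 650, "Emp_email": "ann2@example.com"},
]
cols = ["Emp_ID", "Emp_name", "Emp_sal", "Emp_email"]

keys = candidate_keys(employees, cols)
primary = ("Emp_ID",)
alternate = [k for k in keys if k != primary]
print(keys)       # [('Emp_ID',), ('Emp_email',), ('Emp_name', 'Emp_sal')]
print(alternate)  # [('Emp_email',), ('Emp_name', 'Emp_sal')]
```

On this tiny sample, (Emp_name, Emp_sal) comes out as a candidate key only by coincidence. A sample of data can show that a set of columns is not a key, but real keys come from the business rules the table must always satisfy.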