introduction to DB

The document provides an introduction to databases, emphasizing their importance in data storage, retrieval, and management in modern organizations. It covers key concepts such as database types, terminology, advantages and disadvantages of computerized databases, and the characteristics of database management systems (DBMS). Additionally, it highlights the significance of data integrity, security, and the ACID properties that ensure reliable transaction processing.

Uploaded by

terasapcommunity
Copyright
© © All Rights Reserved

COURSE TITLE: INTRODUCTION TO DATABASES

LEVEL 1

1. INTRODUCTION

In the modern digital landscape, data is similar to the lifeblood that flows through the veins of
virtually every organization, big or small. This data, ranging from customer information to
operational metrics, needs a structured, secure, and efficient storage system. This is where
databases come into play. A database is not just a collection of data; it's a dynamic and
foundational component of modern computing, crucial for the storage, retrieval, and
management of data.

1.1. Database Terminology and Concepts


• Database: a collection of information related to a particular topic or purpose. More
precisely, a database is an organized collection of structured data, typically stored
electronically in a computer system. There are two broad types of databases: non-relational
and relational.
• Database management system (DBMS): a program, such as Microsoft Access, that stores,
retrieves, arranges, and formats the information contained in a database. Other examples of
DBMSs include MySQL, Oracle Database, and MongoDB.
• Database model: the structure of the information stored in the database. The model
should describe how each individual piece of information relates to all the other
information in the database. A well-designed database eliminates the need to enter the same
data repeatedly and prevents duplication of information, thereby maintaining the integrity
of the data.
• Database modeling: the process of strategically planning where to store each piece of
information you wish to include in your database.
• Datasheet: a format of columns and rows displaying information.
• Field data type: a characteristic of a field that determines what kind of data it can store.
For example, a field whose data type is Text can store letters, digits, or other characters,
but a Number field can store only numerical data.
• Form: a structured document with specific areas for viewing or entering data one record
at a time. Forms can be laid out in columnar, tabular, datasheet, or justified format.
• Criteria: the conditions that control which records a query displays.
• Non-relational database: also called a flat file; stores information in one table.
Non-relational databases are useful for information stored in a single list, such as a list
of student names, addresses, and phone numbers.
• Object: a component of a database, such as a table, query, form, or report.
• One-to-many relationship: a relationship in which a record in the primary table can be
related to one or more records in the related table.
• One-to-one relationship: a relationship between two tables in which, for each record in
the first table, there is only one corresponding record in the related table.
• Primary key: a field in a table whose value uniquely identifies each record in the table.
• Query: a request for a particular collection of data in a database.
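Several of the terms above (table, field data type, primary key, one-to-many relationship, query, criteria) can be seen together in a few lines of code. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample values (student, phone, "Amina", and so on) are invented for illustration and are not part of the course material.

```python
import sqlite3


def build_demo():
    """Create two related tables: one student can have many phone numbers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # student_id is the primary key; TEXT and INTEGER are field data types.
    conn.execute(
        "CREATE TABLE student ("
        " student_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL)"
    )
    # Each phone row points back to exactly one student (one-to-many).
    conn.execute(
        "CREATE TABLE phone ("
        " phone_id INTEGER PRIMARY KEY,"
        " student_id INTEGER NOT NULL REFERENCES student(student_id),"
        " number TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO student VALUES (?, ?)", [(1, "Amina"), (2, "Brice")]
    )
    conn.executemany(
        "INSERT INTO phone (student_id, number) VALUES (?, ?)",
        [(1, "670000001"), (1, "670000002"), (2, "680000003")],
    )
    conn.commit()
    return conn


def phones_of(conn, student_name):
    """A query whose criteria (the WHERE clause) select one student's phones."""
    rows = conn.execute(
        "SELECT p.number FROM phone p"
        " JOIN student s ON s.student_id = p.student_id"
        " WHERE s.name = ? ORDER BY p.number",
        (student_name,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo()
print(phones_of(conn, "Amina"))
```

The student table on its own would be a flat file; adding the related phone table is what makes the design relational.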

• Data and Information

Data are raw facts that can be recorded, interpreted, and processed by a computer, such as
numbers, words, measurements, and observations. When data are processed and converted
into a meaningful and useful form, the result is known as information. Hence, information
can be defined as an organized and validated collection of data. For example, 'James', '35',
and 'pastor' are data; 'James is 35 years old and he is a pastor' is information.

• Data validation
Data validation is the process of ensuring that a program operates on clean, correct,
and useful data. In other words, data validation is the practice of checking the integrity,
accuracy, and structure of data before it is used for a business operation.
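As a minimal sketch of data validation, the Python function below checks one record before it would be stored. The field names (name, age) and the accepted age range of 0 to 120 are assumptions made for this example only.

```python
def validate_student(record):
    """Return a list of problems found in one record (empty list if valid)."""
    errors = []

    name = record.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append("name must be a non-empty string")

    age = record.get("age")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 120:
        errors.append("age must be a whole number between 0 and 120")

    return errors


print(validate_student({"name": "James", "age": 35}))
print(validate_student({"name": "", "age": "thirty-five"}))
```

A record is accepted only when the returned list is empty; otherwise the messages tell the user what to correct.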

• Metadata
Metadata is literally "data about data." The term refers to information about the data
itself, such as the origin, size, formatting, or other characteristics of a data item.
• Data Dictionary
A data dictionary is a collection of metadata that provides information about data stored
in a computer system. When developing programs that use the data model, a data
dictionary can be consulted to understand where a data item fits in the structure, what
values it may contain, and what the data item means in real-world terms.
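Most DBMSs can report their own metadata. As an illustration, the sketch below asks SQLite (through Python's built-in sqlite3 module) to describe a table using its PRAGMA table_info command, and turns the answer into a small data dictionary. The person table and the describe_table helper are hypothetical names chosen for this example.

```python
import sqlite3


def describe_table(conn, table_name):
    """Build a simple data dictionary for one table.

    PRAGMA table_info returns one row per column in the form
    (cid, name, type, notnull, dflt_value, pk).
    table_name is inserted into the SQL text, so it must come from a trusted source.
    """
    rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    return [
        {
            "field": row[1],
            "type": row[2],
            "required": bool(row[3]),
            "primary_key": bool(row[5]),
        }
        for row in rows
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " person_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER)"
)

for entry in describe_table(conn, "person"):
    print(entry)
```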

1.1.1 Field, record, file

In the structure of a database, the field is the smallest unit into which data is entered.
All fields in the same table have unique names. Fields appear as columns in a table and
as cells in a form.

Several fields make up a record, several records make up a file, and several files
make up a database.

A file is a named collection of logically related records.

1.2. COMPUTERIZED VS NON COMPUTERIZED DATABASE

A non-computerized database is a database that is not stored on a computer. This type of
database is often used by small businesses or organizations that do not have the need or
budget for a computerized database. Non-computerized databases can be kept in a variety of
ways, such as on paper or index cards. A dictionary, a phone book, a collection of recipes,
and a TV guide are all common examples of non-computerized databases.

A computerized database is a technological tool that stores organized and structured data in
an electronic format. Computerized databases are widely used for data management in
various fields such as healthcare, finance, and education among others.

1.2.1. ADVANTAGES OF COMPUTERIZED DATABASE

Accuracy and consistency: One significant benefit of a computerized database is the
accuracy and consistency of the data. Data entered into the system can be validated and
updated automatically, which helps keep it accurate and up to date. This reduces the
human errors that occur when data entry is done manually.

Easy access and retrieval: Another significant advantage of a computerized database is
easy access and retrieval of data. Users can find data through a search feature using
keywords, categories, or other search criteria. Retrieving data manually is time-consuming
and requires human intervention.

Improved data security: Computerized databases offer improved data security by storing
data in a centralized location that is accessible only to authorized users. Data can also be
encrypted, so that it cannot be read by unauthorized users. The use of computerized
databases reduces the risk of data loss or unauthorized use of data.

1.2.2. DISADVANTAGES OF COMPUTERIZED DATABASES

Technical failure: Computerized databases are vulnerable to technical failures caused by
hardware or software malfunctions, which can lead to data loss or corruption. Technical
problems can also cause interruptions that lead to a loss of productivity and revenue.

Initial cost and training: Setting up and implementing a computerized database can be
costly, particularly for small businesses or organizations. Additionally, staff members
require training on how to operate the system effectively, which adds to the cost.

Dependency on technology: The use of computerized databases means organizations are
dependent on technology for data management. In the event of technical issues, the
organization may face significant losses if no contingency plan is in place.

Security risks: Computerized databases are vulnerable to security breaches such as
hacking, viruses, and other cyber-attacks. These attacks can result in data loss, identity
theft, or other serious consequences.

1.3. Differences between computerized and non-computerized database

1.4. File System and DBMS


A file system is a method of organizing files on a hard disk or other storage medium. It
arranges files and retrieves them when required. It works with different file types, such as
mp3, doc, txt, and mp4, and groups files into directories. Examples of file systems are NTFS
(New Technology File System) and ext (Extended File System). Without a file system, data
would be stored as a single large body with no way to distinguish one piece of data from
the next.

On the other hand, a software program that manages databases is called a database management
system (DBMS). A database is a set of related data that can be used efficiently for data
retrieval, insertion, and deletion. Among other things, a DBMS organizes data into tables,
schemas, views, and reports. For instance, database systems such as MySQL and Oracle are
frequently used in various applications. A DBMS offers an interface for operations including,
but not limited to, creating a database, creating a table in the database, storing data, and
updating the database. It provides mechanisms for the security and safety of the database, and
it maintains the consistency of the data when there are numerous users.

File System vs DBMS

File System | DBMS
Manages and organizes the files stored on the computer's hard disk | Software used to store and retrieve the user's data
Redundant data is present | Redundant data is minimized
Data consistency is low | Data consistency is high, owing to normalization
Less complex; does not support complicated transactions | More complex to manage, but makes complicated transactions easier to implement
Less security | Supports more security mechanisms
Less expensive | More expensive than a file system
Does not provide backup or recovery if data is lost | Offers backup and recovery services if data is lost
Does not support concurrency | Offers a concurrency facility

1.5. TYPES OF DATABASES

Databases come in different forms to accommodate diverse data needs. Below are some types of
databases.

Relational database

This is the most common type of database, organized around tables that contain rows and
columns. Each row represents a record, while each column represents a field of data. Examples
of relational databases include PostgreSQL and Microsoft SQL Server.

NoSQL Databases

These databases cater to non-tabular and unstructured data types. They are highly scalable and
can handle vast amounts of data in different formats. Examples include MongoDB
(document-based), Cassandra (column-based), and Redis (key-value store).

Object oriented databases

These databases store objects, which can be anything from text and numbers to images and
audio. They are suited to complex data structures and are used in applications involving
multimedia or engineering. Examples include db4o, ObjectStore, Versant, Zope Object Database
(ZODB), GemStone/S, and ObjectDB.

Assignment

Make a comparison between NoSQL and Relational Database

1.6. IMPORTANCE OF DATABASES

Data organization: Databases offer systematic structures to store and arrange data effectively.
This arrangement not only upholds data integrity but also reduces duplication.

Data retrieval: Databases excel at retrieving information swiftly and effectively. They can
execute intricate queries on extensive datasets in a matter of seconds. This capability is
indispensable for managing and extracting insights from large volumes of information.

Data integrity: Databases ensure data accuracy and consistency through thoughtful design and
well-defined constraints. Validation rules play a crucial role by preventing the addition of
erroneous or incompatible data, upholding the integrity of the stored information.

Scalability: Databases can be scaled to accommodate increasing data volumes and user
demands. This can involve upgrading hardware to scale vertically or adding more servers to
scale horizontally. Scalability ensures that resources are managed efficiently as the
requirements of the system evolve.

Security: Databases provide essential security functionalities such as user authentication,
authorization mechanisms, and strong encryption protocols. These measures collectively
safeguard sensitive data from unauthorized access, ensuring the confidentiality and integrity
of the information stored.

CHARACTERISTICS OF DBMS

Real-world entity: A modern DBMS models real-world entities in its design, including their
attributes and behavior. For example, a school database may use students as an entity and
their age as an attribute.

Relation-based tables: A DBMS allows entities and the relations among them to form tables. A
user can understand the architecture of a database just by looking at the table names.

Isolation of data and application: A database system is entirely different from its data. The
database is an active entity, whereas data is passive: it is what the database works on and
organizes. A DBMS also stores metadata, which is data about data, to ease its own processing.

Less redundancy: A DBMS follows the rules of normalization, which split a relation when any
of its attributes holds redundant values. Normalization is a mathematically grounded
process that reduces data redundancy.

Consistency: Consistency is a state in which every relation in a database remains consistent.
There are methods and techniques that can detect an attempt to leave the database in an
inconsistent state. A DBMS can provide greater consistency than earlier forms of data storage
applications, such as file-processing systems.

Query language: A DBMS is equipped with a query language, which makes it more efficient to
retrieve and manipulate data. A user can apply as many different filtering options as
required to retrieve a set of data. This was not possible in traditional file-processing
systems.

Multiuser and concurrent access: A DBMS supports a multi-user environment and allows users to
access and manipulate data in parallel. There are restrictions on transactions when users
attempt to handle the same data item, but users are generally unaware of them.

Multiple views: A DBMS offers multiple views for different users. A user in the Sales
department will have a different view of the database than a person working in the Production
department. This feature gives users a concentrated view of the database according
to their requirements.
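The idea of multiple views can be sketched with SQL views. The example below uses Python's built-in sqlite3 module; the orders table, its columns, and the two view names are assumptions made for illustration. SQLite has no user accounts, so this shows only the view mechanism. A server DBMS would combine views with per-user permissions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        customer  TEXT,
        amount    INTEGER,
        unit_cost INTEGER   -- internal figure, of interest to Production only
    );
    INSERT INTO orders VALUES (1, 'Shop A', 5000, 3200), (2, 'Shop B', 7500, 4100);

    -- Each department sees only the columns it needs.
    CREATE VIEW sales_view AS
        SELECT order_id, customer, amount FROM orders;
    CREATE VIEW production_view AS
        SELECT order_id, unit_cost FROM orders;
    """
)


def columns_of(view_name):
    """Return the column names visible through a view (trusted name only)."""
    cursor = conn.execute(f"SELECT * FROM {view_name}")
    return [column[0] for column in cursor.description]


print(columns_of("sales_view"))
print(columns_of("production_view"))
```

Both views read the same stored rows; only the presentation differs, so no data is duplicated.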

Security: Features like multiple views offer security to some extent, because users are unable
to access the data of other users and departments. A DBMS offers methods to impose constraints
both when data is entered into the database and when it is retrieved at a later stage. It
offers many different levels of security, which enable multiple users to have different views
with different features. For example, a user in the Sales department cannot see data that
belongs to the Purchase department. It is also possible to control how much of the Sales
department's data is displayed to that user.

ACID properties: A DBMS follows the concepts of Atomicity, Consistency, Isolation, and
Durability (normally shortened to ACID). These concepts are applied to transactions, which
manipulate data in a database. ACID properties help the database stay healthy in
multi-transactional environments and in case of failure.

Explanation

• A - Atomicity requires a transaction to be treated as a single "unit of work." The entire
sequence of operations in the transaction succeeds or fails as one entity. There is no
partial success or failure.

For example, transferring money from one bank account to another involves multiple steps:

• Debit amount X from Account A
• Credit amount X to Account B

Under atomicity, either both the debit and the credit succeed or both fail. If the debit
succeeds but the credit fails for any reason, the entire transaction is rolled back. Atomicity
ensures there are no partial or incomplete transactions.
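The bank transfer can be sketched as a single transaction. The example below uses Python's built-in sqlite3 module, where a with block on the connection commits if every statement succeeds and rolls back if an exception is raised. The account table, the starting balances, and the function names are assumptions made for this illustration.

```python
import sqlite3


def make_bank():
    """Create two accounts: A with 3000 and B with 1000."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 3000), ("B", 1000)])
    conn.commit()
    return conn


def transfer(conn, source, target, amount):
    """Debit source and credit target as one unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
            credited = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            if credited.rowcount != 1:
                # The credit step failed, so the debit must not survive either.
                raise ValueError("target account does not exist")
        return True
    except ValueError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


conn = make_bank()
print(transfer(conn, "A", "B", 500), balances(conn))
print(transfer(conn, "A", "Z", 500), balances(conn))
```

In the second call the debit from A has already been executed when the credit to the missing account Z fails, yet the balance of A is unchanged afterwards because the whole transaction was rolled back.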

• C - Consistency

The consistency property ensures that only valid data is written to the database. Before a
transaction is committed, consistency checks are performed to enforce database constraints and
business rules.

For example, a transaction debiting 5000 from a bank account with a current balance of 3000 is
invalid if the account has an overdraft limit of 1000, because the resulting balance of -2000
would exceed the permitted limit. The transaction violates consistency, so it is blocked and
aborted.
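A business rule of this kind can be written into the database as a constraint. The sketch below treats the operation as a withdrawal (debit) and expresses an overdraft limit of 1000 as a CHECK constraint, using Python's built-in sqlite3 module. The account table and the function names are assumptions made for this illustration.

```python
import sqlite3

OVERDRAFT_LIMIT = 1000  # the balance may never fall below -1000


def make_account(balance):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        f" balance INTEGER NOT NULL CHECK (balance >= -{OVERDRAFT_LIMIT}))"
    )
    conn.execute("INSERT INTO account VALUES (1, ?)", (balance,))
    conn.commit()
    return conn


def debit(conn, amount):
    """Try to withdraw; the DBMS rejects any result that breaks the rule."""
    try:
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = 1", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balance_of(conn):
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]


conn = make_account(3000)
print(debit(conn, 5000), balance_of(conn))  # would leave -2000: rejected
print(debit(conn, 3500), balance_of(conn))  # leaves -500: within the limit
```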

• I - Isolation

Isolation maintains the independence of concurrent database transactions. Uncommitted changes
are isolated, typically with locking mechanisms, to prevent problems such as dirty reads and
lost updates.

For instance, if Transaction T1 updates a row, Transaction T2 must wait until T1 commits or
rolls back before it can use that row. Isolation prevents T2 from reading data that T1 has
updated but not yet committed.

Isolation avoids concurrency issues like:

• Dirty reads: reading uncommitted data from other transactions
• Lost updates: overwriting another transaction's update
• Non-repeatable reads: the same query yielding different results within one transaction

By isolating transactions, consistency is maintained despite concurrent execution and updates.
Changes remain isolated until they are committed.
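The absence of dirty reads can be observed with two connections to the same database file. The sketch below uses Python's built-in sqlite3 module with its default settings; the file name, table, and amounts are assumptions made for this illustration. Note that SQLite, in this configuration, lets the second connection read the last committed value instead of making it wait. Other DBMSs may block the reader, as described above.

```python
import os
import sqlite3
import tempfile


def read_balance(conn):
    # fetchall() consumes the whole result, so the read lock is released.
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchall()[0][0]


def demo_isolation():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "bank.db")
        t1 = sqlite3.connect(path)  # plays the role of Transaction T1
        t2 = sqlite3.connect(path)  # plays the role of Transaction T2
        try:
            t1.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
            t1.execute("INSERT INTO account VALUES (1, 3000)")
            t1.commit()

            # T1 changes the row but has not committed yet.
            t1.execute("UPDATE account SET balance = 2000 WHERE id = 1")
            seen_before_commit = read_balance(t2)

            t1.commit()
            seen_after_commit = read_balance(t2)
        finally:
            t1.close()
            t2.close()
    return seen_before_commit, seen_after_commit


print(demo_isolation())
```

The first value returned is what T2 saw while T1's update was still uncommitted, and the second is what it saw after the commit.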

• D - Durability

Durability provides persistence guarantees for committed transactions. Once a transaction is
committed, the system preserves its changes even if it crashes later. Durability is achieved
with transaction logs, database backups, and non-volatile disk storage.

For example, if a transaction updates a customer's address, durability ensures the updated
address is not lost due to a hard disk failure or power outage. The change persists with the
help of storage devices, backups, and logs.

Durability ensures that transactions, once committed, survive permanently: hardware failures,
power loss, and database crashes will not undo them.
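A very small sketch of durability: a committed change is written to a database file, the connection is closed, and a new connection still finds the change. It uses Python's built-in sqlite3 module; the file name, table, and addresses are invented for the example. Closing and reopening the file only stands in for a restart and does not simulate a real crash or disk failure.

```python
import os
import sqlite3
import tempfile


def demo_durability():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "customers.db")

        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT)")
        conn.execute("INSERT INTO customer VALUES (1, 'Old Street 1')")
        conn.commit()
        conn.execute("UPDATE customer SET address = 'New Avenue 2' WHERE id = 1")
        conn.commit()  # from this point on the new address must not be lost
        conn.close()

        # A fresh connection reads the data back from the file on disk.
        reopened = sqlite3.connect(path)
        try:
            row = reopened.execute(
                "SELECT address FROM customer WHERE id = 1"
            ).fetchall()[0]
        finally:
            reopened.close()
    return row[0]


print(demo_durability())
```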

DATABASE USERS

A typical DBMS has users with different rights and permissions who use it for different
purposes. Some users retrieve data and some back it up. The users of a DBMS can be
broadly categorized as follows:

Administrators: Administrators maintain the DBMS and are responsible for
administering the database. They oversee its usage and decide who may use it. They
create access profiles for users and apply limitations to maintain isolation and enforce
security. Administrators also look after DBMS resources such as the system license,
required tools, and other software- and hardware-related maintenance.

Designers: Designers are the group of people who actually work on the design of
the database. They keep a close watch on what data should be kept and in what format.
They identify and design the whole set of entities, relations, constraints, and views.

End users: End users are those who actually reap the benefits of having a DBMS. End
users can range from simple viewers who pay attention to the logs or market rates to
sophisticated users such as business analysts.

Entity Relationship database model

An Entity Relationship model (ER model) describes the structure of a database with the help
of a diagram, which is known as an Entity Relationship Diagram (ER Diagram). An ER model
is a design or blueprint of a database that can later be implemented as a database. The main
components of the E-R model are the entity set and the relationship set.

What is an Entity Relationship Diagram (ER Diagram)?

An Entity Relationship (ER) Diagram is a type of flowchart that illustrates how "entities" such
as people, objects, or concepts relate to each other within a system. ER Diagrams are most often
used to design or debug relational databases in the fields of software engineering, business
information systems, education, and research. Also known as ERDs or ER Models, they use a
defined set of symbols such as rectangles, diamonds, ovals, and connecting lines to depict the
interconnectedness of entities, relationships, and their attributes. They mirror grammatical
structure, with entities as nouns and relationships as verbs.

The components and features of an ER diagram

ER Diagrams are composed of entities, relationships, and attributes. They also depict
cardinality, which defines relationships in terms of numbers. The main components are as
follows.

Entity
A definable thing such as a person, object, concept, or event that can have data stored about
it. Think of entities as nouns. In an E-R diagram, an entity is represented by a rectangle,
with the name of the entity written inside it.
Examples: STUDENT, EMPLOYEE, ACCOUNT, etc.

A weak entity is an entity that depends on another entity. A weak entity does not have a key
attribute of its own. A double rectangle represents a weak entity.

An entity set is a collection of similar entities.
Examples: the set of all persons, companies, job positions, courses, academic staff, managers,
employees, etc.
• Each entity set has a key.
• Each attribute has a domain.
ATTRIBUTES:
An entity is represented by a set of attributes. Attributes are descriptive properties
possessed by each member of an entity set. An attribute describes a property or characteristic
of an entity. For example, Name, Age, and Address can be attributes of a Student. An attribute
is represented by an ellipse.

Key Attribute:
A key attribute uniquely identifies an entity and is used to represent the primary key. It is
drawn as an ellipse with the attribute name underlined.

Composite Attribute:
An attribute can have attributes of its own; such an attribute is known as a composite
attribute. Composite attributes can be divided into subparts. For example, an attribute name
could be structured as a composite attribute consisting of first-name, middle-name, and
last-name.

Multivalued Attributes:
An attribute that can hold multiple values is known as a multivalued attribute. It is
represented by a double ellipse in an E-R diagram. E.g., a person can have more than one phone
number, so the phone number attribute is multivalued.

Derived Attribute: A derived attribute is one whose value is dynamic and derived from
another attribute. It is represented by a dashed ellipse in an E-R diagram. E.g., a person's
age is a derived attribute, as it changes over time and can be derived from another attribute
(date of birth).
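Because a derived attribute is computed from another attribute, it is usually calculated when needed instead of being stored. A minimal Python sketch of deriving age from date of birth follows; the function name age_on and the sample dates are chosen for this example only.

```python
from datetime import date


def age_on(date_of_birth, today):
    """Derive a person's age in whole years from their date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


born = date(1990, 6, 15)
print(age_on(born, date(2025, 6, 14)))  # the day before the 35th birthday
print(age_on(born, date(2025, 6, 15)))  # the 35th birthday
```

Only the date of birth needs to be stored; the age stays correct as time passes.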

Relationship
A relationship is an association (connection) between two or more entities.
Relationships are typically shown as diamonds or as labels directly on the connecting lines.

Types of relationship

1. One-to-One Relationship

When a single instance of an entity is associated with a single instance of another entity,
it is known as a one-to-one relationship. For example, a person has only one passport and a
passport is given to one person.

2. One-to-Many Relationship

When a single instance of an entity is associated with more than one instance of another
entity, it is called a one-to-many relationship. For example, a customer can place many
orders, but an order cannot be placed by many customers.

3. Many-to-One Relationship

When more than one instance of an entity is associated with a single instance of another
entity, it is called a many-to-one relationship. For example, many employees can report to
one department and one department can have many employees.

Also, many students can study in one college and one college can have many students.

4. Many-to-Many Relationship

When more than one instance of an entity is associated with more than one instance of another
entity, it is called a many-to-many relationship. For example, a student can be assigned to
many projects and a project can be assigned to many students.

Student (M) --- Assigned --- (N) Project
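In a relational database, a many-to-many relationship such as Student/Project is commonly implemented with a third table that holds one row per assignment. The sketch below shows this with Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT);

    -- One row per (student, project) pair links the two tables.
    CREATE TABLE assigned (
        student_id INTEGER REFERENCES student(student_id),
        project_id INTEGER REFERENCES project(project_id),
        PRIMARY KEY (student_id, project_id)
    );

    INSERT INTO student VALUES (1, 'Amina'), (2, 'Brice');
    INSERT INTO project VALUES (10, 'Library system'), (20, 'Payroll');
    INSERT INTO assigned VALUES (1, 10), (1, 20), (2, 10);
    """
)


def projects_of(student_name):
    rows = conn.execute(
        "SELECT p.title FROM project p"
        " JOIN assigned a ON a.project_id = p.project_id"
        " JOIN student s ON s.student_id = a.student_id"
        " WHERE s.name = ? ORDER BY p.title",
        (student_name,),
    ).fetchall()
    return [row[0] for row in rows]


def students_on(project_title):
    rows = conn.execute(
        "SELECT s.name FROM student s"
        " JOIN assigned a ON a.student_id = s.student_id"
        " JOIN project p ON p.project_id = a.project_id"
        " WHERE p.title = ? ORDER BY s.name",
        (project_title,),
    ).fetchall()
    return [row[0] for row in rows]


print(projects_of("Amina"))
print(students_on("Library system"))
```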

Total Participation of an Entity set


Total participation of an entity set means that each entity in the entity set must take part
in at least one relationship in a relationship set. For example, in the diagram each college
must have at least one associated student.

Note: E-R diagrams can be drawn using Chen notation or crow's foot notation.

Crow’s Foot Notation and Their Meaning


One of the most important terms to know when using crow's foot notation is cardinality.
Cardinality refers to the maximum number of times an instance in one entity can relate to
instances of another entity.

Ordinality, on the other hand, is the minimum number of times an instance in one entity can be
associated with an instance in the related entity.

Steps to Create an ERD (E-R Diagram)

Example 1
In a university, a Student enrolls in Courses. A student must be assigned to at least
one Course. Each course is taught by a single Professor. To maintain instruction quality,
a Professor can deliver only one course. Establish the E-R diagram.

Step 1. Entity Identification

We have three entities:
• Student
• Course
• Professor

Step 2. Relationship Identification

We have the following two relationships:
• The student is assigned a course
• The professor delivers a course

Step 3. Cardinality Identification

From the problem statement we know that:
• A student can be assigned multiple courses
• A professor can deliver only one course

Step 4. Identify Attributes


Study the files, forms, reports, and data currently maintained by the organization to
identify attributes. You can also conduct interviews with various stakeholders to identify
entities. Initially, it is important to identify the attributes without mapping them to a
particular entity.
Once you have a list of attributes, map them to the identified entities. Ensure that each
attribute is paired with exactly one entity. If you think an attribute should belong to more
than one entity, use a modifier to make it unique.
Once the mapping is done, identify the primary keys. If a unique key is not readily available,
create one.

Step 5. Create the ERD

[Figure: a more modern representation of the ER diagram.]
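Once the ERD is drawn, it can be implemented as tables. The sketch below is one possible translation of Example 1, using Python's built-in sqlite3 module. All table and column names are assumptions made for this illustration. The rule that a professor can deliver only one course is expressed by making professor_id unique in the course table. The rule that every student must take at least one course cannot be stated with a simple column constraint and is left out.

```python
import sqlite3


def make_university():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        PRAGMA foreign_keys = ON;
        CREATE TABLE professor (
            professor_id INTEGER PRIMARY KEY,
            name         TEXT NOT NULL
        );
        -- Each course has exactly one professor, and UNIQUE means a
        -- professor cannot appear on more than one course.
        CREATE TABLE course (
            course_id    INTEGER PRIMARY KEY,
            title        TEXT NOT NULL,
            professor_id INTEGER NOT NULL UNIQUE REFERENCES professor(professor_id)
        );
        CREATE TABLE student (
            student_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL
        );
        -- A student can be assigned multiple courses.
        CREATE TABLE enrolment (
            student_id INTEGER NOT NULL REFERENCES student(student_id),
            course_id  INTEGER NOT NULL REFERENCES course(course_id),
            PRIMARY KEY (student_id, course_id)
        );
        INSERT INTO professor VALUES (1, 'Prof. X');
        INSERT INTO course VALUES (100, 'Databases', 1);
        """
    )
    return conn


def add_course(conn, course_id, title, professor_id):
    """Return True if the course was stored, False if a rule was violated."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO course VALUES (?, ?, ?)",
                (course_id, title, professor_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_university()
print(add_course(conn, 101, "Networks", 1))  # professor 1 already has a course
```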

Best Practices for Developing Effective ER Diagrams


• Eliminate any redundant entities or relationships.
• Make sure that all your entities and relationships are properly labeled.
• There may be various valid approaches to an ER diagram. Make sure that the ER diagram
supports all the data you need to store.
• Ensure that each entity appears only once in the ER diagram.
• Name every relationship, entity, and attribute represented on your diagram.
• Never connect relationships to each other.
• Use colors to highlight important portions of the ER diagram.
Assignment
1. A publishing company produces books on various subjects. The books are written by authors
who specialize in one particular subject. The company employs editors who, not necessarily
being specialists in a particular area, each take sole responsibility for editing one or more
publications. Every book requires some items for publication. These items are supplied by
suppliers. One supplier can supply many items. Shop owners buy books from the publisher. A shop
owner can buy many books, but one book can be bought by one shop owner only. Books are uniquely
identified by Bookid.
i. Identify the entities from the passage above
ii. Define an E-R Diagram
iii. Construct an E-R diagram and indicate the cardinality between entities using the
description above

III. RELATIONAL DATABASE


The relational database model is used in most of today's commercial databases. The
relational database model is based on a mathematical concept in which relations are
interpreted as tables.
Database Keys
In a DBMS, keys are attributes or sets of attributes that identify a row (tuple) within a
relation (table). In other words, a key is an attribute or set of attributes that identifies
rows uniquely. Keys also establish connections between different tables: they help enforce
integrity and identify the relationships between tables.

i. Super Key
A super key is any combination of fields within a table that uniquely identifies each
record within that table. Consider the employee table below.

Emp_ID | Emp_name | Emp_spcode | Emp_sal | Emp_contact | Emp_email
101 | Martin | 0000 | 350000 | 6777112321 | [email protected]
102 | Peter | 1111 | 230000 | 620564002 | Null
103 | Martin | 2222 | 400000 | Null | [email protected]
104 | Ngwa | 3333 | 230000 | 682123001 | [email protected]

Super keys: {Emp_ID, Emp_spcode, (Emp_ID, Emp_name), (Emp_ID, Emp_spcode),
(Emp_ID, Emp_sal), (Emp_ID, Emp_contact), ...}.
All the remaining kinds of key can be identified from the super keys. A super key can be a
single field or a set of fields that uniquely identifies a record.
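Whether a set of fields behaves as a super key on some sample rows can be checked mechanically: no two rows may share the same combination of values. The Python sketch below does this for four columns of the employee table. The ID of the third row is assumed to be 103 here, since Emp_ID is listed as a super key and must therefore be unique. Keep in mind that a key is a rule about every row the table may ever hold, so sample data can disprove a key but cannot prove one.

```python
from itertools import combinations

COLUMNS = ("Emp_ID", "Emp_name", "Emp_spcode", "Emp_sal")
ROWS = [
    (101, "Martin", "0000", 350000),
    (102, "Peter", "1111", 230000),
    (103, "Martin", "2222", 400000),  # ID assumed to be 103 (must be unique)
    (104, "Ngwa", "3333", 230000),
]


def is_super_key(fields):
    """True if no two sample rows share the same values for these fields."""
    positions = [COLUMNS.index(field) for field in fields]
    seen = {tuple(row[p] for p in positions) for row in ROWS}
    return len(seen) == len(ROWS)


def is_candidate_key(fields):
    """True if fields form a super key and no smaller subset of them does."""
    if not is_super_key(fields):
        return False
    smaller = (
        subset
        for size in range(1, len(fields))
        for subset in combinations(fields, size)
    )
    return not any(is_super_key(subset) for subset in smaller)


print(is_super_key(("Emp_ID",)))
print(is_super_key(("Emp_name",)))
print(is_super_key(("Emp_ID", "Emp_name")))
print(is_candidate_key(("Emp_ID", "Emp_name")))
```

The pair (Emp_ID, Emp_name) identifies every row but is not minimal, because Emp_ID alone already does so.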
ii. Candidate key
A table can have more than one field, or set of fields, that uniquely identifies each
record. A candidate key is a minimal super key, that is, a super key from which no field
can be removed without losing uniqueness. From the super keys above, the minimal ones are
Emp_ID and Emp_spcode. One of the candidate keys is chosen as the primary key; the candidate
keys that are not chosen are known as alternate keys or secondary keys.
iii. Primary Key

A field or set of fields that uniquely identifies each record in a table is known as a
primary key. It is the candidate key chosen to identify records. This implies that no two
records in the relation can have the same value for the primary key. For example, an employee
number uniquely identifies a member of staff within a company, and an IP address uniquely
addresses a PC on the internet. A primary key is mandatory: each entity occurrence must have
a value for its primary key.
iv. Foreign Key

A field of a table that references the primary key of another table is referred to as a
foreign key. The figure below illustrates how a foreign key constraint is related to a primary
key constraint. Here, the field Item_Code in the PURCHASE table references the field
Item_Code in the ITEM relation. Thus, the attribute Item_Code in the PURCHASE relation
is the foreign key.
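The ITEM/PURCHASE relationship can be sketched with Python's built-in sqlite3 module. Only Item_Code comes from the text. The other column names (Item_Name, Purchase_No, Quantity) and the sample values are assumptions made for illustration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3


def make_shop():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
    conn.execute("CREATE TABLE item (Item_Code TEXT PRIMARY KEY, Item_Name TEXT)")
    conn.execute(
        "CREATE TABLE purchase ("
        " Purchase_No INTEGER PRIMARY KEY,"
        " Item_Code TEXT NOT NULL REFERENCES item(Item_Code),"  # the foreign key
        " Quantity INTEGER)"
    )
    conn.execute("INSERT INTO item VALUES ('I01', 'Pen')")
    conn.commit()
    return conn


def record_purchase(conn, purchase_no, item_code, quantity):
    """Return True if stored, False if item_code matches no row in item."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO purchase VALUES (?, ?, ?)",
                (purchase_no, item_code, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_shop()
print(record_purchase(conn, 1, "I01", 5))  # item I01 exists
print(record_purchase(conn, 2, "I99", 1))  # no such item: rejected
```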

v. Compound/composite Key
A compound key consists of more than one attribute, which together uniquely identify an entity
occurrence. Each attribute that makes up the key is also a simple key in its own right. For
example, consider an entity named enrolment, which holds the courses on which a student is
enrolled. In this scenario a student is allowed to enroll on more than one course. The entity
has a compound key made up of student number and course number, both of which are required to
uniquely identify a student on a particular course.
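A minimal sketch of the enrolment example, using Python's built-in sqlite3 module: the primary key is declared over both columns, so the same student may appear on several courses, but the same (student, course) pair cannot be stored twice. The column and function names are assumptions made for this illustration.

```python
import sqlite3


def make_enrolment():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE enrolment ("
        " student_number INTEGER NOT NULL,"
        " course_number INTEGER NOT NULL,"
        " PRIMARY KEY (student_number, course_number))"  # compound key
    )
    return conn


def enrol(conn, student_number, course_number):
    """Return True if the enrolment was stored, False if it already exists."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO enrolment VALUES (?, ?)",
                (student_number, course_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_enrolment()
print(enrol(conn, 1, 10))  # student 1 on course 10
print(enrol(conn, 1, 20))  # same student, another course
print(enrol(conn, 2, 10))  # another student, same course
print(enrol(conn, 1, 10))  # the same pair again
```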

vi. Alternate key
All candidate keys other than the one chosen as the primary key.
