
Unit I-Database Management System

Uploaded by

kshb29msyq

DATABASE MANAGEMENT SYSTEM (B0SC 2030)

UNIT I

UNDERSTANDING THE MAIN ISSUES RELATED TO DATABASE SYSTEM IN GENERAL

INTRODUCTION
Data is increasingly seen as a corporate asset that can be used to make better-informed business
decisions, with the goal of increasing revenue and profits and reducing costs.

To assess data’s monetary value, consider what is stored in a company database: data about
customers, suppliers, inventory, operations, and so on.

How many opportunities are lost if the data is lost?

What is the actual cost of data loss?

In effect, an organization is subject to a data-information-decision cycle; that is, the data user
applies intelligence to data to produce information that is the basis of knowledge used in decision
making.

This cycle is illustrated in the figure below.

Efficient asset management is critical to the success of an organization.

To manage data as a corporate asset, managers must understand the value of information.

For some companies, such as credit reporting agencies, their only product is information, and
their success is solely a function of information management.

Most organizations continually seek new ways to leverage their data resources to get greater
returns.

This leverage can take many forms, from data warehouses that support improved customer
relationships to tighter integration with customers and suppliers in support of the electronic supply
chain.

As organizations become more dependent on information, that information’s accuracy becomes
more critical.

Dirty data, or data that suffers from inaccuracies and inconsistencies, becomes an even greater
threat.

Data can become dirty for many reasons:

• Lack of enforcement of integrity constraints, such as not null, uniqueness, and referential
integrity

• Data-entry errors and typographical errors

• Use of synonyms and homonyms across systems

• Nonstandard use of abbreviations in character data

• Different decompositions of composite attributes into simple attributes across systems

Some causes of dirty data, such as improper implementation of constraints, can be addressed
within an individual database.

However, addressing other causes is more complicated.

Some dirty data comes from the movement of data across systems, as in the creation of a data
warehouse.
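
The integrity constraints named in the list above (not null, uniqueness, and referential integrity) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely for illustration; the department and employee tables, their columns, and the helper try_insert are assumptions made for this example and do not come from the course material.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database with the three kinds of constraints."""
    conn = sqlite3.connect(":memory:")
    # SQLite checks foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """CREATE TABLE department (
               dept_id INTEGER PRIMARY KEY,
               name    TEXT NOT NULL UNIQUE            -- not null + uniqueness
           )"""
    )
    conn.execute(
        """CREATE TABLE employee (
               emp_id  INTEGER PRIMARY KEY,
               ename   TEXT NOT NULL,                   -- not null
               dept_id INTEGER NOT NULL
                       REFERENCES department(dept_id)   -- referential integrity
           )"""
    )
    conn.execute("INSERT INTO department (dept_id, name) VALUES (1, 'Sales')")
    conn.commit()
    return conn


def try_insert(conn, sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_demo_db()
emp_sql = "INSERT INTO employee (ename, dept_id) VALUES (?, ?)"

ok = try_insert(conn, emp_sql, ("Warner", 1))              # valid row
missing_name = try_insert(conn, emp_sql, (None, 1))        # violates NOT NULL
duplicate_dept = try_insert(
    conn, "INSERT INTO department (name) VALUES (?)", ("Sales",)
)                                                          # violates UNIQUE
orphan = try_insert(conn, emp_sql, ("Smith", 99))          # no department 99

print(ok, missing_name, duplicate_dept, orphan)
```

Only the first insert is accepted; the other three would each have introduced dirty data and are rejected by the database itself rather than by application code.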

The Need for a Database and its Role in an Organization

Data is used by different people in different departments for various reasons.

Therefore, data management must address the concept of shared data.

Used properly, the DBMS facilitates:

• Interpretation and presentation of data in useful formats by transforming raw data into
information

• Distribution of data and information to the right people at the right time

• Data preservation and monitoring data usage for adequate periods of time

• Control over data duplication and use, both internally and externally

Regardless of the organization, the database’s predominant role is to support managerial decision
making at all levels in the organization while preserving data privacy and security.

An organization’s managerial structure might be divided into three levels: top-level management
makes strategic decisions, middle management makes tactical decisions, and operational
management makes daily working decisions.

Operational decisions are short term; for example, a manager might change the price of a product
to clear it from inventory.

Tactical decisions involve a longer time frame and affect larger-scale operations - for example,
changing the price of a product in response to competitive pressures.

Strategic decisions affect the long-term well-being of the company or even its survival - for
example, changing the pricing strategy across product lines to capture market share.

The DBMS must give each level of management a useful view of the data and support the required
level of decision making.

Definition of Important Concepts

Data:

Data are the raw facts about things.

We deal with data all the time.

Details such as a name, a phone number, or an address are all data.

In simple words, data is a raw fact made up of characters, numbers, or special characters.

For example, Empid is data, Ename is data, Salary is data, DOJ is data, etc.

On its own, data does not give users accurate or meaningful statements or information.

For example, given only the value Warner, we cannot say whether it is the name of an employee,
the name of a customer, or the name of a product, because Warner is simply data.

Information:

Meaningful data is called information.

From all the available facts, we fetch only the information.

In simple words, information is the result of processing data or raw facts.

Information provides meaningful statements.

Note: information always provides meaningful data about a particular employee, customer, student,
product, etc.

For example, from the information, we can say that Warner is the name of an employee.

Data Model

A model is a representation of ‘real world’ objects and events and their associations to make the
data understandable.

It can be defined as an integrated collection of concepts for describing and manipulating data,
relationships between data, and constraints on the data in an organization.

A data model comprises three components:

• A structural part

Consists of a set of rules according to which databases can be constructed.

• A manipulative part

Defines the types of operation that are allowed on the data (this includes the operations
that are used for updating or retrieving data from the database and for changing the
structure of the database).

• A constraint part

A set of integrity rules, which ensures that the data is accurate.

Database Server

A Database Server can be defined as a computer dedicated to providing database services.

The database server holds the Database Management System (DBMS) and the databases.

A database server is typically found in a client-server environment, where it provides the
information sought by the client systems.

The client is the application used to interface with the DBMS, while the database server runs
the DBMS.

Central DBMS functions on the server are referred to as back-end functions, whereas the
application programs on the client computer are referred to as front-end programs.

A standard called ODBC (Open Database Connectivity) provides an application programming
interface (API) that allows client-side programs to call the DBMS on the server side.

Hence, a client program connects to the database server and sends requests (queries) using the
ODBC API.

The server processes the queries and sends the results back to the client program, where they
are processed by the client computer.

All database functions are controlled by the database server.

Database Objects

A database object is a data structure used to either store or reference data.

Anything created with a CREATE statement is a database object.

It can be used to hold and manipulate the data.

Some examples of database objects are:

• Table – Basic unit of storage; composed of rows and columns

• View – Logically represents subsets of data from one or more tables

• Sequence – Generates primary key values

• Index – Improves the performance of some queries

• Synonym – Alternative name for an object
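
As a rough illustration of the objects listed above, the sketch below creates a table, a view, and an index with Python's built-in sqlite3 module. The employee table and its columns are assumptions based on the Empid/Ename/Salary/DOJ example used earlier. SQLite has no CREATE SEQUENCE or CREATE SYNONYM statement, so those two object types are not shown; here an INTEGER PRIMARY KEY column is used to generate key values instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each CREATE statement below produces one database object.
conn.executescript(
    """
    CREATE TABLE employee (              -- table: rows and columns
        empid  INTEGER PRIMARY KEY,      -- key values are generated automatically
        ename  TEXT NOT NULL,
        salary REAL,
        doj    TEXT
    );
    CREATE VIEW employee_names AS        -- view: a logical subset of the table
        SELECT empid, ename FROM employee;
    CREATE INDEX idx_employee_ename      -- index: may speed up lookups by name
        ON employee (ename);
    """
)

conn.execute(
    "INSERT INTO employee (ename, salary, doj) VALUES (?, ?, ?)",
    ("Warner", 50000, "2020-01-15"),
)

# List the objects that the CREATE statements produced.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY type, name"
).fetchall()

# Querying the view returns only the columns it exposes.
view_rows = conn.execute("SELECT * FROM employee_names").fetchall()

print(objects)
print(view_rows)
```

The view stores no data of its own; it is evaluated against the employee table each time it is queried.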

Data Quality

Data quality is a comprehensive approach to ensuring the accuracy, validity, and timeliness of
data.

This comprehensive approach is important because data quality involves more than just cleaning
dirty data; it also focuses on preventing future inaccuracies and building user confidence in the
data.

Large-scale data quality initiatives tend to be complex and expensive projects, so the alignment
of these initiatives with business goals is a must, as is buy-in from top management.

While data quality efforts vary greatly from one organization to another, most involve the
following:

• A data governance structure that is responsible for data quality

• Measurements of current data quality


• Definition of data quality standards in alignment with business goals

• Implementation of tools and processes to ensure future data quality

DATA MANAGEMENT

Data management is the process of ingesting, storing, organizing and maintaining the data created
and collected by an organization.

The data management process includes a wide range of tasks and procedures, such as:

• Collecting, processing, validating, and storing data

• Integrating different types of data from disparate sources, including structured and
unstructured data

• Ensuring high data availability and disaster recovery

• Governing how data is used and accessed by people and apps

• Protecting and securing data and ensuring data privacy

Data management has also grown in importance as businesses are subjected to an increasing
number of regulatory compliance requirements, including data privacy and protection laws.

The separate disciplines that are part of the overall data management process cover a series of
steps, from data processing and storage to governance of how data is formatted and used in
operational and analytical systems.

The key disciplines of the data management process are explained below and shown in the figure.

Fig: Key parts of the data management process

• Data architecture:

A data architecture is often the first step, particularly in large organizations with lots of
data to manage.

A data architecture provides a blueprint for managing data and deploying databases and
other data platforms, including specific technologies to fit individual applications.

• Data modeling:

Data is designed using different models; the relationships between data items and other
details are portrayed through this concept.

• Data mining:

It is used for transforming raw data into information.

It is a major concept for handling data.

• Data integration:

It combines data from different sources and also analyzes that data for the
processing of information.

• Data governance:

Data handling policies are made under this concept; it also ensures that data is fetched
consistently and deals with other related issues.

Data Storage/Data Management Approaches:


Below are the data storage/data management approaches.

1. Books and Papers

This is a manual record-keeping system in which people manage the whole
database without the support of computers.

It has many problems.

Disadvantages

o It is a completely manual process/system.

o It requires more manpower.

o Maintenance is very costly.

o There is no security.

o It can store only a small amount of data/information.

o Retrieval is very difficult as well as time-consuming.

Consider the example of an address book.

What will you do when the page allotted for names beginning with “H” is full?

One option is to buy a new, larger address book, transfer the existing entries into it,
and then continue with the next address to be stored.

The second option is to use some blank pages at the end of the address book to store the
remaining addresses of the persons whose names start with “H”.

The first option is obviously very tiring and time-consuming. With the second
option, to find a person whose name starts with “H” you will have to search in two
different places, which is quite a cumbersome procedure.

2. File-Based Approach for Data Management:

The file-based approach refers to the situation where data is stored in one or more separate
computer files defined and managed by different application programs.

The data can be stored in files with the help of the operating system.

Computer programs access the stored files to perform the various tasks required by the
business.

Each program, or sometimes a related set of programs, is called a computer application.


For example, all of the programs associated with processing customers' orders are referred
to as the order processing application.

The following diagram shows how different applications will each have their own copy
of the files they need in order to carry out the activities for which they are responsible:

Limitations

• Data duplication:

Each program stores its own separate files.

If the same data is to be accessed by different programs, then each program must
store its own copy of the same data.

• Data inconsistency:

If the data is kept in different files, there could be problems when an item of data
needs updating, as it will need to be updated in all the relevant files.

If this is not done, the data will be inconsistent, and this could lead to errors.

• Difficult to implement data security:

Data is stored in different files by different application programs.

This makes it difficult and expensive to implement organisation-wide security
procedures on the data.
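
The duplication and inconsistency problems described above can be reproduced in a few lines. The sketch below is only an illustration: the two CSV files, the customer record, and the helper functions are hypothetical stand-ins for the private files of an order processing application and a billing application.

```python
import csv
import os
import tempfile


def write_customers(path, rows):
    """Overwrite one application's private customer file."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def read_address(path, customer_id):
    """Look up a customer's address in one application's file."""
    with open(path, newline="") as f:
        for cid, _name, address in csv.reader(f):
            if cid == customer_id:
                return address
    return None


workdir = tempfile.mkdtemp()
orders_file = os.path.join(workdir, "orders_customers.csv")
billing_file = os.path.join(workdir, "billing_customers.csv")

# Data duplication: each application keeps its own copy of the same record.
customer = [["C1", "Warner", "12 Old Street"]]
write_customers(orders_file, customer)
write_customers(billing_file, customer)

# The order processing application records a change of address,
# but nobody updates the billing application's copy.
write_customers(orders_file, [["C1", "Warner", "98 New Avenue"]])

orders_address = read_address(orders_file, "C1")
billing_address = read_address(billing_file, "C1")
inconsistent = orders_address != billing_address

print(orders_address, "|", billing_address, "|", inconsistent)
```

After the update the two applications disagree about where the customer lives, which is the inconsistency the text warns about.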

One approach to solving the problem of each application having its own set of files is to
share files between different applications.

The shared file approach

This will alleviate the problem of duplication and inconsistent data between different
applications, and is illustrated in the diagram below:

The introduction of shared files solves the problem of duplication and inconsistent data,
but other problems may emerge, including:

• File incompatibility:

It means that the file cannot be used or opened by the software or device that you
are trying to use it with.

If departments have to share files, the file structure that suits one department might
not suit another.

For example, data might need to be sorted in a different sequence for different
applications (for instance, customer details could be stored in alphabetical order,
or numerical order, or ascending or descending order of customer number).

• Difficult to control access:

Some applications may require access to more data than others.

The file will still need to contain the additional information to support the
application that requires it.

• Physical data dependence:

If the structure of the data file needs to be changed in some way (for example, to
reflect a change in currency), this alteration will need to be reflected in all
application programs that use that data file.

This problem is known as physical data dependence.

• Difficult to implement concurrency:

While a data file is being processed by one application, the file will not be available
for other applications or for ad hoc queries.

This is because, if more than one application is allowed to alter data in a file at one
time, serious problems can arise in ensuring that the updates made by each
application do not clash with one another.

File-based systems avoid these problems by not allowing more than one
application to access a file at one time.

To remove the limitations of the file-based approach, a new and more effective approach was
required, known as the database approach.

3. Database Oriented Approach

The database approach provides facilities for querying, data security and integrity, and
allows simultaneous access to data by a number of different users.

One of the benefits of the database approach is that the problem of physical data
dependence is resolved.

This means that the underlying structure of a data file can be changed without the
application programs needing amendment.

This is achieved by a hierarchy of levels of data specification called a schema.
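
A minimal sketch of this idea, using Python's built-in sqlite3 module: the application function below asks for data by name through a query, so it keeps working after the stored structure is changed. The product table, the added currency column (echoing the change-of-currency example above), and the function price_of are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)
conn.execute("INSERT INTO product (name, price) VALUES ('Widget', 9.99)")


def price_of(conn, name):
    """Application code: it names the columns it needs and nothing else."""
    row = conn.execute(
        "SELECT price FROM product WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


before = price_of(conn, "Widget")

# The structure of the stored data changes: a new column and a new index.
conn.execute("ALTER TABLE product ADD COLUMN currency TEXT DEFAULT 'USD'")
conn.execute("CREATE INDEX idx_product_name ON product (name)")

# The application function is unchanged and still returns the same answer.
after = price_of(conn, "Widget")

print(before, after)
```

In a file-based system, every program that parsed the product file would have needed to be amended to cope with the extra field.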

DATABASE

It is a collection of inter-related data which contains the information of an organization/enterprise.

It is obtained by collecting data from all the data sources of an organization.

The database is a computer-based record-keeping system whose overall purpose is to record and
maintain information.

The database is a single, large repository of data that can be used simultaneously by many users.

Using the database, we can store, modify, select, and delete this interrelated information
in a secure manner.

A database is usually controlled by a database management system (DBMS).

Together, the data and the DBMS, along with the applications that are associated with them, are
referred to as a database system, often shortened to just database.

Most databases use structured query language (SQL) for writing and querying data.

Evolution of the database

Databases have evolved dramatically since their inception in the early 1960s.

Navigational databases such as the hierarchical database (which relied on a tree-like model and
allowed only a one-to-many relationship), and the network database (a more flexible model that
allowed multiple relationships), were the original systems used to store and manipulate data.

Although simple, these early systems were inflexible.

In the 1980s, relational databases became popular, followed by object-oriented databases in the
1990s.

More recently, NoSQL databases came about as a response to the growth of the internet and the
need for faster speed and processing of unstructured data.

Today, cloud databases and self-driving databases are breaking new ground when it comes to how
data is collected, stored, managed, and utilized.

Difference between a database and a spreadsheet?

Databases and spreadsheets (such as Microsoft Excel) are both convenient ways to store
information.

The primary differences between the two are:

• How the data is stored and manipulated

• Who can access the data

• How much data can be stored

Spreadsheets were originally designed for one user, and their characteristics reflect that.

They’re great for a single user or small number of users who don’t need to do a lot of incredibly
complicated data manipulation.

Databases, on the other hand, are designed to hold much larger collections of organized
information.

Databases allow multiple users at the same time to quickly and securely access and query the data
using highly complex logic and language.

What type of information is stored in a database?

Databases are used in most modern applications, whether the database is on your personal phone,
computer, or the internet.

An operational database system will store much of the data an application needs to function,
keeping the data organized and allowing users to access the data.

If you were building an ecommerce app, some of the data you might access and store in your
operational database system includes:

• Customer data, like usernames, email addresses, and preferences.

• Business data, like product colors, prices, and ratings.

• Relationship data, like the locations of stores with a specific product in stock.

Building blocks of a Database

The following three components form the building blocks of a database.

• Columns.

Columns are similar to fields, that is, individual items of data that we wish to store.

A student’s roll number, name, address, etc. are all examples of columns.

They are also similar to the columns found in spreadsheets (the A, B, C etc. along the top).

• Rows.

Rows are similar to records as they contain data of multiple columns (like the 1, 2, 3 etc.
in a spreadsheet).

A row can be made up of as many or as few columns as you want.

This makes reading data much more efficient; you fetch only what you want.

• Tables.

A table is a logical group of columns.

For example, you may have a table that stores details of customers’ names and addresses.

Another table would be used to store details of parts and yet another would be used for
suppliers’ names and addresses.

It is the tables that make up the entire database and it is important that we do not duplicate
data at all.

Characteristics of database

Many characteristics distinguish the database approach from the file-based approach.

The characteristics of a database system are:

a) Data sharing and multiuser systems:

A multiuser database allows several users to access the database at once.

As a result, multiuser databases typically use concurrency control to allow many users to
access the same data items at the same time while ensuring that the data remains
accurate in terms of data integrity.

b) Control of data integrity:

In most databases, each data object is stored in a single location.

Redundancy still exists in some situations, but it is regulated and reduced to the bare
minimum in order to improve system performance.

c) Self-descriptive nature:

A database system includes not only the database itself, but also descriptions of the
data structures and constraints.

When the need arises, DBMS software or database users use this information.

This separation distinguishes a database system from a standard file-based system, in which
the data specification is part of the application programs.

d) Provision of multiple views of data:

Individual users can have different database views.

For example, a view may be a subset of the database.

As a result, users do not need to be aware of how and where the data they refer to
is stored.

e) Separation of software and data:

In a file-based environment, the data file structure is often specified in the application
programs.

As a consequence, if a user wants to change the structure, all of the programs that use that
file must also be changed.

In the database approach, the data structure is saved in the system catalogue, not in the
programs.

As a result, only one change is needed.

f) In a database, a combination of fields makes up a table.

For instance, a user table contains fields that present data about that user.

g) A database is logical, coherent, and internally consistent.

h) An individual data item is stored in a field.

i) Databases are designed, built, and populated with data for a certain purpose.

j) A database is a representation of a certain aspect of the real world, or better still, a
combination of data components (facts) representing information in the real world.
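
The self-descriptive nature in item (c) can be seen by asking a database to describe its own tables. The sketch below uses Python's built-in sqlite3 module and SQLite's PRAGMA table_info command; the student table (roll number, name, address) and the helper describe are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE student (
           roll_number INTEGER PRIMARY KEY,
           name        TEXT NOT NULL,
           address     TEXT
       )"""
)


def describe(conn, table):
    """Read column descriptions from the database's own catalog.

    PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
    The table name is interpolated directly, so pass only trusted names.
    """
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(name, col_type, bool(notnull))
            for _cid, name, col_type, notnull, _default, _pk in info]


columns = describe(conn, "student")
for name, col_type, required in columns:
    print(name, col_type, "NOT NULL" if required else "")
```

No application program had to carry the file layout: the structure was stored once, alongside the data, and any program or user can read it back.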

Operations performed on Database

The term CRUD (Create, Read, Update, Delete) describes the four essential operations for
creating and managing persistent data elements, mainly in relational and NoSQL databases.

• CREATE

The CREATE operation adds a new record to a database.

In RDBMS, a database table row is referred to as a record, while columns are called
attributes or fields.

The CREATE operation adds one or more new records with distinct field values in a table.

The same principle applies to NoSQL databases.

If the NoSQL database is document-oriented, then a new document (for example, a JSON
formatted document with its attributes) is added to the collection, which is the equivalent
of an RDBMS table.

Similarly, in NoSQL databases like DynamoDB, the CREATE operation adds an item
(which is equivalent to a record) to a table.

• READ

READ returns records (or documents or items) from a database table (or collection or
bucket) based on some search criteria.

The READ operation can return all records and some or all fields.

• UPDATE

UPDATE is used to modify existing records in the database.

For example, this can be the change of address in a customer database or price change in
a product database.

Similar to READ, UPDATEs can be applied across all records or only a few, based on
criteria.

An UPDATE operation can modify and persist changes to a single field or to multiple
fields of the record.

If multiple fields are to be updated, the database system ensures they are all updated or
not at all.

Some big data systems don’t implement UPDATE but allow only a timestamped
CREATE operation, adding a new version of the row each time.

• DELETE

DELETE operations allow the user to remove records from the database.

A hard delete removes the record altogether, while a soft delete flags the record but leaves
it in place.

Soft deletes are important, for example, in payroll, where employment records need to be
maintained even after an employee has left the company.
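
The four operations, including the difference between a hard and a soft delete, can be sketched as follows with Python's built-in sqlite3 module. The employee table, the sample rows, and the is_deleted flag column are assumptions chosen for this illustration; a real payroll system would define its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE employee (
           empid      INTEGER PRIMARY KEY,
           ename      TEXT NOT NULL,
           salary     REAL,
           is_deleted INTEGER NOT NULL DEFAULT 0   -- soft-delete flag
       )"""
)

# CREATE: add new records.
with conn:
    conn.executemany(
        "INSERT INTO employee (ename, salary) VALUES (?, ?)",
        [("Warner", 50000), ("Smith", 42000), ("Jones", 39000)],
    )

# READ: return some fields of the records matching a search criterion.
high_earners = conn.execute(
    "SELECT ename FROM employee WHERE salary > ?", (45000,)
).fetchall()

# UPDATE: change one field of the records matching a criterion.
with conn:
    conn.execute(
        "UPDATE employee SET salary = ? WHERE ename = ?", (55000, "Warner")
    )

# Soft DELETE: flag the record but leave it in place.
with conn:
    conn.execute("UPDATE employee SET is_deleted = 1 WHERE ename = ?", ("Smith",))

# Hard DELETE: remove the record altogether.
with conn:
    conn.execute("DELETE FROM employee WHERE ename = ?", ("Jones",))

active = conn.execute(
    "SELECT ename, salary FROM employee WHERE is_deleted = 0 ORDER BY empid"
).fetchall()
remaining = conn.execute(
    "SELECT ename, is_deleted FROM employee ORDER BY empid"
).fetchall()

print(high_earners, active, remaining)
```

After these steps Smith's row is still stored (flagged as deleted) while Jones's row is gone, so only the soft-deleted record can still be consulted later.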

DATABASE MANAGEMENT SYSTEM (DBMS)

A database typically requires a comprehensive database software program known as a database
management system (DBMS).

DBMS stands for Database Management System.

We can break it down like this: DBMS = Database + Management System.

It is the software that is used to manage and maintain data in the database.

By using a DBMS, we can create new databases and new tables, and insert, update, delete, and
select the data in the database.

Some examples of popular DBMSs include:

• MySQL,

• Microsoft Access,

• Microsoft SQL Server,

• FileMaker Pro,

• Oracle Database, and

• dBASE.

Why do we need a DBMS?

Database systems are developed primarily to handle large amounts of data.

When dealing with a huge amount of data, there are two things that require optimization:

• Storage:

According to the principles of database systems, the data is stored in such a way that it
takes up much less space, because redundant data (duplicate data) is removed before
storage.

Let’s take a simple example to understand this:

In a banking system, suppose a customer has two accounts: a savings account
and a salary account.

Let’s say the bank stores savings account data in one place (called a table) and salary account
data in another place. If the customer information, such as customer name and
address, is stored in both places, then this is just a waste of storage
(redundancy/duplication of data).

To organize the data in a better way, the information should be stored in one place and
both accounts should be linked to that information.

• Fast retrieval of data:

Along with storing the data in an optimized and systematic manner, it is also important
that we retrieve the data quickly when needed.

Database systems ensure that the data is retrieved as quickly as possible.
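
The banking example above can be sketched as two linked tables. This is a minimal illustration with Python's built-in sqlite3 module; the customer and account tables, their columns, and the sample values are assumptions. Customer details are stored once, both accounts point to them, and an index on the linking column is added as one common way to support fast retrieval (how much it helps depends on the amount of data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL            -- stored in one place only
    );
    CREATE TABLE account (
        account_id   INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
        account_type TEXT NOT NULL
    );
    CREATE INDEX idx_account_customer ON account (customer_id);

    INSERT INTO customer VALUES (1, 'Warner', '12 Old Street');
    INSERT INTO account (customer_id, account_type)
        VALUES (1, 'savings'), (1, 'salary');
    """
)

# A change of address is made once, in the customer table.
with conn:
    conn.execute(
        "UPDATE customer SET address = ? WHERE customer_id = ?",
        ("98 New Avenue", 1),
    )

# Both accounts see the new address through the link.
rows = conn.execute(
    """SELECT a.account_type, c.address
       FROM account AS a
       JOIN customer AS c ON c.customer_id = a.customer_id
       ORDER BY a.account_type"""
).fetchall()

print(rows)
```

Because the address exists in only one row, there is no second copy that could be left out of date.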

DBMS Applications

Applications where we use Database Management Systems are:

• Telecom:

There is a database to keep track of the data regarding calls made, network usage, customer
details, etc.

Without database systems it is hard to maintain that huge amount of data, which keeps
updating every millisecond.

• Industry:

Whether it is a manufacturing unit, warehouse, or distribution centre, each one needs a
database to keep the records of ins and outs.

For example, a distribution centre should keep track of the product units that are supplied to
the centre as well as the products that are delivered out from the distribution centre
each day; this is where a DBMS comes into the picture.

• Banking System:

For storing customer information, tracking day-to-day credit and debit transactions,
generating bank statements, etc.

All this work is done with the help of database management systems.

• Sales:

To store customer information, production information and invoice details.

• Airlines:

To travel by air, we make reservations in advance; this reservation information, along
with the flight schedule, is stored in a database.

• Education sector:

Database systems are frequently used in schools and colleges to store and retrieve the data
regarding student details, staff details, course details, exam details, payroll data,
attendance details, fees details, etc.

There is a lot of inter-related data that needs to be stored and retrieved in an efficient
manner.

• Online shopping:

You are probably familiar with online shopping websites such as Amazon and Flipkart.

These sites store the product information, your addresses and preferences, and credit details,
and provide you with the relevant list of products based on your query.

All this involves a database management system.

These are only a few examples; a complete list of DBMS applications would be practically
endless.

DBMS Functions

A DBMS performs several important functions that guarantee integrity and consistency of data
in the database.

Most of these functions are transparent to end-users.

A DBMS provides the following important functions and services:

• Data Storage Management:

It provides a mechanism for management of permanent storage of the data.

The internal schema defines how the data should be stored by the storage management
mechanism, and the storage manager interfaces with the operating system to access the
physical storage.

• Data Manipulation Management:

A DBMS furnishes users with the ability to retrieve, update and delete existing data in the
database.

• Data Definition Services:

The DBMS accepts the data definitions such as the external schema, the conceptual schema,
the internal schema, and all the associated mappings in source form.

• Data Dictionary/System Catalog Management:

The DBMS provides a data dictionary or system catalog function in which descriptions of
data items are stored and which is accessible to users.

• Database Communication Interfaces:

The end-user’s requests for database access are transmitted to the DBMS in the form of
communication messages.

• Authorization / Security Management:

The DBMS protects the database against unauthorized access, either intentional or
accidental.

It furnishes a mechanism to ensure that only authorized users have access to the database.

• Backup and Recovery Management:

The DBMS provides mechanisms for backing up data periodically and recovering from
different types of failures.

This prevents the loss of data.

• Concurrency Control Service:

Since DBMSs support sharing of data among multiple users, they must provide a
mechanism for managing concurrent access to the database.

DBMSs ensure that the database is kept in a consistent state and that the integrity of the
data is preserved.

• Transaction Management:

A transaction is a series of database operations, carried out by a single user or application
program, which accesses or changes the contents of the database.

Therefore, a DBMS must provide a mechanism to ensure either that all the updates
corresponding to a given transaction are made or that none of them is made.

• Multi-User Access Control

The DBMS creates the complex structures that allow multiple users access to the data.

In order to provide data integrity and data consistency, the DBMS uses sophisticated
algorithms to ensure that multiple users can access the DB concurrently without
compromising the integrity of the DB.

• Database Access and Application Programming Interfaces:

All DBMSs provide interfaces to enable applications to use DBMS services.

They provide data access via Structured Query Language (SQL).
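
The all-or-nothing behaviour described under Transaction Management above can be sketched with a transfer between two accounts. The example uses Python's built-in sqlite3 module; the account table, the rule that a balance may not go negative, and the function transfer are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE account (
           account_id INTEGER PRIMARY KEY,
           balance    REAL NOT NULL CHECK (balance >= 0)
       )"""
)
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])


def transfer(conn, source, target, amount):
    """Move money between accounts as one transaction.

    Returns True if both updates were committed, False if the transaction
    was rolled back because an update violated a constraint.
    """
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT account_id, balance FROM account"))


first = transfer(conn, 1, 2, 30.0)     # both updates succeed
after_first = balances(conn)

second = transfer(conn, 1, 2, 500.0)   # debit would make the balance negative
after_second = balances(conn)

print(first, after_first)
print(second, after_second)
```

In the second transfer the credit to account 2 is executed before the debit fails, yet the rollback undoes it, so the balances are exactly as they were before the transaction started.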

Advantages of a DBMS

The availability of a DBMS between the database and the end-users’ applications provides
numerous benefits.

For example, a DBMS enables data in a database to be shared by several users or applications.

Furthermore, a database management system (DBMS) integrates multiple users' perspectives on
data into a single data repository.

Data is the critical raw material from which information is derived, so a good data
management system is needed.

Essentially, a DBMS has the following advantages:

• Controlling redundancy:

Storing a given item of data in a single place in a database management system (DBMS) can meet
the needs of all users.

As a result, there is no need to maintain several copies of the same data, and redundancy
is kept under control.

However, in practice, there can be times when we need to add a small amount of
redundancy to the database for performance purposes.

This is why it is important to control rather than eliminate redundancy.

• Reduction in data inconsistency:

Inconsistency in data occurs when different versions of the same data appear in different places.

• Improvements in decision-making:

The generation of higher-quality information, which can be used to make informed decisions, is
made possible by well-managed data and improved data access.

The quality of the information used is determined primarily by the quality of the raw data.

Data quality is an integrated approach to ensuring data accuracy, validity, and timeliness.

Although a DBMS cannot guarantee data quality, it provides a mechanism for supporting
initiatives for data quality.

• Increased end-user productivity:

The availability of data, combined with the tools that turn data into usable
information, enables end users to make quick, informed decisions, which can mean the
difference between success and failure.
 Economical:

Economy of scale, in general, means that the cost of a combined operation is less than
the sum of the costs of the individual efforts.

The database approach leads to application centralization, resulting in a reliance on
large, expensive, and powerful computers as well as technical expertise in one place.

Economies of scale are typically the result.

Besides, since many people share a database, any database modification or upgrade
benefits everybody.

 Improved data sharing:

The DBMS helps create an environment in which end users have better access to
more and better-managed data.

Such access enables end users to respond promptly to changes in their environment.

 Balancing conflicting requirements:

The database system ensures a balance between the competing needs of different data
users.

The database system considers the needs of both individual users and the organization as
a whole.

 Efficient data integration:

Wider access to well-managed data promotes an integrated view of the organization's
operations and a clearer view of the overall situation.

 Data security improvement:

The higher the potential for data access, the greater the data protection risks.

Since the Database Administrator (DBA) has control over operational data, authorization
procedures can be set up that allow only authorized users to access it.

With the DBA's assistance, different users may be given different types of access to the same data.
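
Different types of access to the same data can be sketched as follows. Server DBMSs normally express such rules with user accounts and GRANT/REVOKE statements; SQLite has no user accounts, so this sketch assumes an authorizer callback from Python's sqlite3 module as a stand-in for a DBA-defined rule. The employee table and the clerk_rules function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee (name, salary) VALUES ('Zulu', 9000)")
conn.commit()


def clerk_rules(action, arg1, arg2, db_name, source):
    """Rule for a clerk: any column may be read except employee.salary."""
    # For SQLITE_READ, arg1 is the table name and arg2 is the column name.
    if action == sqlite3.SQLITE_READ and arg1 == "employee" and arg2 == "salary":
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


conn.set_authorizer(clerk_rules)

names = [row[0] for row in conn.execute("SELECT name FROM employee")]
try:
    conn.execute("SELECT salary FROM employee")
    salary_blocked = False
except sqlite3.DatabaseError:
    salary_blocked = True

print(names, salary_blocked)
```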

 Flexibility and responsiveness:

In a file-oriented approach, data held in separate files by different users is neither
flexible nor responsive to new requests.
For example, if a file in a file-oriented system is arranged in an alphabetical order by
author, the system will not respond to queries to view the list in different orders, such as
alphabetical order by title, topic, publisher, or date.

If the same information is stored in a structured database, a user may get a response from
the database in one of the ways described above.

Besides, due to the DBMS' flexibility, programmers can create new programs in response
to unique user requests.
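
The book-list example above can be sketched in Python with sqlite3. The book table and its three rows are invented for illustration, and titles_by is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [
        ("Zambezi Tales", "Banda", 2010),
        ("Copper Roads", "Phiri", 1998),
        ("Market Days", "Chanda", 2021),
    ],
)
conn.commit()


def titles_by(conn, column):
    """List titles ordered by the chosen column (restricted to known columns)."""
    if column not in {"author", "title", "year"}:
        raise ValueError(f"cannot order by {column!r}")
    return [t for (t,) in conn.execute(f"SELECT title FROM book ORDER BY {column}")]


by_author = titles_by(conn, "author")
by_title = titles_by(conn, "title")
by_year = titles_by(conn, "year")
print(by_author, by_title, by_year, sep="\n")
```

The same stored data answers all three requests; no file has to be re-sorted or duplicated.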

 Improvement in data access:

The DBMS enables generation of fast answers to ad hoc queries.

A query is a request to the DBMS to access the data in a particular way, such as reading
or changing it.

A query is simply a question, and an ad hoc query is a question that is posed on the spur
of the moment.

The DBMS responds to the application with a response (referred to as the query result).

When dealing with voluminous sales data, for example, end users may want fast answers
to questions (ad hoc queries).
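
An ad hoc query on sales data might look like the sketch below, again using Python's sqlite3 module. The sale table, the regions, and the amounts are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [
        ("North", "maize", 120),
        ("North", "beans", 80),
        ("South", "maize", 200),
        ("South", "beans", 40),
    ],
)
conn.commit()

# A question posed on the spur of the moment:
# "How much maize did each region sell?"
ad_hoc_query = (
    "SELECT region, SUM(amount) FROM sale "
    "WHERE product = ? GROUP BY region ORDER BY region"
)
query_result = conn.execute(ad_hoc_query, ("maize",)).fetchall()
print(query_result)
```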

Disadvantages of DBMS

Note that, just as databases have advantages, they also have disadvantages.

Some of the identified disadvantages are:

 Difficulty in Data recovery:

Due to the DBMS's complexity, data recovery in the event of a catastrophe is much more
difficult and complicated than in a file-oriented system.

 Complexity:

Because of the complexity and scope of the functions it provides, a DBMS is a
complicated piece of software.

There are many choices to make when developing and implementing a new application using a
DBMS.

There is therefore a risk of making incorrect decisions, especially if one's understanding
of the DBMS is inadequate.

 Cost:

A good database management system is a costly item.

The overall cost of all the components of a DBMS for a large mainframe system
may be in the millions of kwacha range.

 Additional hardware requirements:

Due to the size and complexity of the DBMS, more hardware resources are
needed.

Users can experience a substantial drop in performance if the system's hardware resources
aren't upgraded when a DBMS is purchased.

 The negative effect of hardware or software malfunction:

Many of the data processing resources in the information system are concentrated in the
database.

Any hardware or software failure will therefore have much more serious consequences than in a
non-database setting.

In a database setting, a failure affects every user and application that depends on the
database; in a file-oriented system, only the applications that use the failed file are affected.

 Size:

A DBMS must be a large program to provide all of the complex functions that
users require, and it occupies many megabytes of disk space as well as a
considerable amount of internal memory.

The program's scale grows in proportion to its complexity.

THE DATABASE SYSTEM ENVIRONMENT

The database system environment is an organization of components that define and regulate the
collection, storage, management, and use of data.

A database system comprises hardware, software, people, procedures, and data.


1. Hardware:

Computers, input and output devices, storage devices, and other physical electronic devices
make up the hardware.

The hardware provides the interface between the computers and real-world systems.

2. Software:

This is a set of programs for managing and monitoring the database as a whole.

Database software, the operating system, network software that allows users to share data,
and application programs that enable users to access data in the database are all included.

The central piece of this software is the DBMS itself.

The DBMS allows the users to communicate with the database.

Between the physical database itself (i.e. the data as actually stored) and the users of the
system is a layer of software, usually called the Database Management System or DBMS.

One general function provided by the DBMS is thus the shielding of database users from
complex hardware-level detail.
3. Data:

Data consists of raw, unprocessed facts that must be processed in order to be useful.

Until it is organized, data may be simple yet appear random and of little use.

Data can be anything from facts, observations, and experiences to numbers, characters,
symbols, and images.

It is the most important component of the DBMS environment from the end user's point of
view.

Data acts as a bridge between the machine components and the user components.

The database contains the operational data and the meta-data.

The database should contain all the data needed by the organization.

4. Procedures:

Procedures are the instructions and rules that govern the design and use of the
database management system.

They are documented methods for creating and running the database, intended to
guide the users who operate and manage it.

Procedures may consist of instructions on how to:

i Log on to the DBMS.

ii Use a particular DBMS facility or application program.

iii Start and stop the DBMS.

iv Make backup copies of the database.

v Handle hardware or software failures.

vi Change the structure of a table, reorganize the database across multiple disks,
improve performance, or archive data to secondary storage.
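
As one example of a documented procedure, item iv (making backup copies) can be sketched with the backup method of Python's sqlite3 connections. The inventory table and its row are hypothetical, and both databases are kept in memory purely to keep the sketch self-contained; a real procedure would write the copy to separate storage.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE inventory (item TEXT, qty INTEGER)")
source.execute("INSERT INTO inventory VALUES ('bolt', 40)")
source.commit()

# Step 1: copy the whole database into a second database.
backup_copy = sqlite3.connect(":memory:")
source.backup(backup_copy)

# Step 2: simulate data loss in the original database.
source.execute("DELETE FROM inventory")
source.commit()

# Step 3: the backup still holds the data as it was when the copy was made.
remaining = source.execute("SELECT COUNT(*) FROM inventory").fetchone()[0]
restored = backup_copy.execute("SELECT item, qty FROM inventory").fetchall()
print(remaining, restored)
```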

5. Users:
There are a number of users who can access or retrieve data on demand using the
applications and interfaces provided by the DBMS.

Each type of user needs different software capabilities.

The users of a database system can be classified in the following groups, depending on
their degrees of expertise or the mode of their interactions with the DBMS.

The users can be:

a) Naive Users:

Naive users are those users who need not be aware of the presence of the
database system or of any other system supporting their usage.

Naive users are end users of the database who work through a menu driven
application program, where the type and range of response is always indicated to
the user.

A user of an Automatic Teller Machine (ATM) falls in this category.

The user is instructed through each step of a transaction.

b) Online Users:

Online users are those who may communicate with the database directly via an
online terminal or indirectly via a user interface and application program.

These users are aware of the presence of the database system and may have
acquired a certain amount of expertise with limited interaction permitted with a
database.

c) Sophisticated Users:

Such users interact with the system without writing programs.

Instead, they form their requests in a database query language.

d) Specialized Users:

Such users write specialized database applications that do not fit into
the traditional data-processing framework.

Examples include computer-aided design systems, knowledge-base and expert systems, and
systems that store data with complex data types (for example, graphics data and
audio data).

e) Application Programmers:

Application programmers are professional programmers responsible for developing
application programs or user interfaces.

The application programs could be written in a general-purpose programming
language or with the commands available to manipulate a database.

f) Database Administrator:

Database administration is the task of maintaining the integrity of a database.

The database administrator (DBA) is the person or group responsible for handling
the database and making sure that the data is stable and has integrity.

The size and role of the DBA function varies from company to company, as does
its placement within the organizational structure.

There is no standard for how the DBA function fits in an organization’s structure,
partly because the function itself is probably the most dynamic of any in an
organization.

In fact, the fast-paced changes in DBMS technology dictate changing
organizational styles.

DBA operations are commonly defined and divided according to the phases of the
Database Life Cycle (DBLC).

The DBA function requires personnel to cover the following activities:

o Database planning, including the definition of standards, procedures, and
enforcement

o Evaluating, selecting, and installing the DBMS and related utilities

o Testing and evaluating databases and applications

o Database requirements gathering and conceptual design

o Database logical and transaction design

o Database software selection

o Database physical design and implementation

o Database testing and debugging

o Database backup and recovery

o Ensuring quality and integrity of data and applications

o Database operations and maintenance, including installation, conversion,
and migration

o Database training and support

o Data quality monitoring and management


The Figure represents a DBA functional organization

Keep in mind that a company might have several incompatible DBMSs installed
to support different operations.

For example, some corporations have a hierarchical DBMS to support daily
transactions at the operational level and a relational database to support middle
and top management’s ad hoc information needs.

A variety of personal computer DBMSs might be installed in different
departments.

In such an environment, the company might have one DBA assigned for each
DBMS.

The general coordinator of all DBAs is sometimes known as the systems
administrator; that position is illustrated in the Figure below.

Database Administration Tools

The database administration tools cover the entire spectrum of data administration
tasks, from selection to inception, deployment, migration, and day-to-day
operations.

For example, you can find sophisticated data administration tools for:

o Database monitoring
o Database load testing

o Database performance tuning

o SQL code optimization

o Database bottleneck identification and remediation

o Database modeling and design

o Database data extraction, transformation, and loading

All the above-mentioned tools have something in common.

They all expand the database’s metadata or data dictionary.

The importance of the data dictionary as a DBA tool cannot be overstated.

DATA DICTIONARY

A data dictionary is defined as “a DBMS component that stores the definition of data
characteristics and relationships.”

You may recall that such “data about data” are called metadata.

The DBMS data dictionary provides the DBMS with its self-describing characteristic.

In effect, the data dictionary resembles an x-ray of the company’s entire data set, and it is a crucial
element in data administration.

Two main types of data dictionaries exist: integrated and standalone.

An integrated data dictionary is included with the DBMS.

For example, all relational DBMSs include a built-in data dictionary or system catalog that is
frequently accessed and updated by the RDBMS.
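
A built-in system catalog can be inspected directly. The sketch below assumes SQLite, whose catalog is the sqlite_master table and the table_info pragma; other relational DBMSs expose their catalogs under different names. The student table and its index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, gpa REAL)"
)
conn.execute("CREATE INDEX idx_student_name ON student(name)")

# Metadata about every object: the DBMS maintains this catalog itself.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()

# Metadata about the columns of one table: name, declared type, NOT NULL flag.
columns = [
    (name, col_type, bool(notnull))
    for (_cid, name, col_type, notnull, _default, _pk)
    in conn.execute("PRAGMA table_info(student)")
]

print(objects)
print(columns)
```

The catalog was never filled in by hand; it was updated as a side effect of the CREATE statements, which is what makes an integrated dictionary self-describing.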

Other DBMSs, especially older types, do not have a built-in data dictionary; instead, the DBA
may use third-party standalone systems.

Data dictionaries can also be classified as active or passive.

An active data dictionary is automatically updated by the DBMS with every database access to
keep its access information up to date.

A passive data dictionary is not updated automatically and usually requires running a batch
process.

Data dictionary access information is normally used by the DBMS for query optimization.

The data dictionary’s main function is to store the description of all objects that interact with the
database.
Integrated data dictionaries tend to limit their metadata to the data managed by the DBMS.

Standalone data dictionary systems are usually more flexible and allow the DBA to describe and
manage all of the organization’s data, whether they are computerized or not.

Whatever the data dictionary’s format, it provides database designers and end users with a much-
improved ability to communicate.

In addition, the data dictionary is the tool that helps the DBA resolve data conflicts.

Although there is no standard format for the information stored in the data dictionary, several
features are common.

For example, the data dictionary typically stores descriptions of the following:

• Data elements that are defined in all tables of all databases.

Specifically, the data dictionary stores element names, data types, display format, internal
storage format, and validation rules.

The data dictionary also records where an element is used, by whom it is used, and so on.

• Tables defined in all databases.

For example, the data dictionary is likely to store the name of the table creator, the date
of creation, access authorizations, and the number of columns.

• Indexes defined for each database table.

For each index, the DBMS stores at least the index name, the attributes used, the location,
specific index characteristics, and the creation date.

• Defined databases.

This information includes who created each database, when the database was created,
where the database is located, the DBA’s name, and so on.

• End users and administrators of the database.

This information defines the users of the database.

• Programs that access the database.

This information includes screen formats, report formats, application programs, and SQL
queries.

• Access authorizations for all users of all databases.

This information defines who can manipulate which objects and what types of operations
can be performed.

• Relationships among data elements.


This information includes which elements are involved, whether

DATABASE VIEW

A database view is a virtual table, defined by a stored query, that displays a selected set of database records.

A view can join information from several tables together.

Views have filters to determine which records they show.

Most users interact with the database using the database views.

Views allow users to:

a) Focus on the data that interests them and on the tasks for which they are responsible.

Data that is not of interest to a user can be left out of the view.

b) Define frequently used joins, projections, and selections as views so that users do not have
to specify all the conditions and qualifications each time an operation is performed on that
data.

c) Display different data for different users, even when they are using the same data at the
same time.

This advantage is particularly important when users of many different interests and skill
levels share the same database.
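
The uses listed above can be sketched with a view that stores a join, a selection, and a projection. The example uses Python's sqlite3 module; the department and employee tables, their rows, and the sales_staff view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name TEXT,
        salary INTEGER,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Finance');
    INSERT INTO employee VALUES
        (10, 'Zulu', 9000, 1), (11, 'Tembo', 7000, 2), (12, 'Mulenga', 8000, 1);

    -- The view stores the join, the filter, and the renamed columns once.
    CREATE VIEW sales_staff AS
        SELECT e.name AS staff_name, d.dept_name AS department
        FROM employee e JOIN department d ON d.dept_id = e.dept_id
        WHERE d.dept_name = 'Sales';
    """
)

# Users query the view like a table, without restating the join conditions.
rows = conn.execute(
    "SELECT staff_name, department FROM sales_staff ORDER BY staff_name"
).fetchall()
view_columns = [d[0] for d in conn.execute("SELECT * FROM sales_staff").description]
print(rows, view_columns)
```

The salary column is not part of the view, so a user who is given only sales_staff never sees it.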
Advantages:

 Provide an additional level of table security by restricting access to a predetermined set of
rows or columns of a table.

 Hide Data complexity:

For example, a single view might be defined with a join, which combines related
columns or rows from multiple tables.

However, the view hides the fact that this information actually originates from several
tables.

 Present data in different perspective:

Columns of views can be renamed without affecting the tables on which the views are
based.

Disadvantages:

 Rows available through a view are not sorted or ordered unless the query on the view requests an order.

 DML operations (INSERT, UPDATE, DELETE) are restricted on views; many DBMSs permit them only on simple views.

 When a table is dropped, the views that depend on it become inactive.

 Performance may suffer, because querying a view can take more time than querying the
table directly.
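
Two of these limitations can be observed in SQLite, which is assumed here only because it ships with Python; other DBMSs behave differently, and some accept DML on simple views. The book table, the old_books view, and the attempt helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (title TEXT, year INTEGER);
    INSERT INTO book VALUES ('Copper Roads', 1998);
    CREATE VIEW old_books AS SELECT title FROM book WHERE year < 2000;
    """
)


def attempt(sql):
    """Run a statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.Error as exc:
        return f"error: {exc}"


# In SQLite a view is read-only, so this INSERT is rejected.
insert_result = attempt("INSERT INTO old_books VALUES ('Market Days')")

# After the base table is dropped, the view can no longer be queried.
conn.execute("DROP TABLE book")
select_result = attempt("SELECT * FROM old_books")

print(insert_result)
print(select_result)
```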

DBMS ARCHITECTURE

 The application layer performs load balancing, so multiple clients can be served.

DBMS – THREE SCHEMA (LEVEL) ARCHITECTURE

The three-schema architecture, also called the ANSI/SPARC (American National Standards Institute,
Standards Planning And Requirements Committee) architecture, is an abstract design standard
for a database management system (DBMS), first proposed in 1975.

The ANSI-SPARC model, however, never became a formal standard.

No mainstream DBMS is fully based on it.

Mainstream systems tend not to exhibit full physical independence, but the idea of logical data independence is
widely adopted.

Objectives of Three schema Architecture

The objective of the three-level architecture is to separate each user's view of the database from the way the database is physically represented.

This separation is desirable for the following reasons:

1. Different users need different views of the same data according to their requirements.

2. The way in which a particular user needs to see the data may change over time.

3. The users of the database should not worry about the physical implementation and internal
workings of the database such as data compression and encryption techniques, hashing,
optimization of the internal structures etc.

4. DBA should be able to change the conceptual structure of the database without affecting
the user's

5. The internal structure of the database should be unaffected by changes to the physical aspects of
the storage.

The three levels

The Three Level Architecture has the aim of enabling users to access the same data but with a
personalized view of it.
The distancing of the internal level from the external level means that users do not need to know
how the data is physically stored in the database.

This level separation also allows the Database Administrator (DBA) to change the database
storage structures without affecting the users' views

The three levels are:

1. Internal Level

The internal level is concerned with how the database is physically represented on the
computer system.

The internal schema is also known as a physical level.

The internal level has an internal schema which describes the physical storage structure
of the database.

It describes how the data is actually stored in the database and on the computer hardware.

The internal level is generally concerned with the following activities:

o Storage space allocations.

For Example: B-Trees, Hashing etc.


o Access paths.

For Example: Specification of primary and secondary keys, indexes, pointers and
sequencing.

o Data compression and encryption techniques.

o Optimization of internal structures.

o Representation of stored fields.

2. Conceptual Level

The conceptual level is a way of describing what data is stored within the whole database
and how the data is inter-related.

Conceptual level is also known as logical level.

In the conceptual level, internal details such as an implementation of the data structure are
hidden

Some important facts about this level are:

 Programmers and database administrators work at this level.

 Describes the structure of the data for all users.

 Only the DBA can define this level.

 Global view of database.

 Independent of hardware and software.

3. External Level

A user's view of the database.

It describes a part of the database that is relevant to a particular user.

The external level is also known as the view level.

It excludes irrelevant data as well as data which the user is not authorized to access.

At the external level, a database contains several schemas, sometimes called
subschemas.

Each subschema describes a different view of the database.

Database schemas

The overall plan or description of a database is called the database schema.

Schema gives the names of the entities and attributes and specifies the relationship among them.

It is a framework into which the values of the data items (or fields) are fitted.

The schema rarely changes.

But the values fitted into this framework change from instance to instance.

The data in the database at any particular point in time is called a database instance.

Therefore, many database instances can correspond to the same database schema.
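
The difference between a schema and an instance can be sketched as follows, using Python's sqlite3 module. The student table and its row are hypothetical, and the schema and instance helper functions are introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")


def schema(conn):
    """The stored definition of the table: the framework."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'student'"
    ).fetchone()[0]


def instance(conn):
    """The data held in the table at this moment."""
    return conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()


schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO student VALUES (1, 'Mwale')")
schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after)    # the schema did not change
print(instance_before, instance_after)  # the instance did
```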

There are three different types of schema corresponding to the three levels in the ANSI-SPARC
architecture:
 The External Schema

The external schemas describe the different external views of the data, and there may be
many external schemas for a given database.

 The Conceptual Schema

The conceptual schema describes all the data items and relationships between them,
together with integrity constraints.

There is only one conceptual schema per database.

 The Internal Schema

The internal schema at the lowest level contains definitions of the stored records, the
methods of representation, the data fields, and indexes.

There is only one internal schema per database.

Mapping between Views

The three levels of DBMS architecture don't exist independently of each other.

There must be correspondence between the three levels.

DBMS is responsible for correspondence between the three types of schema.

This correspondence is called Mapping.


There are basically two types of mapping in the database architecture:

 Conceptual/ Internal Mapping

The Conceptual/Internal mapping lies between the conceptual level and the internal level.

Its role is to define the correspondence between the records and fields of the conceptual
level and files and data structures of the internal level.

 External/ Conceptual Mapping

The external/Conceptual Mapping lies between the external level and the Conceptual
level.

Its role is to define the correspondence between a particular external and the conceptual
view.

DATA INDEPENDENCE OF DBMS

Data independence is the ability to modify the schema at one level without requiring the
programs and applications that use the data to be rewritten.

Data is separated from the programs, so that the changes made to the data will not affect the
program execution and the application.

We know the main purpose of the three levels of data abstraction is to achieve data independence.

If the database changes and expands over time, it is very important that the changes in one level
should not affect the data at other levels of the database.
This would save time and cost required when changing the database

Importance of Data Independence

 Helps to improve the quality of the data

 Database system maintenance becomes affordable

 Enforcement of standards and improvement in database security

 You don't need to alter data structure in application programs

 Permit developers to focus on the general structure of the Database rather than worrying
about the internal implementation

 Modifications at the physical level, when needed to improve the performance of
the system, can be made easily.

Types of Data Independence

There are two levels of data independence based on three levels of abstraction.

Fig: Data Independence

1. Physical Data Independence

Physical Data Independence means changing the physical level without affecting the
logical level or conceptual level.

Using this property, we can change the storage device of the database without affecting
the logical schema.

The changes in the physical level may include the following:

 A new storage device, such as magnetic tape or a hard disk.

 A new data structure for storage.

 A different data access method or an alternative file organization technique.

 Changing the location of the database.

How is Physical Data Independence achieved?

Physical Data Independence is achieved by modifying the physical layer to logical layer
mapping (PL-LL mapping).
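
Physical data independence can be sketched by changing a storage structure while leaving the application's query untouched. In this Python sqlite3 sketch, adding an index stands in for a physical-level change; the book table, its rows, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [("Zambezi Tales", "Banda"), ("Copper Roads", "Phiri"), ("River Songs", "Banda")],
)
conn.commit()

# The application's query is written against the logical schema only.
APP_QUERY = "SELECT title FROM book WHERE author = ? ORDER BY title"

before = conn.execute(APP_QUERY, ("Banda",)).fetchall()

# Physical-level change: a new access structure is added for the same data.
conn.execute("CREATE INDEX idx_book_author ON book(author)")

after = conn.execute(APP_QUERY, ("Banda",)).fetchall()
print(before == after, after)
```

The query text and its result are identical before and after; only the way the DBMS reaches the rows has changed.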

2. Logical data independence

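Logical data independence means changing the logical (conceptual) level without
having to change the external views or the application programs that use them.

The logical view of data is the user's view of the data.

It presents data in a form that can be accessed by the end users.

Codd’s rule of logical data independence says that application programs and user
activities should remain unaffected when information-preserving changes are made to the logical structure of the data.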

Software or the computer program is used to manipulate the logical view of the data.

Database administrator is the one who decides what information is to be kept in the
database and how to use the logical level of abstraction.

It provides the global view of Data.

It also describes what data is to be stored in the database along with the relationship.

Data independence allows the database to be presented to users in a simple structure.

It is based on application domain entities to provide the functional requirement.

It provides abstraction of system functional requirements.

Static structure for the logical view is defined in the class object diagrams.

Users cannot manipulate the logical structure of the database.

The changes in the logical level may include −

 Change the data definition.

 Adding, deleting, or updating an attribute, entity, or relationship in the
database.

How is Logical Data Independence achieved?

Logical Data Independence is achieved by modifying the view layer to logical layer
mapping (VL-LL mapping).
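
Logical data independence can be sketched by restructuring a table and then redefining the view that maps the external level onto it. The example uses Python's sqlite3 module; the person and person_v2 tables, the contact view, and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, full_name TEXT, email TEXT);
    INSERT INTO person VALUES (1, 'Chanda', 'chanda@example.org');
    CREATE VIEW contact AS SELECT full_name AS name, email FROM person;
    """
)

# The application only ever queries the external view.
APP_QUERY = "SELECT name, email FROM contact"
before = conn.execute(APP_QUERY).fetchall()

# Logical-level change: a restructured table with a new attribute replaces
# the old one, and the view-to-table mapping is redefined to match.
conn.executescript(
    """
    DROP VIEW contact;
    CREATE TABLE person_v2 (
        id INTEGER PRIMARY KEY, full_name TEXT, email TEXT, phone TEXT
    );
    INSERT INTO person_v2 (id, full_name, email)
        SELECT id, full_name, email FROM person;
    DROP TABLE person;
    CREATE VIEW contact AS SELECT full_name AS name, email FROM person_v2;
    """
)

after = conn.execute(APP_QUERY).fetchall()
print(before == after, after)
```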
Difference between Physical and Logical Data Independence

The table below is the summary of comparison on logical and physical data independence

Logical Data Independence | Physical Data Independence
Mainly concerned with the structure of the data or with changing the data definition. | Mainly concerned with the storage of the data.
Compared to physical data independence, it is difficult to achieve. | Compared to logical data independence, it is easy to achieve.
You need to make changes in the application program if fields are added to or deleted from the database. | A change at the physical level usually does not need a change at the application program level.
Modification at the logical level is significant whenever the logical structures of the database are changed. | Modifications made at the internal level may or may not be needed to improve the performance of the structure.
Concerned with the conceptual schema. | Concerned with the internal schema.
Example: adding, modifying, or deleting an attribute. | Example: a change in compression techniques, hashing algorithms, storage devices, etc.

OVERALL STRUCTURE OF A DATABASE MANAGEMENT SYSTEM

A database management system (DBMS) is software that allows access to data stored in the
database and provides an easy and effective method of

i. Defining the information

ii. Storing the information

iii. Manipulating the information

iv. Protecting the information from system crashes and data theft

v. Differentiating access permissions for different users

The database system is divided into three components

 Query Manager

The query manager has three jobs to perform.

It runs user queries, gets data from the storage manager, and shows the result to the user.

 Storage Manager

The storage manager controls the data's physical storage.


The Storage Manager performs CRUD (Create, Read, Update, and Delete) operations.

 Disk Storage

Disk storage keeps the data saved efficiently and safely, even after the system is shut down.

DBMS is a crucial tool for effectively handling massive volumes of data.

When many people use the same data, it helps to manage, share, and maintain data integrity.

DBMS also helps to eliminate data errors and inconsistencies by checking that the data is valid.

DBMS also protects sensitive data by ensuring only authorized individuals can access it.

We will go through the structure of DBMS and the key characteristics of its parts.
Structure of DBMS

The structure of DBMS is divided into three main components.

We now discuss each of these components in detail.


1. Query Manager

The primary role of the Query Manager is to interpret and execute queries given by the
user.

When a user or an application sends a question to the DBMS, the query manager first
translates that query into a low-level language, which the storage manager understands.

The storage manager then processes the query and provides the data the user or the
application requires.

The Query Manager then sends this data back to the user.

The Query Manager has the following components.

a) DDL Interpreter

DDL stands for Data Definition Language.

The DDL interpreter translates DDL statements into a format that the storage manager
can understand.

The DDL interpreter also helps ensure the consistency and validity of the database.

b) DML Compiler

DML stands for Data Manipulation Language.

The DML compiler changes DML commands like SELECT, INSERT, and
DELETE into low-level instructions so the storage manager can understand them.

The DML compiler also optimizes the queries to achieve faster execution.

c) Embedded DML Pre-compiler

The Embedded DML pre-compiler processes the DML commands embedded in an application program and
precompiles them into standard procedural calls, which can be executed within the
host programming language.

d) Query optimizer

This system component processes the SQL queries and determines the most
efficient execution plan for the queries.

The query optimizer considers all the possible ways to process a query.

It then chooses the most optimal route among them.

The query optimizer helps reduce the execution time and the resources required
for a query.

It also helps in providing a faster response to users.
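
The optimizer's choice of execution plan can be observed in SQLite through its EXPLAIN QUERY PLAN statement, used here from Python. The exact wording of the plan text varies between SQLite versions, so the sketch only looks for the index name; the book table, the index, and the plan helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [("Zambezi Tales", "Banda"), ("Copper Roads", "Phiri")],
)
conn.commit()

QUERY = "SELECT title FROM book WHERE author = ?"


def plan(conn, sql, params):
    """Return the human-readable steps of the plan the optimizer chose."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


plan_before = plan(conn, QUERY, ("Banda",))  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_book_author ON book(author)")
plan_after = plan(conn, QUERY, ("Banda",))   # the optimizer can now use the index

print(plan_before)
print(plan_after)
```

The SQL statement is the same in both cases; only the route chosen by the optimizer differs.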


2. Storage Manager

The storage manager is the part of the Database management system responsible for
controlling the data storage in the database.

The storage manager's main job is to manage the data held on secondary storage.

It also allows retrieval of data to offer access to the database.

The storage manager is responsible for creating, reading, updating, and deleting data in
the database.

It also ensures that the database maintains its consistency and integrity by denying any
unauthorized access.

The storage manager's main components are listed below.

a) File Manager

The file manager is responsible for creating, opening, and removing files in the
database.

b) Authorization and Integrity Manager:

As the name implies, it checks and assigns the authority of each user and manages the
integrity constraints applied to the database.

It controls user access to the database and ensures that no one is given
unauthorized access.

c) Command Processor:

It executes the commands received from the compilers.

d) Query Optimizer:

The optimizer analyzes SQL queries and finds the most efficient way to
access the data.

It optimizes the queries so that they can be processed with minimum
resource utilization.

e) Transaction Manager

It manages all the transactions performed in the system to give a smooth
and conflict-free experience.

It also ensures that the database remains in a consistent state, because
every transaction affects the database.

f) Disk Space Manager


The Disk Space Manager (DSM) controls the allocation and deallocation of disk space and keeps track of
whether free space is available.

g) Buffer Manager

It manages and controls how the required data is fetched from storage and
transferred to the main memory.

It also manages the reverse operation, writing data from main memory back to storage.
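
The idea can be sketched with a toy buffer pool.

This is only an illustration under stated assumptions: the class BufferPool, its least-recently-used eviction rule, and the dictionary standing in for the disk are all hypothetical, and real DBMSs use more elaborate policies.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: keeps a few disk pages in memory (LRU eviction)."""

    def __init__(self, disk, capacity):
        self.disk = disk             # dict: page id -> data, stands in for storage
        self.capacity = capacity     # number of pages that fit in memory
        self.frames = OrderedDict()  # page id -> [data, dirty flag]
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = [self.disk[page_id], False]
            self.disk_reads += 1
        return self.frames[page_id][0]

    def write_page(self, page_id, data):
        self.get_page(page_id)                 # make sure the page is in memory
        self.frames[page_id][0] = data
        self.frames[page_id][1] = True         # dirty: differs from the disk copy

    def _evict(self):
        victim, (data, dirty) = self.frames.popitem(last=False)
        if dirty:
            self.disk[victim] = data           # reverse operation: write back

disk = {1: "page-1", 2: "page-2", 3: "page-3"}
pool = BufferPool(disk, capacity=2)
pool.get_page(1)
pool.get_page(1)                      # served from memory, no disk read
pool.write_page(2, "page-2-updated")
pool.get_page(3)                      # evicts page 1, the least recently used
pool.get_page(1)                      # evicts dirty page 2 and writes it back
print(pool.disk_reads, disk[2])
```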

h) Scheduler

As the name implies, it schedules all the tasks in the system, including query execution
and transaction management.

The scheduler process organizes the concurrent execution of SQL requests.

i) Lock Manager

This process manages all locks placed on database objects, including disk pages.
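
A small sketch of a lock table follows.

It assumes only two lock modes, shared ("S") and exclusive ("X"), and it refuses a conflicting request instead of making the transaction wait; the class and the object name "page:7" are invented for the example.

```python
class LockManager:
    """Toy lock table with shared (S) and exclusive (X) locks, no waiting."""

    def __init__(self):
        self.locks = {}  # object name -> (mode, set of transaction ids)

    def acquire(self, txn, obj, mode):
        held = self.locks.get(obj)
        if held is None:
            self.locks[obj] = (mode, {txn})
            return True
        held_mode, owners = held
        if mode == "S" and held_mode == "S":
            owners.add(txn)              # many readers may share an object
            return True
        if owners == {txn}:
            # the sole owner may repeat a request or upgrade from S to X
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.locks[obj] = (new_mode, owners)
            return True
        return False                     # conflict: a real system would queue this

    def release_all(self, txn):
        for obj in list(self.locks):
            _mode, owners = self.locks[obj]
            owners.discard(txn)
            if not owners:
                del self.locks[obj]

lm = LockManager()
r1 = lm.acquire("T1", "page:7", "S")   # first reader
r2 = lm.acquire("T2", "page:7", "S")   # second reader shares the lock
r3 = lm.acquire("T2", "page:7", "X")   # refused while T1 still reads
lm.release_all("T1")
r4 = lm.acquire("T2", "page:7", "X")   # now T2 can upgrade
print(r1, r2, r3, r4)
```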

j) Recovery Manager:

It works like a backup and restore option.

All operations, transactions, and changes pass through this component so that it can
keep track of them and record them.

Whenever required, it can then provide the backup or roll back the operations.

3. Disk Storage

Disk Storage refers to physical storage devices like hard disks, which are used to store
data.

Disk storage provides a medium for storing data that remains stored even after the system
is shut down.

Disk storage has four main components.

a) Data Dictionary

This database component provides metadata about the data components.

These components include tables, relations, and columns with their names,
descriptions, constraints, etc.
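
As an illustration, SQLite exposes its data dictionary through the sqlite_master table and the PRAGMA table_info command.

The sketch below, using Python's built-in sqlite3 module, reads the metadata of a made-up student table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, gpa REAL)")

# Which tables exist? The catalog itself is queried like any other table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (name, col_type, bool(notnull), bool(pk))
    for _cid, name, col_type, notnull, _default, pk
    in conn.execute("PRAGMA table_info(student)")
]

print(tables)
for column in columns:
    print(column)
```

Other DBMSs keep similar metadata in their own system catalogs.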

b) Data Files

All data in a database is stored in data files.

These are stored on hard drives, solid-state drives, etc.

A typical enterprise database is normally composed of several data files.

A data file can contain rows from a single table or it can contain rows from many
different tables.

A database administrator determines the initial size of the data files that make up
the database; however, the data files can automatically expand as required.

Data files are generally grouped in file groups or table spaces.

c) Indices

In a database management system, indices are a type of data structure that provides
fast access to data based on specific columns of a table.

They help find the particular rows that match the given search criteria.
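
The difference between scanning every row and using an index can be sketched in a few lines.

This is a simplified illustration: the rows, the city column, and the sorted-list index are assumptions for the example, whereas real DBMSs usually use B-tree or hash structures.

```python
import bisect

rows = [
    {"id": 101, "name": "Asha", "city": "Delhi"},
    {"id": 102, "name": "Ben", "city": "Accra"},
    {"id": 103, "name": "Chen", "city": "Delhi"},
    {"id": 104, "name": "Dara", "city": "Lima"},
]

def full_scan(city):
    """Without an index, every row has to be examined."""
    return [pos for pos, row in enumerate(rows) if row["city"] == city]

# The index: (key, row position) pairs kept sorted by key.
city_index = sorted((row["city"], pos) for pos, row in enumerate(rows))
index_keys = [key for key, _pos in city_index]

def index_lookup(city):
    """With an index, binary search jumps straight to the matching entries."""
    lo = bisect.bisect_left(index_keys, city)
    hi = bisect.bisect_right(index_keys, city)
    return [pos for _key, pos in city_index[lo:hi]]

print(full_scan("Delhi"), index_lookup("Delhi"))
```

Both functions return the same row positions; the index simply avoids looking at rows that cannot match.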

d) Statistical Data:

As the name implies, it stores statistical information about the data present in the
database.

The term database statistics refers to a number of measurements about database objects
and their environment, such as the number of processors used, processor speed, and
temporary space available.

Database statistics can be gathered manually by the DBA or automatically by the DBMS.

Such statistics provide a snapshot of database characteristics.

The disk storage is optimized for storing data efficiently.

It also ensures fast retrieval to user queries.

The disk storage applies various techniques like partitioning, caching, indexing, data
compression, etc. to ensure these optimizations.

PROCEDURE FOR DATABASE ACCESS

What happens when a user issues a request to the DBMS?

The database management system is a bridge between the application program (which determines
what data are needed and how they are processed) and the operating system of the computer,
which is responsible for placing data on the magnetic storage devices.

To retrieve data from the database, the following operations are performed internally:

1. A user issues an access request, using some application program or data manipulation
language.

A user’s request for data is received by the data manager, which determines the physical
record required.

The application program determines what data are needed and communicates the need to
the database management system.

The decision as to which physical record is needed may require some preliminary
consultation of the database and/or the data dictionary prior to the access of the actual data
itself.

2. The DBMS intercepts the request and interprets it.

Any access to the stored data is done by the data manager.

The DBMS inspects, in turn, the external schema, the external/conceptual mapping, the
conceptual schema, the conceptual/internal mapping, and the storage structure definition.

The data manager sends the request for a specific physical record to the file manager.

3. The database management system instructs the operating system to locate and retrieve
the data from the specific location on the magnetic disk (or whatever device it is stored
on).

4. The file manager decides which physical block of secondary storage devices contains the
required record and sends the request for the appropriate block to the disk manager.

A block is a unit of physical input/output operations between primary and secondary
storage.

5. The disk manager retrieves the block and sends it to the file manager, which sends the
required record to the data manager.

A copy of the data is given to the application program for processing.
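
The steps above can be sketched as three cooperating layers.

The class names follow the text, but the method names, the fixed number of records per block, and the sample data are assumptions made for the illustration.

```python
class DiskManager:
    """Reads whole blocks from (simulated) secondary storage."""

    def __init__(self, blocks):
        self.blocks = blocks      # block number -> list of records
        self.reads = []           # log of the blocks that were fetched

    def read_block(self, block_no):
        self.reads.append(block_no)
        return self.blocks[block_no]

class FileManager:
    """Decides which physical block contains a given record."""

    def __init__(self, disk, records_per_block):
        self.disk = disk
        self.records_per_block = records_per_block

    def read_record(self, record_no):
        block_no, offset = divmod(record_no, self.records_per_block)
        block = self.disk.read_block(block_no)
        return block[offset]

class DataManager:
    """Determines the physical record required for a user's request."""

    def __init__(self, files, key_to_record):
        self.files = files
        self.key_to_record = key_to_record  # stands in for dictionary lookups

    def fetch(self, key):
        record_no = self.key_to_record[key]
        # a copy of the data is handed to the application program
        return dict(self.files.read_record(record_no))

blocks = {
    0: [{"id": "S1", "name": "Asha"}, {"id": "S2", "name": "Ben"}],
    1: [{"id": "S3", "name": "Chen"}, {"id": "S4", "name": "Dara"}],
}
disk = DiskManager(blocks)
files = FileManager(disk, records_per_block=2)
data_manager = DataManager(files, {"S1": 0, "S2": 1, "S3": 2, "S4": 3})

record = data_manager.fetch("S3")
print(record, disk.reads)
```

Only the block that holds the requested record is read from the simulated disk.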

CLASSIFICATION OF DATABASE MANAGEMENT SYSTEM

Various types of databases may be created using a database management system.

Each database holds a specific set of data and is used for a specific purpose.

Various methods for classifying databases have been adopted based on the evolution and creative
uses of databases.

The best database for a specific organization depends on how the organization intends to use the
data.

Databases can be grouped into the following categories:

1. On the basis of the number of users:

The database system may be multi-user or single-user.

a) Single User

In this database, only one user is authorized at a time.

Users 2 and 3, for example, must wait until user 1 has finished using the
database.

The term "desktop database" refers to a single-user database that operates on a personal
computer.

b) Multi-User

This database allows several users to be authorized at the same time.

It supports multiple users concurrently.

Data can be both integrated and shared.

A database is integrated when the same information is not recorded in two places.

2. On the basis of the site location

This criterion is based on the number of sites over which the database is distributed.

a) Centralized Database System

This is a database that stores and maintains data at a single location.

The data in this database is stored in a single location, and users from all over
the world can access it.

This database contains application procedures that enable users to access data from
a remote location.

End-user validation can be done using a variety of authentication procedures.

Similarly, the application procedures use registration to keep track of and record
data usage.

A centralized database is depicted in the figure below.

Fig: Centralized DBMS

Advantages

o The data integrity is maximized as the whole database is stored at a single
physical location.

This means that it is easier to coordinate the data and it is as accurate and
consistent as possible.

o The data redundancy is minimal in the centralized database.

All the data is stored together and not scattered across different locations.

So, it is easier to make sure there is no redundant data available.

o Since all the data is in one place, there can be stronger security measures
around it. So, the centralized database is much more secure.

o Data is easily portable because it is stored at the same place.

o The centralized database is cheaper than other types of databases as it
requires less power and maintenance.

o All the information in the centralized database can be easily accessed from
the same location and at the same time.

Disadvantages

o Since all the data is at one location, it takes more time to search and access
it.

o If the network is slow, this process takes even more time.

o There is a lot of data access traffic for the centralized database.

o This may create a bottleneck situation.

o Since all the data is at the same location, if multiple users try to access it
simultaneously it creates a problem.

o This may reduce the efficiency of the system.

o If there are no database recovery measures in place and a system failure
occurs, then all the data in the database will be destroyed.

b) Cloud Database System

This is a database that is developed and maintained using cloud data services such as
Amazon Web Services (AWS) or Microsoft Azure.

Cloud databases are those that have been optimized and developed for a virtual
environment.

The services provided by third-party vendors specify the performance measures for the
database (such as availability and data storage capacity), but usually do not specify
the underlying infrastructure used to implement it.

The owner of the data does not have to know, or be concerned about, what
hardware and software are being used to support their database.

The performance capacity of the database can be renegotiated with the cloud
provider as the requirements on the database change.

The organizations using this database usually purchase storage and processing
capacity for their data and applications.

When the demands on the database go up, further processing and storage
capabilities are also purchased as required.

Cloud computing has various advantages, including the ability to pay for storage
space and bandwidth on a pay-per-use basis, as well as scalability and high
availability when required.

A cloud database also enables an organization to support its operational
applications through a software-as-a-service platform.

The figure below depicts this.

Fig: Cloud database

c) Distributed database system

This is a database in which data is stored across many different sites.

It is the opposite of a centralized database.

Contributions from the common database, as well as information captured by local
computers, are combined in the distributed database.

The data is dispersed around an organisation, rather than being stored in a single
location.

These sites are connected together with the help of communication links, which
enable them to easily access the distributed data.

Various parts of a database, as well as program procedures that are replicated and
exchanged at different points in a network, are stored in several locations.

There are two kinds of distributed databases.

o Homogeneous databases

These use the same basic or fundamental hardware and run on the same
operating systems and application procedures.

o Heterogeneous databases

These databases have different operating systems, basic hardware, and
application procedures at different locations.

In a distributed database system, the data and the DBMS software are distributed over
several sites that are connected by a computer network.

Advantages of Distributed Database System

o Modular Development

A system can be expanded to new locations or units by adding new servers and
data to the existing setup and connecting them to the distributed system
without interruption.

o Reliability

Distributed databases offer greater reliability than centralized databases.

o Lower Communication Cost

Locally storing data reduces communication costs for data manipulation in
distributed databases.

o Better Response

Efficient data distribution in a distributed database system provides a faster
response when user requests are met locally.

Disadvantages of Distributed Database System

o Costly Software

Ensuring data transparency and coordination across multiple sites often
requires using expensive software in a distributed database system.

o Large Overhead

Many operations on multiple sites require numerous calculations and
constant synchronization when database replication is used, causing a lot
of processing overhead.

o Data Integrity

A possible issue when using database replication is data integrity, which can be
compromised by updating data at multiple sites.

o Improper Data Distribution

Responsiveness can be reduced if data is not correctly distributed across
multiple sites.

3. Database Classification based on Type of data stored

a) General-purpose database:

The general-purpose database includes a wide range of data that can be used in a
variety of disciplines.

A census database with demographic data is an example.

Other examples are the ProQuest and LexisNexis databases with newspaper,
magazine, and journal articles on a range of topics.

b) Subject-specific database:

The subject-specific database contains data focused on a single subject area.

The information in such databases is mostly used for academic or research
purposes within a limited number of disciplines; examples are Compustat and CRSP
(Center for Research in Security Prices).

Other examples are geographic information system (GIS) databases, which store
geospatial and other related data, and medical databases, which store anonymous
medical history data.

4. Database classification based on intended data usage

This criterion is based primarily on how the database is used and on the time sensitivity
of the data it retrieves.

a) Operational database:

This database is designed primarily to support an organization’s daily
operations.

An operational database is also referred to as an online transaction processing (OLTP)
database, a transactional database, or a production database.

Organizations maintain OLTP databases to store day-to-day transaction information,
that is, the data used for running the business. Examples of DBMS products commonly
used for this purpose: SQL Server, Oracle, MySQL, etc.

It is a database that processes online transactions.

It holds the data connected with the operations of an organisation.

Functional lines such as services, user relations, circulation, user service, and
others require this type of database.

b) Analytical database:

It is used for data analysis, for summarized data, or for the historical data of a
particular business. Example: a data warehouse.

The analytical database stores historical data and circulation metrics that are
mainly used to make decisions.

Online analytical processing (OLAP) is a collection of tools that work together to
provide a sophisticated data analysis environment for retrieving, processing, and
modeling data from a data warehouse.

Recently, the use of this database has grown in popularity and has developed into
its own discipline, namely business intelligence.

The term "business intelligence" refers to a system for collecting and analyzing
business data to create information useful in business decision-making.

To build information, it needs extensive data massaging (data manipulation).

The end-user may use sophisticated techniques to perform advanced analysis of
operational data using this database.

This database has two main components.

These are a data warehouse and an online analytical processing (OLAP) front end.

A data warehouse is a specialized database that stores data in a format that makes
it simpler to make decisions about it.

The data warehouse keeps historical data from operational databases as well as data from
other external sources.
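
The contrast between the two kinds of workload can be sketched with Python's built-in sqlite3 module.

A single sale table, invented for this example, plays both roles here; in practice the analytical queries would normally run against a separate data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale ("
    "id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount INTEGER)")

# Operational (OLTP-style) work: many small writes, one per business event.
with conn:
    conn.executemany(
        "INSERT INTO sale (region, year, amount) VALUES (?, ?, ?)",
        [
            ("North", 2022, 120),
            ("North", 2023, 150),
            ("South", 2022, 80),
            ("South", 2023, 95),
            ("South", 2023, 40),
        ],
    )

# Analytical (OLAP-style) work: summarize historical data for decision making.
summary = conn.execute(
    "SELECT region, year, SUM(amount) FROM sale "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()

for region, year, total in summary:
    print(region, year, total)
```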

5. Database classification based on the degree to which the data is structured

Another important way of classifying databases is through the degree to which the data is
structured.

Therefore, we have unstructured and structured data.

a) Unstructured data

It is the raw data in the form in which it was collected.

Unstructured data is usually in a format that does not lend itself to the processing
that yields information.

b) Structured data

Structured data is the result of formatting unstructured data to facilitate storage,
use, and the creation of information.

The structure (format) is applied based on the type of processing that one intends to
perform on the data.

Some data may not be ready (unstructured) for some types of processing, but may be
ready (structured) for other types.

The data value 12345678, for example, may be a zip code, a sales value, or a
product code.

If the value represents a zip code or a product code and is stored as text, it cannot
be used for mathematical computation.

If this value represents a sales transaction, it must be formatted as a numeric value.
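
A short sketch makes the point concrete.

The variable names are made up; the same characters behave differently depending on whether they are stored as text or as a number.

```python
raw = "12345678"

zip_code = raw             # stored as text: an identifier, not a quantity
sales_value = int(raw)     # stored as a number: arithmetic is meaningful

doubled_sale = sales_value * 2
repeated_text = zip_code * 2   # on text, * repeats the characters instead

# Text also preserves leading zeros, which a numeric format would drop.
leading_zero_code = "00501"
as_number = int(leading_zero_code)

print(doubled_sale, repeated_text, as_number)
```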

c) Semi-structured data

You also need to be aware of semi-structured data.

Most of the data you encounter is of this kind: data that has already been processed
to some extent.

A look at a typical web page will show that the data is presented in a
prearranged format to convey some information.

The database types mentioned so far focus on the storage and management of highly
structured data.

However, enterprises do not have to limit themselves to the use of structured
data.

They also depend on unstructured and semi-structured data.

A newer type of database, the XML database, is used to handle
unstructured and semi-structured data storage and management requirements.

Extensible Markup Language, or XML, is a special language for representing and
manipulating data elements in a textual format.

An XML database supports the storage and handling of semi-structured XML data.
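
As a small illustration, the sketch below parses an XML fragment with Python's standard xml.etree.ElementTree module.

The library and book elements are invented for the example; note that the two records do not carry the same fields, which is typical of semi-structured data.

```python
import xml.etree.ElementTree as ET

document = """
<library>
  <book id="b1">
    <title>Database Basics</title>
    <author>A. Writer</author>
  </book>
  <book id="b2">
    <title>Query Cookbook</title>
    <year>2020</year>
  </book>
</library>
""".strip()

root = ET.fromstring(document)

# Every book has a title, but only some have a year.
titles = {book.get("id"): book.findtext("title") for book in root.findall("book")}
years = {book.get("id"): book.findtext("year") for book in root.findall("book")}

print(titles)
print(years)
```

A missing element simply yields None instead of breaking a fixed table layout.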

6. Based on the data model

This criterion is based on the data model used to represent the database.

a) Hierarchical Database

Just as in any hierarchy, this database follows the progression of data being
categorized in ranks or levels.

As a result, two related data entities sit at a lower rank, and what they have in
common assumes a higher rank.

Refer to the diagram below:

Another perspective advises visualizing the data being organized in a parent-child
relationship, which upon addition of multiple data elements would resemble a tree.

The child records are linked to the parent record using a field, and so the parent
record is allowed multiple child records.

However, the reverse is not possible: a child record cannot have more than one parent.

Due to such a structure, hierarchical databases are not easily scalable; the addition of
data elements requires a lengthy traversal through the database.
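
The parent-child rule can be sketched with a simple tree.

The Node class and the college example are assumptions made for the illustration, not part of any particular hierarchical DBMS.

```python
class Node:
    """A record in a hierarchy: one parent at most, any number of children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)
        return child

    def path(self):
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

def find(node, name):
    """Depth-first traversal from the root, as a hierarchy requires."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None

college = Node("College")
science = college.add_child(Node("Science"))
arts = college.add_child(Node("Arts"))
physics = science.add_child(Node("Physics"))

found_path = find(college, "Physics").path()

try:
    arts.add_child(physics)          # a second parent is not allowed
    second_parent_allowed = True
except ValueError:
    second_parent_allowed = False

print(found_path, second_parent_allowed)
```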

b) Network Database

In layman’s terms, a network database is a hierarchical database, but with a major
tweak.

The child records are given the freedom to associate with multiple parent records.

As a result, a network or net of database files linked with multiple threads is
observed.

Notice how the Student, Faculty, and Resources elements each have two parent
records, which are Departments and Clubs.

Although it is a complex framework, a network database is more capable of
representing two-directional relationships.

Also, conceptual simplicity favors the utilization of a simpler database
management language.

The disadvantage lies in the difficulty of altering the structure due to its
complexity, and also in its high structural dependence.

c) Object Oriented Database

Those familiar with the Object-Oriented Programming Paradigm would be able to
relate to this model of databases easily.

Information stored in a database is capable of being represented as an object that
acts as an instance of the database model.

Therefore, the object can be referenced and called without any difficulty.

As a result, the workload on the database is substantially reduced.

In the chart above, we have different objects linked to one another using methods;
one can get the address of the Person (represented by the Person Object) using the
livesAt() method.

Furthermore, these objects have attributes which are in fact the data elements that
need to be defined in the database.

An example of such a model is the Berkeley DB software library which uses the
same conceptual background to deliver quick and highly efficient responses to
database queries from the embedded database.
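
The Person example can be sketched as plain Python classes.

The attribute names are assumptions, and the method is written as lives_at() to follow Python naming conventions; it mirrors the livesAt() method mentioned in the text.

```python
from dataclasses import dataclass

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Person:
    name: str
    address: Address   # an attribute that refers to another object

    def lives_at(self):
        """Follow the link from a Person object to its Address object."""
        return self.address

home = Address(street="12 Main Road", city="Accra")
person = Person(name="Asha", address=home)

print(person.lives_at().city)
```

An object-oriented database would store such objects and their links directly, rather than splitting them into rows.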

d) Relational databases

Relational database technology provides an efficient and flexible way to
access structured information.

Considered the most mature of all database types, relational databases and their
management systems lead in production use.

In this database, pieces of information are related to one another.

This is on account of every data value in the database having a unique identity in
the form of a record.

Note that all data is tabulated in this model.

Every row of data in a table is uniquely identified by a primary key.

Similarly, a table is linked with another table using a foreign key.

Refer to the diagram below and notice how the concept of ‘Keys’ is used to link
two tables.

Due to this use of tables to organize data, the relational model has become
exceedingly popular.

In consequence, relational databases are widely integrated into web application
interfaces to serve as ideal repositories for user data.

What makes it further interesting is the ease in mastering it, since the language
used to interact with the database is simple (SQL in this case) and easy to
comprehend.

It is also worth being aware of the fact that in Relational databases, scaling and
traversing through data is quite a lightweight task in comparison to Hierarchical
Databases.
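
A minimal sketch of primary and foreign keys, using Python's built-in sqlite3 module, is given here.

The department and student tables are made-up examples; note that SQLite enforces foreign keys only after the PRAGMA shown in the code has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        dept_id    INTEGER NOT NULL REFERENCES department (dept_id)
    )
""")

conn.execute("INSERT INTO department VALUES (1, 'Physics')")
conn.execute("INSERT INTO student VALUES (10, 'Asha', 1)")

# The foreign key links each student row to a department row.
joined = conn.execute(
    "SELECT s.name, d.dept_name "
    "FROM student AS s JOIN department AS d ON d.dept_id = s.dept_id"
).fetchall()

# A row that points to a missing department is refused.
try:
    conn.execute("INSERT INTO student VALUES (11, 'Ben', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(joined, orphan_rejected)
```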

e) No-SQL

NoSQL, originally referring to "non-SQL" or "non-relational", is a type of database that
provides a mechanism for storage and retrieval of data.

This data is modeled by means other than the tabular relations used in relational
databases.

The motivations for NoSQL databases include simplicity of design, simpler horizontal
scaling to clusters of machines, and finer control over availability.

The data structures used by NoSQL databases are different from those used by
default in relational databases which makes some operations faster in NoSQL.

The suitability of a given NoSQL database depends on the problem it should solve.

Data structures used by NoSQL databases are sometimes also viewed as more
flexible than relational database tables.

MongoDB falls in the category of NoSQL document-based databases.
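
The document idea can be sketched without any database software.

The example below is not MongoDB's API; it is a plain Python illustration in which a list of dictionaries stands in for a collection and the find() helper is hypothetical.

```python
import json

# Documents in one collection need not share the same fields.
collection = [
    {"_id": 1, "name": "Asha", "tags": ["admin", "staff"]},
    {"_id": 2, "name": "Ben", "address": {"city": "Accra"}},
]

def find(documents, **criteria):
    """Return the documents whose top-level fields match all criteria."""
    return [
        doc for doc in documents
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

matches = find(collection, name="Ben")

# Documents are commonly exchanged in a JSON-like textual form.
serialized = json.dumps(matches[0], sort_keys=True)

print(serialized)
```

Nested values such as the address are kept inside the document instead of being moved to a separate table.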

Advantages of NoSQL

There are many advantages of working with NoSQL databases such as MongoDB
and Cassandra.

o The main advantages are high scalability and high availability.

Disadvantages of NoSQL

NoSQL has the following disadvantages.

o Most NoSQL databases are open source, and there is no common standard across them.

o GUI tools for accessing the database are less widely available.

o Backup is a weak point for some NoSQL databases like MongoDB.

o Large document size.

7. Classification based on DBMS Architecture

DBMS architecture has much to do with how the database is designed and laid out.

Databases are not always directly accessible by users or applications to store or access
data, so we must use various architectures to maintain them based on how users are
connected to the database.

DBMS architectures vary according to how users (clients) connect to the database servers
to carry out their requests.

DBMS architectures are classified based on how many layers are present in their structure,
i.e., tier-based classification.

An n-tier DBMS architecture consists of closely related but independent layers, levels,
and modules that can be modified or replaced independently.

Modifications made to one layer of the architecture do not affect the other layers.

Database management systems (DBMS) can be categorized as single-tier, two-tier, or
multi-tier.

a) Single tier architecture

In the DBMS world, the one-tier Architecture is the simplest DBMS architecture,
in which the client, server, and database are all on the same machine.

This architecture puts the user directly in contact with the database itself, so the
user can create, modify, or delete data within the database.

The user sits directly on the database, without any intermediary layer.

Changes can be made easily by the user without the need to use a special tool.

Every change made by the client is immediately reflected in the database, and all
processing is carried out on one server.

In this way, we can perform the operation directly on the database and get a quick
response.

Considering that there is no real security in this architecture, this is only
recommended when creating a local application.

It is also referred to as the local database system.

b) Two tier architecture

In many ways, the concept of a two-tier DBMS architecture is similar to that of a
client-server architecture.

Client-side applications establish a connection with the server-side to communicate
with the database.

This interaction is performed through APIs (Application Programming Interfaces)
such as JDBC (Java Database Connectivity) or ODBC (Open Database
Connectivity).

User interfaces and application programs are executed on the client-side.

Query processing and transaction management are handled on the server side.

Client-side applications can access the database server directly via API calls, which
enables the application to remain independent of the database with respect to
design, programming, and operation.

It is more secure to have a two-tier architecture, as the DBMS is not exposed directly
to end-users.

In two-tier architecture, the database system is present on the server machine and
the DBMS application is present on the client machine. These two machines are
connected with each other through a reliable network, as shown in the diagram
below.

Advantages:

o It can be used by multiple users simultaneously, which makes it
suitable for use within an organization.

o Because the database functionality is handled solely by the server, it has
a high processing capability.

o Direct connection and enhanced performance provide faster access to the
database.

o Having two independent layers makes it easier to maintain.

c) Three tier architecture

The 3-tiered architecture is one of the most frequently used DBMS architectures.

Between the server (Database layer) and client (Presentation layer), an additional
layer known as the Application layer is added to reduce the server’s query
processing burden.

The modular architecture allows the independent development and maintenance of
functions, logic, data access, data storage, and user interfaces.

3-tier architectures separate the tiers based on users’ complexity and how they
interact with the data available in the database.

A 3-tier architecture has the following layers:

o Database (Data) Tier:

This tier consists of the database and the query processing languages.

Additionally, it includes relations defining the underlying data and
constraints.

o Application (Middle) Tier:

Here you will find the application server and the programs that will access
the database.

The application layer provides users with an abstract view of the database.

It does not reveal the existence of the database to the end-user.

Meanwhile, the database tier is also unaware of any users beyond the
application tier.

The application layer thus sits between the user and the database and serves
as a conduit (mediator).

o User (Presentation) Tier:

The end users interact with this layer, and they are not aware of the database
beyond it.

The application can provide multiple abstract views of the database at this
level.

The views are provided by the applications in the application tier.

Client and server are not directly connected; therefore, all requests from users are
handled by the Application Layer, i.e. the requests are validated and verified by
this intermediate layer before being forwarded to the server.

By eliminating direct client-server communication, the server is less burdened with
query processing, and overall DBMS security is enhanced as the client cannot
communicate directly with the server.

A three-tier DBMS architecture therefore includes an application layer that
helps ensure load balancing, query request accuracy, and security.

Usually, this type of architecture is used in cases where large web applications
must handle a great deal of traffic.

Advantages:

o Data integrity is better maintained.

The application layer performs checks on each client’s request to prevent
data corruption and wrong user requests.

o Security is improved.

The client can’t interact directly with the server, thereby preventing
unauthorized access to data.
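
The division of work among the three tiers can be sketched in one short program.

This is a simplified illustration under stated assumptions: the three tiers would normally run on separate machines, and the book table and the functions get_book_title() and render() are invented for the example.

```python
import sqlite3

# Data tier: the database and its constraints.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("INSERT INTO book (title) VALUES ('Database Basics')")

# Application tier: validates each request before it reaches the database.
def get_book_title(raw_id):
    if not str(raw_id).isdecimal():
        return {"status": "rejected", "reason": "id must be a whole number"}
    row = conn.execute(
        "SELECT title FROM book WHERE id = ?", (int(raw_id),)).fetchone()
    if row is None:
        return {"status": "not_found"}
    return {"status": "ok", "title": row[0]}

# Presentation tier: shows results and never touches the database itself.
def render(response):
    if response["status"] == "ok":
        return f"Title: {response['title']}"
    return f"Request failed ({response['status']})"

print(render(get_book_title("1")))
print(render(get_book_title("1; DROP TABLE book")))
print(render(get_book_title(42)))
```

The malformed request is stopped in the application tier, so it is never sent to the data tier.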

SUMMARY

In this unit, we discussed in a relatively informal manner the major components of a database
system.

We summarize the discussion below:

• A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data.

The collection of related data, which has an implicit meaning, is the database.

• A datum (a unit of data) is a symbol or a set of symbols which is used to represent
something.

This relationship between symbols and what they represent is the essence of what we
mean by information.

• Knowledge refers to the practical use of information.

The collection of information stored in the database at a particular moment is called an
instance of the database.

The overall design of the database is called the database schema.

• The physical schema describes the database design at the physical level, while the logical
schema describes the database design at the logical level.

A database may also have several schemas at the view level, sometimes called subschemas,
that describe different views of the database.

• Application programs are said to exhibit physical data independence if they do not depend
on the physical schema, and thus need not be rewritten if the physical schema changes.

Underlying the structure of a database is the data model: a collection of conceptual tools
for describing data, data relationships, data semantics, and consistency constraints.

• A database system provides a data definition language to specify the database schema and
a data manipulation language to express database queries and updates. One of the main
reasons for using DBMSs is to have central control of both the data and the programs that
access those data.

A person who has such central control over the system is called a database administrator
(DBA).
