Unit I-Database Management System
UNIT I
INTRODUCTION
Introduction
Data is increasingly seen as a corporate asset that can be used to make better-informed business
decisions, with the goal of increasing revenue and profits and reducing costs.
To assess data’s monetary value, consider what is stored in a company database: data about
customers, suppliers, inventory, operations, and so on.
In effect, an organization is subject to a data-information-decision cycle; that is, the data user
applies intelligence to data to produce information that is the basis of knowledge used in decision
making.
To manage data as a corporate asset, managers must understand the value of information.
For some companies, such as credit reporting agencies, their only product is information, and
their success is solely a function of information management.
Most organizations continually seek new ways to leverage their data resources to get greater
returns.
This leverage can take many forms, from data warehouses that support improved customer
relationships to tighter integration with customers and suppliers in support of the electronic supply
chain.
As organizations become more dependent on information, that information’s accuracy becomes
more critical.
Dirty data, or data that suffers from inaccuracies and inconsistencies, becomes an even greater
threat.
One common cause is a lack of enforcement of integrity constraints, such as not null, uniqueness,
and referential integrity.
Some causes of dirty data, such as improper implementation of constraints, can be addressed
within an individual database.
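As a rough illustration of how not null, uniqueness, and referential integrity constraints keep dirty data out of an individual database, the sketch below uses Python's built-in sqlite3 module. The choice of SQLite and the table names, column names, and sample values are all assumptions made for this example; they do not come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables used only for this illustration.
conn.executescript("""
CREATE TABLE customer (
    cust_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    email   TEXT UNIQUE
);
CREATE TABLE invoice (
    inv_id  INTEGER PRIMARY KEY,
    cust_id INTEGER NOT NULL REFERENCES customer(cust_id)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Warner', 'warner@example.com')")


def try_insert(sql, params):
    """Return None on success, or the error text if a constraint rejects the row."""
    try:
        conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


rejected = [
    # Missing name violates NOT NULL.
    try_insert("INSERT INTO customer VALUES (?, ?, ?)", (2, None, "a@example.com")),
    # Repeated email violates UNIQUE.
    try_insert("INSERT INTO customer VALUES (?, ?, ?)", (3, "Smith", "warner@example.com")),
    # Invoice for a customer that does not exist violates referential integrity.
    try_insert("INSERT INTO invoice VALUES (?, ?)", (10, 99)),
]
```

Each entry in `rejected` holds the error reported by the database, so none of the three dirty rows is stored.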
Some dirty data comes from the movement of data across systems, as in the creation of a data
warehouse.
• Interpretation and presentation of data in useful formats by transforming raw data into
information
• Distribution of data and information to the right people at the right time
• Data preservation and monitoring data usage for adequate periods of time
• Control over data duplication and use, both internally and externally
Regardless of the organization, the database’s predominant role is to support managerial decision
making at all levels in the organization while preserving data privacy and security.
An organization’s managerial structure might be divided into three levels: top-level management
makes strategic decisions, middle management makes tactical decisions, and operational
management makes daily working decisions.
Operational decisions are short term; for example, a manager might change the price of a product
to clear it from inventory.
Tactical decisions involve a longer time frame and affect larger-scale operations - for example,
changing the price of a product in response to competitive pressures.
Strategic decisions affect the long-term well-being of the company or even its survival - for
example, changing the pricing strategy across product lines to capture market share.
The DBMS must give each level of management a useful view of the data and support the required
level of decision making.
Data:
Data refers to the raw details around us, such as a name, phone number, or address.
So, in simple words, we can say that data is a raw fact, i.e. characters, numbers, and special characters.
For example, Empid is data, Ename is data, Salary is data, DOJ is data, etc.
For example, from the above data, we cannot say whether Warner is the name of an employee, or
name of a customer, or the name of a Product because Warner is simply data.
Information:
So, in simple words, we can say that processed data or raw facts are called information.
For example, from the information, we can say that Warner is the name of an Employee.
Data Model
A model is a representation of ‘real world’ objects and events and their associations to make the
data understandable.
It can be defined as an integrated collection of concepts for describing and manipulating data,
relationships between data, and constraints on the data in an organization.
• A structural part
• A manipulative part
Defines the types of operation that are allowed on the data (this includes the operations
that are used for updating or retrieving data from the database and for changing the
structure of the database).
• Constraint part
Possibly a set of integrity rules, which ensures that the data is accurate.
Database Server
The database server holds the Database Management System (DBMS) and the databases.
The client is the application that is used to interface with the DBMS, while the database server
runs the DBMS.
Central DBMS functions on the server are referred to as back-end functions, whereas the
application programs on the client computer are referred to as front-end programs.
Hence, a client program connects to the Database server and sends requests (queries) using the
ODBC Application Programming Interface (API).
The server processes the queries and sends the results back to the client program, where they
are processed by the client computer.
Database Objects
Data Quality
Data quality is a comprehensive approach to ensuring the accuracy, validity, and timeliness of
data.
This comprehensive approach is important because data quality involves more than just cleaning
dirty data; it also focuses on preventing future inaccuracies and building user confidence in the
data.
Large-scale data quality initiatives tend to be complex and expensive projects, so the alignment
of these initiatives with business goals is a must, as is buy-in from top management.
While data quality efforts vary greatly from one organization to another, most involve the
following:
DATA MANAGEMENT
Data management is the process of ingesting, storing, organizing and maintaining the data created
and collected by an organization.
The data management process includes a wide range of tasks and procedures, such as:
Integrating different types of data from disparate sources, including structured and
unstructured data
Data management has also grown in importance as businesses are subjected to an increasing
number of regulatory compliance requirements, including data privacy and protection laws.
The separate disciplines that are part of the overall data management process cover a series of
steps, from data processing and storage to governance of how data is formatted and used in
operational and analytical systems.
The key disciplines of the data management process are explained below and shown in the figure:
Fig: Key parts of the data management process
Data Architecture
A data architecture is often the first step, particularly in large organizations with lots of
data to manage.
A data architecture provides a blueprint for managing data and deploying databases and
other data platforms, including specific technologies to fit individual applications.
Data modeling:
Data modeling designs data through different models, which portray the relationships
between data elements and other details.
Data Mining:
Data integration:
It combines different data from different sources and also analyzes those data for the
processing of information.
Data governance:
Data handling policies are made under this concept; it also confirms data fetching
consistency and other related issues.
This is a manual record-keeping system in which human beings manage the whole
database without the support of computers.
Disadvantages
o There is no security
What will you do when the page allotted for names beginning with “H” is finished?
One option is to buy a larger address book, transfer the existing entries into it, and then
continue with the next address to be stored.
The second option is to use some blank pages at the end of the address book for storing the
remaining addresses of the persons whose names start with “H”.
The first option is obviously very tiring and time consuming. With the second option, to find
a person whose name starts with “H” you will have to search in two different places, which
is quite a cumbersome procedure.
The file-based approach refers to the situation where data is stored in one or more separate
computer files defined and managed by different application programs.
The data can be stored in files with the help of the operating system.
Computer programs access the stored files to perform the various tasks required by the
business.
The following diagram shows how different applications will each have their own copy
of the files they need in order to carry out the activities for which they are responsible:
Limitations
Data duplication:
If the same data is to be accessed by different programs, then each program must
store its own copy of the same data.
Data inconsistency:
If the data is kept in different files, there could be problems when an item of data
needs updating, as it will need to be updated in all the relevant files.
If this is not done, the data will be inconsistent, and this could lead to errors.
One approach to solving the problem of each application having its own set of files is to
share files between different applications.
This will alleviate the problem of duplication and inconsistent data between different
applications, and is illustrated in the diagram below:
The introduction of shared files solves the problem of duplication and inconsistent data,
but other problems may emerge, including:
File incompatibility:
It means that the file cannot be used or opened by the software or device that you
are trying to use it with.
If departments have to share files, the file structure that suits one department might
not suit another.
For example, data might need to be sorted in a different sequence for different
applications (for instance, customer details could be stored in alphabetical order,
or numerical order, or ascending or descending order of customer number).
The file will still need to contain the additional information to support the
application that requires it.
If the structure of the data file needs to be changed in some way (for example, to
reflect a change in currency), this alteration will need to be reflected in all
application programs that use that data file.
While a data file is being processed by one application, the file will not be available
for other applications or for ad hoc queries.
This is because, if more than one application is allowed to alter data in a file at one
time, serious problems can arise in ensuring that the updates made by each
application do not clash with one another.
File-based systems avoid these problems by not allowing more than one
application to access a file at one time.
In order to overcome the limitations of the file-based approach, a new and more effective
approach was required, known as the database approach.
The database approach provides facilities for querying, data security and integrity, and
allows simultaneous access to data by a number of different users.
One of the benefits of the database approach is that the problem of physical data
dependence is resolved.
This means that the underlying structure of a data file can be changed without the
application programs needing amendment.
DATABASE
The database is a computer-based record-keeping system whose overall purpose is to record and
maintain information.
The database is a single, large repository of data that can be used simultaneously by many users.
It is a collection of interrelated information. By using the database, we can store, modify, select, and
delete data in a secure manner.
Together, the data and the DBMS, along with the applications that are associated with them, are
referred to as a database system, often shortened to just database.
Most databases use structured query language (SQL) for writing and querying data.
Evolution of the database
Databases have evolved dramatically since their inception in the early 1960s.
Navigational databases such as the hierarchical database (which relied on a tree-like model and
allowed only a one-to-many relationship), and the network database (a more flexible model that
allowed multiple relationships), were the original systems used to store and manipulate data.
In the 1980s, relational databases became popular, followed by object-oriented databases in the
1990s.
More recently, NoSQL databases came about as a response to the growth of the internet and the
need for faster speed and processing of unstructured data.
Today, cloud databases and self-driving databases are breaking new ground when it comes to how
data is collected, stored, managed, and utilized.
Databases and spreadsheets (such as Microsoft Excel) are both convenient ways to store
information.
Spreadsheets were originally designed for one user, and their characteristics reflect that.
They’re great for a single user or small number of users who don’t need to do a lot of incredibly
complicated data manipulation.
Databases, on the other hand, are designed to hold much larger collections of organized information.
Databases allow multiple users at the same time to quickly and securely access and query the data
using highly complex logic and language.
Databases are used in most modern applications, whether the database is on your personal phone,
computer, or the internet.
An operational database system will store much of the data an application needs to function,
keeping the data organized and allowing users to access the data.
If you were building an ecommerce app, some of the data you might access and store in your
operational database system includes:
Customer data, like usernames, email addresses, and preferences.
Relationship data, like the locations of stores with a specific product in stock.
Columns.
Columns are similar to fields, that is, individual items of data that we wish to store.
A student’s roll number, name, address, etc. are all examples of columns.
They are also similar to the columns found in spreadsheets (the A, B, C etc. along the top).
Rows.
Rows are similar to records as they contain data of multiple columns (like the 1, 2, 3 etc.
in a spreadsheet).
This makes reading data much more efficient – you fetch what you want.
Tables.
For example, you may have a table that stores details of customers’ names and addresses.
Another table would be used to store details of parts and yet another would be used for
supplier’s names and addresses.
It is the tables that make up the entire database and it is important that we do not duplicate
data at all.
Characteristics of database
As a result, multiuser databases typically use concurrency control to enable many users to
access the same data items at the same time while ensuring that the data remains
accurate and its integrity is preserved.
Redundancy still exists in some situations, but it is controlled and kept to the bare
minimum in order to improve system efficiency.
c) Self-descriptive nature:
A database system includes not only the database itself, but also descriptions of the data
structures and constraints.
When the need arises, DBMS software or database users use this information (metadata).
This distinguishes a database system from a standard file-based system, in which the data
definition is part of the application programs.
As a result, users do not need to be aware of how and where the data they're talking about
is stored.
In a file-based environment, the data file structure is often specified in the application
programs.
As a consequence, if a user wants to change the structure, all of the programs that use that
file must also be changed.
In the database approach, by contrast, the data structure is stored in the system catalogue, not in the programs.
For instance, a user table contains fields that present data about that user.
i) Databases are designed, built, and populated with data for a certain purpose.
The term CRUD (create, read, update, delete) describes the four essential operations for creating
and managing persistent data elements, mainly in relational and NoSQL databases.
CREATE
In RDBMS, a database table row is referred to as a record, while columns are called
attributes or fields.
The CREATE operation adds one or more new records with distinct field values in a table.
If the NoSQL database is document-oriented, then a new document (for example, a JSON
formatted document with its attributes) is added to the collection, which is the equivalent
of an RDBMS table.
Similarly, in NoSQL databases like DynamoDB, the CREATE operation adds an item
(which is equivalent to a record) to a table.
READ
READ returns records (or documents or items) from a database table (or collection or
bucket) based on some search criteria.
The READ operation can return all records and some or all fields.
UPDATE
For example, this can be the change of address in a customer database or price change in
a product database.
Similar to READ, UPDATEs can be applied across all records or only a few, based on
criteria.
An UPDATE operation can modify and persist changes to a single field or to multiple
fields of the record.
If multiple fields are to be updated, the database system ensures they are all updated or
not at all.
Some big data systems don’t implement UPDATE but allow only a timestamped
CREATE operation, adding a new version of the row each time.
DELETE
DELETE operations allow the user to remove records from the database.
A hard delete removes the record altogether, while a soft delete flags the record but leaves
it in place.
For example, this is important in payroll where employment records need to be maintained
even after an employee has left the company.
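The four CRUD operations, including the difference between a hard and a soft delete, can be sketched against a relational table as follows. This is a minimal example using Python's built-in sqlite3 module; the `employee` table, its `is_deleted` flag column, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        address    TEXT,
        is_deleted INTEGER NOT NULL DEFAULT 0  -- soft-delete flag (assumed design)
    )
""")

# CREATE: add new records with distinct field values.
conn.executemany(
    "INSERT INTO employee (emp_id, name, address) VALUES (?, ?, ?)",
    [(1, "Warner", "12 Hill Road"), (2, "Smith", "7 Lake View"), (3, "Jones", "9 Elm Court")],
)

# READ: return some fields of the records that match a search criterion.
rows = conn.execute(
    "SELECT name, address FROM employee WHERE emp_id = ?", (1,)
).fetchall()

# UPDATE: a change of address for one record.
conn.execute("UPDATE employee SET address = ? WHERE emp_id = ?", ("45 Park Street", 1))

# DELETE (soft): flag the record but leave it in place.
conn.execute("UPDATE employee SET is_deleted = 1 WHERE emp_id = ?", (2,))

# DELETE (hard): remove the record altogether.
conn.execute("DELETE FROM employee WHERE emp_id = ?", (3,))
conn.commit()

remaining = conn.execute(
    "SELECT emp_id, address, is_deleted FROM employee ORDER BY emp_id"
).fetchall()
```

After these steps, employee 2 is still stored but flagged, while employee 3 is gone from the table.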
A database management system (DBMS) is the software that is used to manage and maintain data in the database.
By using a DBMS, we can create new databases and new tables, and insert, update, delete, and select
data in the database.
MySQL,
Microsoft Access,
FileMaker Pro,
dBASE.
When dealing with huge amount of data, there are two things that require optimization:
Storage:
According to the principles of database systems, the data is stored in such a way that it
occupies much less space, because redundant (duplicate) data is removed before
storage.
Let’s take a simple example to understand this:
In a banking system, suppose a customer has two accounts: a saving account
and a salary account.
Let’s say the bank stores saving account data in one place (called a table) and salary account
data in another. If customer information such as the customer name and
address is stored in both places, storage is wasted
(redundancy/duplication of data).
To organize the data in a better way, the information should be stored in one place and
both accounts should be linked to that information.
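One possible way to link both accounts to a single copy of the customer information is a separate customer table referenced by each account. The sketch below uses Python's built-in sqlite3 module; the table layout, column names, and sample values are assumptions made for this example rather than a prescribed banking schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Customer details are stored exactly once.
CREATE TABLE customer (
    cust_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT NOT NULL
);
-- Each account points at its customer instead of repeating the details.
CREATE TABLE account (
    acct_no   INTEGER PRIMARY KEY,
    acct_type TEXT NOT NULL,
    cust_id   INTEGER NOT NULL REFERENCES customer(cust_id)
);
INSERT INTO customer VALUES (1, 'Warner', '12 Hill Road');
INSERT INTO account VALUES (1001, 'saving', 1);
INSERT INTO account VALUES (1002, 'salary', 1);
""")

# A single address change is visible through both accounts.
conn.execute("UPDATE customer SET address = '45 Park Street' WHERE cust_id = 1")

linked = conn.execute("""
    SELECT a.acct_type, c.name, c.address
    FROM account AS a
    JOIN customer AS c ON c.cust_id = a.cust_id
    ORDER BY a.acct_no
""").fetchall()
```

Because the address lives in one row only, there is no second copy that could be left out of date.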
Along with storing the data in an optimized and systematic manner, it is also important
that we retrieve the data quickly when needed.
DBMS Applications
Telecom:
There is a database to keep track of the data regarding calls made, network usage, customer
details etc.
Without the database systems it is hard to maintain that huge amount of data that keeps
updating every millisecond.
Industry:
For example, a distribution centre should keep track of the product units that are supplied to
the centre as well as the products that are delivered out of the distribution centre
each day; this is where a DBMS comes into the picture.
Banking System:
For storing customer information, tracking day to day credit and debit transactions,
generating bank statements etc.
All this work is done with the help of database management systems.
Sales:
To travel by air, we make reservations in advance; this reservation information, along
with the flight schedule, is stored in a database.
Education sector:
Database systems are frequently used in schools and colleges to store and retrieve the data
regarding student details, staff details, course details, exam details, payroll data,
attendance details, fees details etc.
There is a lot of inter-related data that needs to be stored and retrieved in an efficient
manner.
Online shopping:
You are probably familiar with online shopping websites such as Amazon and Flipkart.
These sites store the product information, your addresses and preferences, credit details
and provide you the relevant list of products based on your query.
These are only a few examples; a full list of DBMS applications
would be much longer.
DBMS Functions
A DBMS performs several important functions that guarantee integrity and consistency of data
in the database.
The following are important functions and services provided by a DBMS:
The internal schema defines how the data should be stored by the storage management
mechanism and the storage manager interfaces with the operating system to access the
physical storage.
A DBMS furnishes users with the ability to retrieve, update and delete existing data in the
database.
The DBMS accepts the data definitions such as external schema, the conceptual schema,
the internal schema, and all the associated mappings in source form.
Data Dictionary/System Catalog Management:
The DBMS provides a data dictionary or system catalog function in which descriptions of
data items are stored and which is accessible to users.
The end-user’s requests for database access are transmitted to DBMS in the form of
communication messages.
The DBMS protects the database against unauthorized access, either intentional or
accidental.
It furnishes mechanisms to ensure that only authorized users have access to the database.
The DBMS provides mechanisms for backing up data periodically and recovering from
different types of failures.
Since DBMSs support sharing of data among multiple users, they must provide a
mechanism for managing concurrent access to the database.
DBMSs ensure that the database is kept in a consistent state and that the integrity of the data is
preserved.
Transaction Management:
Therefore, a DBMS must provide a mechanism to ensure either that all the updates
corresponding to a given transaction are made or that none of them is made.
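The all-or-nothing behaviour described here can be illustrated with a small funds transfer. This is a sketch using Python's built-in sqlite3 module, where a connection used as a context manager commits on success and rolls back when an exception is raised. The `account` table, the CHECK rule, and the `transfer` helper are hypothetical names introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        acct_no INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(amount, src, dst):
    """Move amount from src to dst; return True if committed, False if rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acct_no = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acct_no = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(30, 1, 2)       # both updates are applied
failed = transfer(500, 1, 2)  # the debit breaks the CHECK rule, so the credit is undone too

balances = conn.execute(
    "SELECT acct_no, balance FROM account ORDER BY acct_no"
).fetchall()
```

The second transfer leaves both balances exactly as they were after the first one, because neither of its two updates is kept.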
The DBMS creates the complex structures that allow multiple users access to the data.
In order to provide data integrity and data consistency, the DBMS uses sophisticated
algorithms to ensure that multiple users can access the database concurrently without
compromising its integrity.
Advantages of a DBMS
The availability of a DBMS between the database and the end-users application provides
numerous benefits.
For example, a DBMS enables data in a database to be shared by several users or applications.
Data is the raw material from which information is derived, so a good data
management system is needed.
Controlling redundancy:
Storing a data item once in a shared database managed by the DBMS can meet
the needs of all of its users.
As a result, the redundancy that comes from maintaining several copies of the same data
can be kept under control.
However, in practice, there can be times when we need to add a small amount of
redundancy to the database for performance purposes.
Inconsistency in data occurs when various data versions appear at various places.
Improvements in decision-making:
The generation of higher-quality data, which can be used to make informed decisions, is
made possible by well-managed data and improved data access.
The quality of the information used is determined primarily by the quality of the raw data.
Data quality is an integrated approach to ensuring data accuracy, validity, and timeliness.
End users can make swift, educated decisions based on data availability and resources that
turn data into usable knowledge, which can mean the difference between success and
failure.
Economical:
Economy, in general, means that the cost of a group of combined operations is less than the sum
of the costs of the individual efforts.
Besides, since many people share a database, any database modification or upgrade would
help everybody.
The DBMS promotes the creation of an environment where users have better access
to more and better-managed data.
The database system ensures a balance between the competing needs of different data
users.
The database system considers the needs of both individual users and the organization as
a whole
The higher the potential for data access, the greater the data protection risks.
Since the Database Administrator (DBA) has power over operational data, authorization
protocols may be set up that allow only registered users to have access to it.
Various users may have various types of access to the same data with the assistance of
DBA.
In a file-oriented approach, data held in different files for different users is not
flexible or responsive.
For example, if a file in a file-oriented system is arranged in an alphabetical order by
author, the system will not respond to queries to view the list in different orders, such as
alphabetical order by title, topic, publisher, or date.
If the same information is stored in a structured database, a user may get a response from
the database in one of the ways described above.
Besides, due to the DBMS' flexibility, programmers can create new programs in response
to unique user requests.
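The contrast with a file sorted in one fixed order can be sketched as follows: once the data sits in a database table, the same rows can be returned in any requested order. The example uses Python's built-in sqlite3 module; the `book` table, its sample rows, and the `titles_sorted_by` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, author TEXT, publisher TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?, ?)",
    [
        ("Networks", "Adams", "Orion Press", 2015),
        ("Algorithms", "Clark", "Delta Books", 2009),
        ("Databases", "Brown", "Acme Publishing", 2020),
    ],
)

ALLOWED = {"author", "title", "publisher", "year"}


def titles_sorted_by(column):
    """List book titles ordered by one of the allowed columns."""
    # Column names cannot be bound as query parameters, so they are whitelisted.
    if column not in ALLOWED:
        raise ValueError(f"cannot sort by {column!r}")
    return [row[0] for row in conn.execute(f"SELECT title FROM book ORDER BY {column}")]


by_author = titles_sorted_by("author")
by_title = titles_sorted_by("title")
by_year = titles_sorted_by("year")
```

No second copy of the data and no re-sorting of a file is needed; only the query changes.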
A query is a request to the DBMS to access the data in a particular way, such as reading
or changing it.
A query is simply a question, and an ad hoc query is a question that is posed on the spur
of the moment.
The DBMS responds to the application with a response (referred to as the query result).
When dealing with voluminous sales data, for example, end users may want fast answers
to questions (ad hoc queries).
Disadvantages of DBMS
You should note that, just as databases have advantages, they also have disadvantages.
Due to the DBMS's complexity, data recovery in the event of a catastrophe is much more
difficult and complicated than in a file-oriented system.
Complexity:
There are choices to make when developing and implementing a new application using a
DBMS.
Cost:
Due to the size and complexity of a DBMS, more hardware resources would be
needed.
Users can experience a substantial drop in performance if the system's hardware resources
aren't upgraded when a DBMS is purchased.
Many of the data processing tools in the information system are concentrated in the
database.
Any hardware/software failure will have much more serious consequences than in a non-
database setting.
In a database setting, all nodes will fail; however, in a file-oriented system, only one node
will fail.
Size:
A DBMS must be a large program to accommodate all of the complex functions that it
must provide to users, consuming many megabytes of storage space and a
considerable amount of internal memory.
This is an assemblage of components that define and regulate the gathering, storage, and
management of data.
Computers, input and output devices, storage devices, and other physical electronic devices
make up the hardware.
2. Software:
This is a set of programs for managing and monitoring the database as a whole.
Database software, the operating system, network software that allows users to share data,
and application programs that enable users to access data in the database are all included.
Between the physical database itself (i.e. the data as actually stored) and the users of the
system is a layer of software, usually called the Database Management System or DBMS.
One general function provided by the DBMS is thus the shielding of database users from
complex hardware-level detail.
3. Data:
Until it is organized, data is simply a collection of raw, disorganized facts.
It is the most important component of the DBMS environment from the end user’s point of
view.
Data acts as a bridge between the machine components and the user components.
The database should contain all the data needed by the organization.
4. Procedures:
Procedures are a set of guidelines and regulations that will help you use the database
management system more effectively.
It is the process of creating and running a database using documented methods in order to
direct the users who run and manage it.
vi) Change the structure of a table, reorganize the database across multiple disks,
improve performance, or archive data to secondary storage.
5. Users:
There are a number of users who can access or retrieve data on demand using the
applications and interfaces provided by the DBMS.
The users of a database system can be classified in the following groups, depending on
their degrees of expertise or the mode of their interactions with the DBMS.
a) Naive Users:
Naive users are those users who need not be aware of the presence of the
database system or any other system supporting their usage.
Naive users are end users of the database who work through a menu driven
application program, where the type and range of response is always indicated to
the user.
b) Online Users:
Online users are those who may communicate with the database directly via an
online terminal or indirectly via a user interface and application program.
These users are aware of the presence of the database system and may have
acquired a certain amount of expertise with limited interaction permitted with a
database.
c) Sophisticated Users:
d) Specialized Users:
Such users are those who write specialized database applications that do not fit into
the normal data-processing framework.
For example: Computer-aided design systems, knowledge base and expert system,
systems that store data with complex data types (for example, graphics data and
audio data).
e) Application Programmers:
f) Database Administrator:
The database administrator (DBA) is the person or group responsible for handling
the database and making sure that the data is stable and has integrity.
The size and role of the DBA function varies from company to company, as does
its placement within the organizational structure.
There is no standard for how the DBA function fits in an organization’s structure,
partly because the function itself is probably the most dynamic of any in an
organization.
DBA operations are commonly defined and divided according to the phases of the
Database Life Cycle (DBLC).
Keep in mind that a company might have several incompatible DBMSs installed
to support different operations.
In such an environment, the company might have one DBA assigned for each
DBMS.
The database administration tools cover the entire spectrum of data administration
tasks, from selection to inception, deployment, migration, and day-to-day
operations.
For example, you can find sophisticated data administration tools for:
o Database monitoring
o Database load testing
DATA DICTIONARY
A data dictionary is defined as “a DBMS component that stores the definition of data
characteristics and relationships.”
You may recall that such “data about data” are called metadata.
The DBMS data dictionary provides the DBMS with its self-describing characteristic.
In effect, the data dictionary resembles an x-ray of the company’s entire data set, and it is a crucial
element in data administration.
For example, all relational DBMSs include a built-in data dictionary or system catalog that is
frequently accessed and updated by the RDBMS.
Other DBMSs, especially older types, do not have a built-in data dictionary; instead, the DBA
may use third-party standalone systems.
An active data dictionary is automatically updated by the DBMS with every database access to
keep its access information up to date.
A passive data dictionary is not updated automatically and usually requires running a batch
process.
Data dictionary access information is normally used by the DBMS for query optimization.
The data dictionary’s main function is to store the description of all objects that interact with the
database.
Integrated data dictionaries tend to limit their metadata to the data managed by the DBMS.
Standalone data dictionary systems are usually more flexible and allow the DBA to describe and
manage all of the organization’s data, whether they are computerized or not.
Whatever the data dictionary’s format, it provides database designers and end users with a much-
improved ability to communicate.
In addition, the data dictionary is the tool that helps the DBA resolve data conflicts.
Although there is no standard format for the information stored in the data dictionary, several
features are common.
For example, the data dictionary typically stores descriptions of the following:
Specifically, the data dictionary stores element names, data types, display format, internal
storage format, and validation rules.
The data dictionary explains where an element is used, who used it, and so on.
For example, the data dictionary is likely to store the name of the table creator, the date
of creation, access authorizations, and the number of columns.
For each index, the DBMS stores at least the index name, the attributes used, the location,
specific index characteristics, and the creation date.
• Defined databases.
This information includes who created each database, when the database was created,
where the database is located, the DBA’s name, and so on
This information includes screen formats, report formats, application programs, and SQL
queries.
This information defines who can manipulate which objects and what types of operations
can be performed.
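As a rough illustration of an integrated data dictionary, the sketch below uses Python's built-in sqlite3 module. SQLite keeps its system catalog in a table named sqlite_master and exposes column metadata through PRAGMA table_info. The customer table, its columns, and the index name are made up for this example; other DBMSs organize their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " cus_code INTEGER PRIMARY KEY,"
    " cus_name TEXT NOT NULL,"
    " cus_balance REAL)"
)
conn.execute("CREATE INDEX idx_cus_name ON customer (cus_name)")

# The catalog lists every table and index the DBMS knows about.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()

# Column-level metadata: (name, declared type, NOT NULL flag).
columns = [
    (row[1], row[2], bool(row[3]))
    for row in conn.execute("PRAGMA table_info(customer)")
]

for entry in catalog:
    print(entry)
for column in columns:
    print(column)
```

The application never wrote these descriptions itself; the DBMS recorded them when the table and index were created, which is what gives the database its self-describing character.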
DATABASE VIEW
A database view is a stored query that presents selected data from one or more tables as if it
were a single table.
Most users interact with the database using the database views.
a) Focus on the data that interests them and on the tasks for which they are responsible.
Data that is not of interest to a user can be left out of the view.
b) Define frequently used joins, projections, and selections as views so that users do not have
to specify all the conditions and qualifications each time an operation is performed on that
data.
c) Display different data for different users, even when they are using the same data at the
same time.
This advantage is particularly important when users of many different interests and skill
levels share the same database.
Advantages:
For example, a single view might be defined with a join, which is a collection of related
columns or rows in multiple tables.
However, the view hides the fact that this information actually originates from several
tables.
Columns of views can be renamed without affecting the tables on which the views are
based.
Disadvantages:
Rows returned through a view have no guaranteed order unless the query asks for one.
When a table is dropped, any view that depends on it becomes inactive.
Views can affect performance: querying through a view may take more time than querying
the table directly.
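The points above can be sketched with Python's built-in sqlite3 module. The view below stores a join, renames columns, and leaves out the salary column. All table, column, and view names are invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT,
        salary   REAL,
        dept_id  INTEGER REFERENCES department (dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO employee VALUES
        (10, 'Asha', 52000, 1),
        (11, 'Ravi', 61000, 2);

    -- The view stores the join, renames columns, and leaves out salary.
    CREATE VIEW staff_directory AS
        SELECT e.emp_name AS name, d.dept_name AS department
        FROM employee AS e
        JOIN department AS d ON d.dept_id = e.dept_id;
""")

# Users query the view like a table; ORDER BY is needed for a defined order.
rows = conn.execute(
    "SELECT name, department FROM staff_directory ORDER BY name"
).fetchall()
print(rows)
```

A user of staff_directory does not need to know that the data comes from two tables, and cannot see salaries through it.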
DBMS ARCHITECTURE
application layer performs load balancing, so you can have multiple clients.
The three-schema architecture, also called the ANSI/SPARC (American National Standards Institute,
Standards Planning And Requirements Committee) architecture, is an abstract design standard
for a database management system (DBMS), first proposed in 1975.
They tend not to exhibit full physical independence, but the idea of logical data independence is
widely adopted.
1. Different users need different views of the same data according to their requirements.
2. The approach in which a particular user needs to see the data may change over time.
3. The users of the database should not worry about the physical implementation and internal
workings of the database such as data compression and encryption techniques, hashing,
optimization of the internal structures etc.
4. DBA should be able to change the conceptual structure of the database without affecting
the user's
The Three Level Architecture has the aim of enabling users to access the same data but with a
personalized view of it.
The distancing of the internal level from the external level means that users do not need to know
how the data is physically stored in the database.
This level separation also allows the Database Administrator (DBA) to change the database
storage structures without affecting the users' views
1. Internal Level
The internal level is concerned with how the database is physically represented on the
computer system.
The internal level has an internal schema which describes the physical storage structure
of the database.
It describes how the data is actually stored in the database and on the computer hardware.
For Example: Specification of primary and secondary keys, indexes, pointers and
sequencing.
2. Conceptual Level
The conceptual level is a way of describing what data is stored within the whole database
and how the data is inter-related.
In the conceptual level, internal details such as an implementation of the data structure are
hidden
3. External Level
At the external level, a database contains several schemas that are sometimes called
subschemas.
Database schemas
Schema gives the names of the entities and attributes and specifies the relationship among them.
It is a framework into which the values of the data items (or fields) are fitted.
But the values fitted into this format change from instance to instance.
The data in the database at any particular point in time is called a database instance.
Therefore, many database instances can correspond to the same database schema.
There are three different types of schema corresponding to the three levels in the ANSI-SPARC
architecture:
The External Schema
The external schemas describe the different external views of the data, and there may be
many external schemas for a given database.
The conceptual schema describes all the data items and relationships between them,
together with integrity constraints.
The internal schema at the lowest level contains definitions of the stored records, the
methods of representation, the data fields, and indexes.
The three levels of DBMS architecture don't exist independently of each other.
The Conceptual/Internal mapping lies between the conceptual level and the internal level.
Its role is to define the correspondence between the records and fields of the conceptual
level and files and data structures of the internal level.
The external/Conceptual Mapping lies between the external level and the Conceptual
level.
Its role is to define the correspondence between a particular external view and the
conceptual view.
Data independence is the ability to modify the schema without requiring the programs and
applications that use it to be rewritten.
Data is separated from the programs, so that the changes made to the data will not affect the
program execution and the application.
We know the main purpose of the three levels of data abstraction is to achieve data independence.
If the database changes and expands over time, it is very important that the changes in one level
should not affect the data at other levels of the database.
This would save time and cost required when changing the database
Permit developers to focus on the general structure of the Database rather than worrying
about the internal implementation
Easily make modifications at the physical level when needed to improve the performance of
the system.
There are two levels of data independence based on three levels of abstraction.
Physical Data Independence means changing the physical level without affecting the
logical level or conceptual level.
Using this property, we can change the storage device of the database without affecting
the logical schema.
The changes in the physical level may include changes using the following;
Physical Data Independence is achieved by modifying the physical layer to logical layer
mapping (PL-LL mapping).
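A minimal sketch of physical data independence, using Python's built-in sqlite3 module: an index is added (a physical-level change) while the logical schema and the application's query stay exactly the same. The product table and the index name are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (p_code TEXT, p_descript TEXT, p_price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("P1", "Hammer", 9.5), ("P2", "Saw", 14.0), ("P3", "Drill", 38.9)],
)

# The application's query is written once against the logical schema.
QUERY = "SELECT p_code, p_descript FROM product WHERE p_price > 10 ORDER BY p_code"

before = conn.execute(QUERY).fetchall()

# Physical-level change: add an index. Table definition and query are untouched.
conn.execute("CREATE INDEX idx_product_price ON product (p_price)")

after = conn.execute(QUERY).fetchall()
print(before == after)
```

The query text did not change and returns the same rows; only the way the DBMS may locate those rows is different.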
It presents data in the form that can be accessed by the end users.
Codd’s Rule of Logical Data Independence says that users should be able to manipulate
the Logical View of data without any information of its physical storage.
Software or the computer program is used to manipulate the logical view of the data.
Database administrator is the one who decides what information is to be kept in the
database and how to use the logical level of abstraction.
It also describes what data is to be stored in the database along with the relationship.
Static structure for the logical view is defined in the class object diagrams.
Logical Data Independence is achieved by modifying the view layer to logical layer
mapping (VL-LL mapping).
Difference between Physical and Logical Data Independence
The table below is the summary of comparison on logical and physical data independence
A database management system (DBMS) is software that allows access to data stored in the
database and provides an easy and effective method of
iv. Protecting the information from system crashes and data theft
Query Manager
It runs user queries, gets data from the memory manager and shows the result to the user.
Storage Manager
Disk Storage
The data is saved efficiently and safely even after the system shutdown in the disk storage
When many people use the same data, it helps to manage, share, and maintain data integrity.
A DBMS also helps to eliminate data errors and inconsistencies by validating the data.
DBMS also protects sensitive data by ensuring only authorized individuals can access it.
We will go through the structure of DBMS and the key characteristics of its parts.
Structure of DBMS
The primary role of the Query Manager is to interpret and execute queries given by the
user.
When a user or an application sends a query to the DBMS, the query manager first
translates that query into a low-level language, which the storage manager understands.
The storage manager then processes the query and provides the data the user or the
application requires.
The Query Manager then sends this data back to the user.
a) DDL Interpreter
The DDL interpreter changes the DDL statements into a specific format to make
sense to the storage manager.
The DDL interpreter also helps ensure the consistency and validity of the database.
b) DML Compiler
The DML compiler changes DML commands like SELECT, INSERT, and
DELETE into low-level instructions so the storage manager can understand them.
The DML compiler also optimizes the queries to guarantee faster execution.
d) Query optimizer
This system component processes the SQL queries and determines the most
efficient execution plan for the queries.
The query optimizer considers all the possible ways to process a query.
The query optimizer helps reduce the execution time and the resources required
for a query.
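One way to watch a query optimizer at work is SQLite's EXPLAIN QUERY PLAN statement, reached here through Python's built-in sqlite3 module. The sketch asks for the plan of the same query before and after an index exists. The orders table and the index name are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cus_code INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (cus_code, total) VALUES (?, ?)",
    [(i % 50, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT * FROM orders WHERE cus_code = 7"

def plan(sql):
    """Return the optimizer's plan as one string (last column is the description)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_without_index = plan(QUERY)

# Give the optimizer a cheaper access path and ask again.
conn.execute("CREATE INDEX idx_orders_cus ON orders (cus_code)")
plan_with_index = plan(QUERY)

print(plan_without_index)
print(plan_with_index)
```

Without the index the plan is a full scan of the table; once the index exists, the optimizer chooses to search through it instead.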
The storage manager is the part of the Database management system responsible for
controlling the data storage in the database.
The storage manager's main job is to manage the data held in secondary storage.
The storage manager is responsible for creating, reading, updating, and deleting data in
the database.
It also ensures that the database maintains its consistency and integrity by denying any
unauthorized access.
a) File Manager
The file manager is responsible for creating, opening, and removing files in the
database.
As the name implies, it checks and assigns the authority of each user and manages the
integrity constraints applied to the database.
The access manager controls user access to databases and ensures no one is given
unlawful access.
c) Command Processor:
d) Query Optimizer:
The optimizer process analyzes SQL queries and finds the most efficient way to
access the data
It optimizes the queries so that they can be processed with minimum
resource utilization.
e) Transaction Manager
It manages the overall transactions performed in the complete system for smooth
and conflict free experience.
It also ensures that the database remains in a consistent state, because each
and every transaction affects the database.
g) Buffer Manager
It manages and controls how the required data is fetched from storage
and transferred to main memory.
h) Scheduler
As the name implies, it schedules all the tasks in the system, including query execution
and transaction management.
i) Lock manager.
This process manages all locks placed on database objects, including disk pages
j) Recovery Manager:
All operations, transactions, and changes pass through this
component so that it can keep track of and record these activities.
Whenever required, it can then provide a backup or roll back the operations.
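The interplay of the transaction manager and the recovery manager can be sketched with Python's built-in sqlite3 module. In this hypothetical transfer, the second update violates a CHECK constraint, so the whole transaction is rolled back and the database returns to its previous consistent state. The account table and the transfer function are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_id INTEGER PRIMARY KEY,"
    " balance REAL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

def transfer(amount, src, dst):
    """Move money between accounts; undo everything if any step fails."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE acc_id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acc_id = ?", (amount, src)
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # restore the state from before the transaction began
        return False

ok = transfer(500.0, src=1, dst=2)  # would overdraw account 1
balances = conn.execute("SELECT balance FROM account ORDER BY acc_id").fetchall()
print(ok, balances)
```

Although the credit to account 2 succeeded on its own, it is undone together with the failed debit, so no partial transfer is ever visible.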
3. Disk Storage
Disk Storage refers to physical storage devices like hard disks, which are used to store
data.
Disk storage provides a medium for storing data that remains stored even after the system
is shut down.
a) Data Dictionary
These components include tables, relations, and columns with their names,
descriptions, constraints, etc.
b) Data Files
A data file can contain rows from a single table or it can contain rows from many
different tables.
A database administrator determines the initial size of the data files that make up
the database; however, the data files can automatically expand as required.
c) Indices
These help find the data rows that match the given search criteria.
d) Statistical Data:
As the name implies, it stores statistical information about the data present in the
database.
The disk storage applies various techniques like partitioning, caching, indexing, data
compression, etc. to ensure these optimizations.
The database management system is a bridge between the application program, (that determines
what data are needed and how they are processed), and the operating system of the computer,
which is responsible for placing data on the magnetic storage devices.
To retrieve data from the database, the following operations are performed internally:
1. A user issues an access request, using some application program or data manipulation
language.
A user’s request for data is received by the data manager, which determines the physical
record required.
The application program determines what data are needed and communicates the need to
the database management system.
The decision as to which physical record is needed may require some preliminary
consultation of the database and/or the data dictionary prior to the access of the actual data
itself.
The DBMS inspects, in turn, the external schema, the external/conceptual mapping, the
conceptual schema, the conceptual/internal mapping, and the storage structure definition.
The data manager sends the request for a specific physical record to the file manager.
3. The database management system instructs the operating system to locate and retrieve
the data from the specific location on the magnetic disk (or whatever device it is stored
on).
4. The file manager decides which physical block of secondary storage devices contains the
required record and sends the request for the appropriate block to the disk manager.
5. The disk manager retrieves the block and sends it to the file manager, which sends the
required record to the data manager.
A copy of the data is given to the application program for processing.
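The chain of requests described above (data manager, file manager, disk manager) can be sketched in plain Python. The three classes are hypothetical in-memory stand-ins written for this illustration; they are not part of any real DBMS.

```python
# Hypothetical in-memory stand-ins for the three layers described above.
class DiskManager:
    def __init__(self, blocks):
        self.blocks = blocks  # block number -> list of records

    def read_block(self, block_no):
        return self.blocks[block_no]


class FileManager:
    def __init__(self, disk, record_to_block):
        self.disk = disk
        self.record_to_block = record_to_block  # record key -> block number

    def fetch_record(self, key):
        # Decide which physical block holds the record, then ask the disk manager.
        block = self.disk.read_block(self.record_to_block[key])
        return next(rec for rec in block if rec["id"] == key)


class DataManager:
    def __init__(self, files):
        self.files = files

    def handle_request(self, key):
        # Hand the application a copy, never the stored record itself.
        return dict(self.files.fetch_record(key))


disk = DiskManager({0: [{"id": 1, "name": "Asha"}], 1: [{"id": 2, "name": "Ravi"}]})
files = FileManager(disk, {1: 0, 2: 1})
data_manager = DataManager(files)

record = data_manager.handle_request(2)
print(record)
```

Each layer only talks to the one directly below it, which is why the application needs to know nothing about blocks or devices.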
Each database holds a specific set of data and is used for a specific purpose.
Various methods for classifying databases have been adopted based on the evolution and creative
uses of databases.
The best database for a specific organization depends on how the organization intends to use the
data.
a) Single User
Users 2 and 3, for example, must wait until user 1 has finished using the
database.
b) Multi-User
A database is integrated when the same information is not recorded in two places.
This criterion is based on the number of sites over which the database is distributed.
The data from this database is stored in a single location, and users from all over
the world can access it.
This database contains application procedures that enable users to access data from
a remote location.
Similarly, the program procedures that keep track and document user data include
registration.
Advantages
This means that it is easier to coordinate the data and it is as accurate and
consistent as possible.
All the data is stored together and not scattered across different locations.
o Since all the data is in one place, there can be stronger security measures
around it. So, the centralized database is much more secure.
o All the information in the centralized database can be easily accessed from
the same location and at the same time.
Disadvantages
o Since all the data is at one location, it takes more time to search and access
it.
o Since all the data is at the same location, if multiple users try to access it
simultaneously it creates a problem.
This is a database that is developed and maintained with the use of cloud data
services including Amazon AWS, or Microsoft Azure.
Cloud databases are those that have been optimized and developed for a virtual
environment.
The owner of the data does not have to know or be concerned about what
hardware and software are being used to host their database.
The performance capacity of the database can be renegotiated with the cloud
provider as the requirements on the database change.
The organizations using this database usually purchase storage and processing
capacity for their data and applications.
When the demands on the database go up, further processing and storage
capabilities are also purchased as required.
Cloud computing has various advantages, including the ability to pay for storage
space and bandwidth on a pay-per-use basis, as well as scalability and high
availability when required.
This type of database also enables the library to run operational
applications over a software-as-a-service platform.
This is a database whose data is shared across many different sites.
The data is dispersed around an organisation, rather than being stored in a single
location.
These sites are connected together with the help of communication links, which
enable them to easily access the distributed data.
Various parts of a database, as well as program processes that are replicated and
exchanged at different points in a network are stored in several locations.
o Homogeneous databases
These use the same basic or fundamental hardware and run on the same
operating systems and application procedures.
o Heterogeneous databases
These databases have different operating systems, basic hardware, and
application procedures at different locations.
In a distributed database system, the data and the DBMS software are distributed over
several sites but connected to a single computer.
o Modular Development
o Reliability
o Better Response
Efficient data distribution in a distributed database system provides a faster
response when user requests are met locally.
o Costly Software
o Large Overhead
o Data Integrity
a) General-purpose database:
The general-purpose database includes a wide range of data that can be used in a
variety of disciplines.
Other examples are the Proquest, and LexisNexis databases with newspaper,
magazine, and journal articles on a range of topics.
b) Subject-specific database:
a) Operational database:
Functional lines such as services, user relations, circulation, user service, and
others require this type of database.
b) Analytical database:
It is used for data analysis, summarized data, or the historical data of a particular
business. Example: a data warehouse.
The analytical database stores historical data and circulation metrics that are
mainly used to make decisions.
Recently, the use of this database has grown in popularity and has developed into
its discipline, namely business intelligence.
The term "business intelligence" refers to a system for collecting and analyzing
business data to create information useful in business decision-making.
To build information, it needs extensive data massaging (data manipulation).
An analytical database typically has two main components: a data warehouse and an
online analytical processing front end.
A data warehouse is a data store that holds data in a format
that makes it simpler to base decisions on it.
The data warehouse keeps historical data from operational databases as well as data from
other external sources.
Another important way of classifying databases is through the degree to which the data is
structured.
a) Unstructured data
Unstructured data is usually in a format that does not lend itself to the processing
that yields information.
b) Structured data
Structured data results from formatting unstructured data to
facilitate storage, use, and the creation of information.
The structure (format) is applied based on the type of processing
that one intends to perform on the data.
Some data may be unstructured (not ready) for some types of processing
yet structured (ready) for other types of processing.
The data value 12345678, for example, may be a zip code, a sales value, or a
product code.
If the value represents a zip code or a product code and is stored as text, it
cannot be used for mathematical computation.
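A small Python sketch of why the chosen structure matters. The value 12345678 comes from the text above; the zip code "02139" is an extra illustrative value, chosen because it shows what numeric storage does to a code with a leading zero.

```python
value = 12345678

# Treated as a sales figure, the value is numeric and arithmetic is meaningful.
sales_total = value + 1000

# Treated as a code, the value is stored as text. Arithmetic has no meaning,
# and converting to a number would silently drop a leading zero.
zip_code = "02139"
as_number = int(zip_code)

print(sales_total)
print(zip_code, as_number)
```

Storing the code as text keeps it intact for lookups, at the price of not being usable in calculations, which is the trade-off the paragraph above describes.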
c) Semistructured
The database types discussed so far focus on the storage and management of highly
structured data.
However, enterprises do not have to limit themselves to the use of structured
data.
A new type of database called XML databases is now being used to handle
unstructured and semi-structured data storage and management requirements.
This criterion is based on the data model used to represent the database.
a) Hierarchical Database
Just as in any hierarchy, this database follows the progression of data being
categorized in ranks or levels.
As a result, two entities of data will be lower in rank and the commonality would
assume a higher rank.
The child records are linked to the parent record using a field, and so the parent
record is allowed multiple child records.
However, vice versa is not possible.
Due to such a structure, hierarchical databases are not easily scalable; the addition of
data elements requires a lengthy traversal through the database.
b) Network Database
The child records are given the freedom to associate with multiple parent records.
Notice how the Student, Faculty, and Resources elements each have two parent
records, which are Departments and Clubs.
The disadvantage lies in the inability to alter the structure due to its complexity and
also in it being highly structurally dependent.
Therefore, the object can be referenced and called without any difficulty.
Furthermore, these objects have attributes which are in fact the data elements that
need to be defined in the database.
An example of such a model is the Berkeley DB software library which uses the
same conceptual background to deliver quick and highly efficient responses to
database queries from the embedded database.
d) Relational databases
Relational database technology provides the most efficient and flexible way to
access structured information.
Considered the most mature of all databases, these databases lead in the production
line along with their management systems.
In this database, every piece of information has a relationship with every other
piece of information.
This is on account of every data value in the database having a unique identity in
the form of a record.
Therefore, every row of data in the database is linked with another row using a
primary key.
Similarly, every table is linked with another table using a foreign key.
Refer to the diagram below and notice how the concept of ‘Keys’ is used to link
two tables.
Due to this introduction of tables to organize data, it has become exceedingly
popular.
As a consequence, they are widely integrated into web application interfaces to serve as
ideal repositories for user data.
What makes it further interesting is the ease in mastering it, since the language
used to interact with the database is simple (SQL in this case) and easy to
comprehend.
It is also worth noting that in relational databases, scaling and
traversing through data is quite a lightweight task in comparison to hierarchical
databases.
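How keys link two tables can be sketched with Python's built-in sqlite3 module. The department and student tables below are made up for the example; student.dept_id is a foreign key that must match a primary key value in department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE student (
        stu_id   INTEGER PRIMARY KEY,
        stu_name TEXT NOT NULL,
        dept_id  INTEGER NOT NULL REFERENCES department (dept_id)
    );
    INSERT INTO department VALUES (1, 'Physics');
    INSERT INTO student VALUES (100, 'Meena', 1);
""")

# The shared key value is what joins a student row to its department row.
joined = conn.execute(
    "SELECT s.stu_name, d.dept_name"
    " FROM student AS s JOIN department AS d ON d.dept_id = s.dept_id"
).fetchall()

# A student pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO student VALUES (101, 'Kiran', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(joined, rejected)
```

The primary key identifies each row within its own table, while the foreign key is what ties a row in one table to a row in another.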
e) No-SQL
This data is modeled by means other than the tabular relations used in relational
databases.
The data structures used by NoSQL databases are different from those used by
default in relational databases which makes some operations faster in NoSQL.
The suitability of a given NoSQL database depends on the problem it should solve.
Data structures used by NoSQL databases are sometimes also viewed as more
flexible than relational database tables.
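To give a feel for this flexibility, the sketch below imitates a document-style collection using ordinary Python dictionaries. It is only an analogy: the collection and the find helper are hypothetical and do not represent the API of MongoDB, Cassandra, or any other NoSQL product.

```python
import json

# Two "documents" in the same collection; they need not share a fixed set of fields.
collection = [
    {"_id": 1, "name": "Asha", "emails": ["asha@example.com"]},
    {"_id": 2, "name": "Ravi", "address": {"city": "Chennai", "pin": "600001"}},
]

def find(docs, **criteria):
    """Return the documents whose top-level fields equal all given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

match = find(collection, name="Ravi")

# Nested data travels with the document instead of living in a separate table.
serialized = json.dumps(match[0], sort_keys=True)
print(serialized)
```

In a relational design the address and the e-mail list would normally go into separate tables joined by keys; here each document simply carries whatever fields it has.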
Advantages of NoSQL
There are many advantages of working with NoSQL databases such as MongoDB
and Cassandra.
Disadvantages of NoSQL
DBMS architecture has much to do with how the database is designed and laid out.
Databases are not always directly accessible by users or applications to store or access
data, so we must use various architectures to maintain them based on how users are
connected to the database.
DBMS architectures vary according to how users (clients) connect to the database servers
to carry out their requests.
DBMS architectures are classified based on how many layers are present in their structure,
i.e., tier-based classification.
An n-tier DBMS architecture consists of closely related but independent layers, levels,
and modules that are able to be independently modified, altered, changed, or replaced.
Modifications made to one layer of the architecture do not affect the other layers.
In the DBMS world, the one-tier Architecture is the simplest DBMS architecture,
in which the client, server, and database are all on the same machine.
This architecture puts the user directly in contact with the database itself, so the
user can create, modify, or delete data within the database.
The user sits directly on the database, without any intermediary layer.
Changes can be made easily by the user without the need to use a special tool.
Every change made by the client is immediately reflected in the database, and all
processing is carried out on one server.
In this way, we can perform the operation directly on the database and get a quick
response.
Query processing and transaction management are handled on the server side.
Client-side applications can access the database server directly via API calls, which
enables the application to remain independent of the database with respect to
design, programming, and operation.
In a two-tier architecture, the database system is present on the server machine and
the DBMS application is present on the client machine; these two machines are
connected with each other through a reliable network, as shown in the diagram
below.
Advantages:
o Due to the database functionality being handled solely by the server, it has
a high processing capability.
The 3-tiered architecture is one of the most frequently used DBMS architectures.
Between the server (Database layer) and client (Presentation layer), an additional
layer known as the Application layer is added to reduce the server’s query
processing burden.
3-tier architectures separate the tiers based on users’ complexity and how they
interact with the data available in the database.
A 3-tier architecture has the following layers:
This tier consists of the database and the query processing languages.
Here you will find the application server and the programs that will access
the database.
The application layer provides users with an abstract view of the database.
Meanwhile, the database tier is also unaware of any users beyond the
application tier.
The application layer thus sits between the user and the database and serves
as a conduit (mediator).
The end users interact with this layer, and they are not aware of the database
beyond it.
The application can provide multiple abstract views of the database at this
level.
Usually, this type of architecture is used in cases where large web applications
must handle a great deal of traffic.
Advantages:
The client can’t interact directly with the server, thereby preventing
unauthorized access to data.
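The separation of the three tiers can be sketched in a few lines of Python, with the built-in sqlite3 module standing in for the database server. The class names, the book table, and the render function are assumptions made for this illustration; in a real deployment each tier would usually run on its own machine.

```python
import sqlite3

# Database tier: the only code that talks to the DBMS.
class DatabaseTier:
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT, on_loan INTEGER)"
        )
        self.conn.executemany(
            "INSERT INTO book VALUES (?, ?, ?)",
            [(1, "Database Systems", 0), (2, "Operating Systems", 1)],
        )

    def available_titles(self):
        rows = self.conn.execute(
            "SELECT title FROM book WHERE on_loan = 0 ORDER BY title"
        )
        return [title for (title,) in rows]


# Application tier: business logic; exposes an abstract view, not tables.
class ApplicationTier:
    def __init__(self, db):
        self._db = db

    def list_available(self):
        return {"available": self._db.available_titles()}


# Presentation tier: formats results for the end user; never sees SQL.
def render(app):
    return "\n".join("- " + title for title in app.list_available()["available"])


output = render(ApplicationTier(DatabaseTier()))
print(output)
```

Because the presentation code only calls list_available, the database tier could be replaced or restructured without the end user's screen changing.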
SUMMARY
In this unit, we discussed in a relatively informal manner the major components of a database
system.
This is a collection of related data with an implicit meaning and hence is a database.
This relationship between symbols and what they represent is the essence of what we
mean by information.
The physical schema describes the database design at the physical level, while the logical
schema describes the database design at the logical level.
A database may also have several schemas at the view level, sometimes called subschemas
that describe different views of the database.
Application programs are said to exhibit physical data independence if they do not depend
on the physical schema, and thus need not be rewritten if the physical schema changes.
Underlying the structure of a database is the data model: a collection of conceptual tools
for describing data, data relationships, data semantics, and consistency constraints.
A database system provides a data definition language to specify the database schema and
a data manipulation language to express database queries and updates. One of the main
reasons for using DBMSs is to have central control of both the data and the programs that
access those data.
A person who has such central control over the system is called a database administrator
(DBA)