9946module 1 (Introduction) - 5th Semester - Computer Science and Engineering
Data Vs Information
The term “Data” means any representation of facts, figures, symbols,
concepts or instructions that is communicated, interpreted or processed
by humans or electronic machines. Data is represented with the help of
characters such as alphabets (A-Z, a-z), digits (0-9) or special characters
(+, -, /, *, <, >, = etc.).
Information is processed or structured data which has some meaningful
values for the system or human. Information is the derived data on which
decisions and actions are based. For the decision to be meaningful, the
processed data must meet the following characteristics:
Timely - Information should be available when required.
Accuracy - Information should be accurate.
Completeness - Information should be complete.
What is Data
• The history of temperature readings all over the world for the past 50
years is data. If this data is organized and analysed to find that global
temperature is rising, then that is information.
Data vs Information (comparison):
• Role: Data is used as input to the computer system for processing; information is the output of processed data.
• Dependence: Data does not depend on information; information depends on data.
• Meaning: Data has no meaning and is not useful until it is organized; information is meaningful and useful because it is organized from data.
• Example of data: Each student's examination marks are one part of the data.
• Example of information: The average marks of a class or of the entire college is information that can be derived from the given data.
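To make the marks example concrete, here is a minimal Python sketch. The student names and marks are hypothetical values, not from the original; the individual marks play the role of data, and the computed class average is the information derived from them.

```python
# Hypothetical raw data: each student's examination marks (one value per student).
marks = {"Asha": 78, "Bikram": 64, "Chitra": 91, "Dev": 55}

def class_average(marks_by_student: dict[str, int]) -> float:
    """Turn raw data (individual marks) into information (the class average)."""
    if not marks_by_student:
        raise ValueError("no marks recorded")
    return sum(marks_by_student.values()) / len(marks_by_student)

print(f"Class average: {class_average(marks):.2f}")  # Class average: 72.00
```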
What is Database?
• A database is a collection of information that is organized so that it can be easily
accessed, managed and updated.
• A database is an organized collection of structured information, or data, typically stored
electronically in a computer system. A database is usually controlled by a database
management system (DBMS). Together, the data and the DBMS, along with the
applications that are associated with them, are referred to as a database system, often
shortened to just database.
• A database is a data structure that stores organized information. Most databases contain
multiple tables, which may each include several different fields. For example, a company
database may include tables for products, employees, and financial records. Each of these
tables would have different fields that are relevant to the information stored in the table.
Before the use of computers, data in offices and businesses was
maintained manually in files. This was a laborious, time-consuming and
inefficient task, especially in large organizations. Computers were
initially designed for engineering and scientific applications. Because
they helped manage data in files efficiently, the file processing
environment simply transferred manual file work to computers, so
processing became fast and efficient.
As file processing systems came into use, their problems also became
apparent, and some of them are very severe.
What is DBMS?
A Database Management System (DBMS) is software designed to store, retrieve, define, and
manage data in a database.
A Database Management System (DBMS) is software for storing and retrieving users' data
while applying appropriate security measures. It consists of a group of programs that
manipulate the database.
Functions of DBMS
A DBMS performs several important functions that guarantee the integrity
and consistency of the data in the database. The most important functions
of a Database Management System are described below.
A modern DBMS provides storage not only for the data but also
for related data entry forms or screen definitions, report definitions, data
validation rules, procedural code, structures to handle video and picture
formats, and so on.
(ii) Data Manipulation Management: A DBMS furnishes users with the
ability to retrieve, update and delete existing data in the database.
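As a minimal sketch of retrieve, update and delete, the example below uses SQLite through Python's built-in sqlite3 module as a stand-in DBMS. The `student` table and its rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in DBMS (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO student (roll, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 78), (2, "Bikram", 64), (3, "Chitra", 91)],
)

# Retrieve existing data
rows = conn.execute("SELECT name, marks FROM student ORDER BY roll").fetchall()

# Update existing data
conn.execute("UPDATE student SET marks = ? WHERE roll = ?", (70, 2))

# Delete existing data
conn.execute("DELETE FROM student WHERE roll = ?", (3,))
conn.commit()

remaining = conn.execute("SELECT roll, name, marks FROM student ORDER BY roll").fetchall()
print(remaining)  # [(1, 'Asha', 78), (2, 'Bikram', 70)]
```

Parameter placeholders (`?`) let the DBMS handle the values, rather than splicing them into the SQL text.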
(iii) Data Definition Services: The DBMS accepts data definitions, such as
the external schema, the conceptual schema, the internal schema and all
the associated mappings, in source form.
(v) Database Communication Interfaces: End users' requests for
database access are transmitted to the DBMS in the form of communication
messages.
Encryption is a process that converts data into an unreadable format.
An unauthorized person cannot read or understand the encrypted data;
only authorized users are able to read it.
The DBA may define data views that allow a user to access only part of
the database tables, hiding the other parts of the tables or fields from that user.
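A minimal sketch of a data view, again assuming SQLite via Python's sqlite3 module. The `employee` table and `employee_public` view are hypothetical. SQLite itself has no user accounts, so this only shows how a view hides a column; in a multi-user DBMS the DBA would additionally grant users access to the view but not to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Ravi", "Sales", 52000), (2, "Meena", "HR", 48000)],
)

# A view exposing only non-sensitive columns; the salary column stays hidden.
conn.execute("CREATE VIEW employee_public AS SELECT id, name, dept FROM employee")

cur = conn.execute("SELECT * FROM employee_public ORDER BY id")
visible_columns = [d[0] for d in cur.description]  # column names seen through the view
rows = cur.fetchall()
print(visible_columns)  # ['id', 'name', 'dept']
print(rows)             # [(1, 'Ravi', 'Sales'), (2, 'Meena', 'HR')]
```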
(vii) Backup and Recovery Management: The DBMS provides mechanisms
for backing up data periodically and recovering from different types of
failures. This prevents the loss of data.
A data backup is a copy of data. This copy can include important parts of
the database, such as the control file and data files. A backup is a
safeguard against unexpected data loss and application errors. If you lose
the original data for any reason, you can reconstruct it by using a
backup.
The reasons for data loss can be divided into five main groups:
Program errors
Administrator (human) errors
Computer failures (system crash)
Disk failures
Catastrophes (fire, earthquake) or theft
Recovery Management
If the database is damaged for any reason, the DBMS restores a physical
backup of a data file or control file to bring the database back to a correct
state; this process is called data recovery. It is very important to back up
data periodically so that proper recovery is possible.
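The backup-then-restore cycle can be sketched with SQLite's online backup API, exposed in Python as `sqlite3.Connection.backup` (Python 3.7+). This is a simplification under stated assumptions: the `account` table and the two helper functions are hypothetical, and restoring a full copy stands in for the physical restore described above (production DBMSs typically also replay logs).

```python
import sqlite3

def make_backup(source: sqlite3.Connection) -> sqlite3.Connection:
    """Hypothetical helper: copy every page of `source` into a fresh in-memory database."""
    backup = sqlite3.connect(":memory:")
    source.backup(backup)
    return backup

def restore(backup: sqlite3.Connection, target: sqlite3.Connection) -> None:
    """Hypothetical helper: overwrite `target` with the contents of `backup`."""
    backup.backup(target)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO account VALUES (1, 500)")
db.commit()

snapshot = make_backup(db)  # periodic backup

# Simulated failure: an erroneous program deletes the data.
db.execute("DELETE FROM account")
db.commit()

restore(snapshot, db)       # data recovery from the backup
print(db.execute("SELECT * FROM account").fetchall())  # [(1, 500)]
```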
(viii) Concurrency Control Service: Since DBMSs support sharing of data
among multiple users, they must provide a mechanism for managing
concurrent access to the database. When more than one transaction
runs simultaneously, conflicts can occur that leave the database in an
inconsistent state. To handle these conflicts we need concurrency control
in a DBMS, which allows transactions to run simultaneously but manages
them in such a way that the integrity of the data remains intact.
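The idea of keeping data intact under concurrent updates can be illustrated with a lock, as in this minimal Python sketch. Threads stand in for transactions and `Account` is a hypothetical class. This shows only mutual exclusion on one data item, not a full DBMS concurrency manager.

```python
import threading

class Account:
    """Shared data item; a lock serializes conflicting read-modify-write steps."""
    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount: int) -> None:
        with self._lock:                     # only one "transaction" in this section at a time
            current = self.balance           # read
            self.balance = current + amount  # write

def run_concurrent_deposits(n_threads: int = 4, per_thread: int = 1000) -> int:
    acct = Account(0)

    def worker() -> None:
        for _ in range(per_thread):
            acct.deposit(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acct.balance

print(run_concurrent_deposits())  # 4000
```

Without the lock, two threads could read the same old balance and one deposit would be lost, which is exactly the kind of conflict described above.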
Concurrency Control Protocols
Different concurrency control protocols offer different trade-offs between
the amount of concurrency they allow and the amount of overhead that
they impose.
Lock-Based Protocols
Two-Phase Locking (2PL) Protocol
Timestamp-Based Protocols
Validation-Based Protocols
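To give a feel for how a timestamp-based protocol decides, here is a minimal Python sketch of the basic timestamp-ordering checks. Each data item remembers the largest read and write timestamps seen so far, and an operation that arrives "too late" causes its transaction to roll back. The class and method names are hypothetical, and restarting aborted transactions is left out.

```python
from dataclasses import dataclass

@dataclass
class Item:
    read_ts: int = 0   # largest timestamp of any transaction that read the item
    write_ts: int = 0  # largest timestamp of any transaction that wrote the item

class TimestampOrdering:
    """Basic timestamp-ordering checks; False means the transaction must roll back."""
    def __init__(self) -> None:
        self.items: dict[str, Item] = {}

    def _item(self, name: str) -> Item:
        return self.items.setdefault(name, Item())

    def read(self, ts: int, name: str) -> bool:
        item = self._item(name)
        if ts < item.write_ts:       # a younger transaction already wrote the item
            return False
        item.read_ts = max(item.read_ts, ts)
        return True

    def write(self, ts: int, name: str) -> bool:
        item = self._item(name)
        if ts < item.read_ts or ts < item.write_ts:  # would invalidate a younger read/write
            return False
        item.write_ts = ts
        return True

sched = TimestampOrdering()
print(sched.read(5, "X"))   # True: T5 reads X
print(sched.write(3, "X"))  # False: older T3 tries to write after T5 read X, so T3 rolls back
print(sched.write(7, "X"))  # True
```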
(ix) Transaction Management: A transaction is a series of database
operations, carried out by a single user or application program, which
accesses or changes the contents of the database. Therefore, a DBMS must
provide a mechanism to ensure either that all the updates corresponding to
a given transaction are made or that none of them is made.
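The "all updates or none" requirement can be sketched with SQLite via Python's sqlite3 module, where using the connection as a context manager commits on success and rolls back on an exception. The `account` table, the `CHECK (balance >= 0)` rule and the `transfer` helper are hypothetical. The credit is deliberately applied before the debit, so a failing debit shows the earlier credit being undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction: both updates happen or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK (balance >= 0) failed on the debit
        return False

def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

print(transfer(conn, "A", "B", 30), balances(conn))   # True {'A': 70, 'B': 80}
print(transfer(conn, "A", "B", 500), balances(conn))  # False {'A': 70, 'B': 80}
```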
In order to fully maintain data integrity and ensure correct transactional
behaviour, a DBMS supports the ACID properties:
1) Atomicity: A transaction is all or nothing. If any part of it fails, the
whole transaction is rolled back and the database state is left unchanged.
2) Consistency: Every transaction takes the database from one consistent
state to another consistent state.
3) Isolation: While a transaction is running, the data it modifies cannot be
accessed by other transactions until it completes.
4) Durability: Once a transaction has committed, the DBMS can always
recover its results, even after a system failure.
(xi) Data Integrity
Data integrity refers to the overall completeness, accuracy and
consistency of data and is an important feature of a database system.
It means that the data contained in the database is accurate and
reliable. Data integrity is enforced through a set of rules and standard
procedures that are defined during the database design phase.
It can be maintained through the use of various error-checking methods
and validation procedures.
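Integrity rules declared at design time are checked by the DBMS on every change. A minimal sketch using SQLite via Python's sqlite3 module follows; the `student` table and its constraints (`NOT NULL`, `UNIQUE`, `CHECK`) are hypothetical examples of such rules, and `try_insert` is an illustrative helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        roll  INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        marks INTEGER CHECK (marks BETWEEN 0 AND 100)
    )
""")

def try_insert(roll, email, marks) -> str:
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        with conn:
            conn.execute("INSERT INTO student VALUES (?, ?, ?)", (roll, email, marks))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert(1, "a@example.com", 85))   # accepted
print(try_insert(2, "a@example.com", 70))   # rejected: duplicate email violates UNIQUE
print(try_insert(3, None, 60))              # rejected: missing email violates NOT NULL
print(try_insert(4, "d@example.com", 140))  # rejected: marks outside the CHECK range
```

The exact wording of each rejection message comes from SQLite and may vary between versions.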
Disadvantages of DBMS
1) Costly
2) Degradation of System Performance due to processing overhead
3) Backup and Recovery operations are complex
Structure of DBMS
The database system is divided into
three components:
• Query Processor,
• Storage Manager, and
• Disk Storage
1. Data Definition Language Interpreter
It interprets DDL statements and records the definitions in a set of
tables that contain metadata.
4. Query Evaluation Engine
It executes the low-level instructions generated by the DML compiler.
5. Transaction Manager
It ensures that the database remains in a consistent state despite system failures, and that
concurrent transactions execute without conflicting.
6. Buffer Manager
It is responsible for fetching data from disk storage into main memory and for deciding what data
to cache in memory.
7. File Manager
It manages the allocation of space on disk storage and the data structures used to represent
information stored on disk.
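The buffer manager's job of deciding what to keep in memory can be sketched with a least-recently-used (LRU) replacement policy. LRU is one common policy chosen here as an assumption; real buffer managers may use others, such as clock. The `BufferManager` class and the dictionary standing in for disk pages are hypothetical.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer pool: keeps up to `capacity` disk pages in memory, evicting the least recently used."""
    def __init__(self, disk: dict[int, str], capacity: int) -> None:
        self.disk = disk                      # stand-in for disk storage: page_id -> contents
        self.capacity = capacity
        self.pool: OrderedDict[int, str] = OrderedDict()
        self.disk_reads = 0

    def get_page(self, page_id: int) -> str:
        if page_id in self.pool:              # hit: mark as most recently used
            self.pool.move_to_end(page_id)
            return self.pool[page_id]
        self.disk_reads += 1                  # miss: fetch from disk
        if len(self.pool) >= self.capacity:
            self.pool.popitem(last=False)     # evict the least recently used page
        self.pool[page_id] = self.disk[page_id]
        return self.pool[page_id]

disk = {i: f"page-{i}" for i in range(10)}
bm = BufferManager(disk, capacity=2)
for pid in [1, 2, 1, 3, 2]:
    bm.get_page(pid)
print(bm.disk_reads, list(bm.pool))  # 4 [3, 2]
```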
Several data structures are required as part of the physical system implementation:
i. Indices: These provide fast access to data items that hold particular values.
ii. Statistical Data: These store statistical information about the data in the database. The query
processor uses this information to execute a query.
iii. Data Files: These are the actual files that store the data in the database, i.e. the database
files.
iv. Data Dictionary: This stores metadata about each and every entity of the database, along
with the security and entity constraints.
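SQLite offers a small, concrete view of these structures. In the minimal sketch below (Python's sqlite3 module; the `product` table and `idx_product_name` index are hypothetical), the DDL definitions appear as metadata in SQLite's `sqlite_master` catalog, which plays the role of a data dictionary here, and the query planner reports that it will use the index for an equality lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statements: the DBMS records their definitions as metadata.
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL)")
conn.execute("CREATE INDEX idx_product_name ON product (name)")

# SQLite's catalog table acts as the data dictionary in this sketch.
catalog = conn.execute("SELECT type, name FROM sqlite_master ORDER BY type, name").fetchall()
print(catalog)  # [('index', 'idx_product_name'), ('table', 'product')]

# Column-level metadata: PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2], row[3]) for row in conn.execute("PRAGMA table_info(product)")]
print(columns)  # [('id', 'INTEGER', 0), ('name', 'TEXT', 1), ('price', 'REAL', 0)]

# The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM product WHERE name = 'pen'").fetchall()
uses_index = any("idx_product_name" in str(row[-1]) for row in plan)
print(uses_index)
```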
[Figure: database application architecture with client machine (Java applications, Java applet or HTML browser), application server (Java business logic, JDBC) and DBMS server machine, connected by HTML, RMI, CORBA and other calls, JDBC, and a DBMS proprietary protocol]
File Organizations
The file organization is the physical organization of the records of a file for
the convenience of storage and retrieval of data records. System designers
choose to organize, access and process the records of the various
files in different ways, depending on the type of application and
the needs of the users. The main considerations are:
1. Ease of Retrieval
2. Convenience of Updates
3. Economy of Storage
4. Reliability
5. Security
6. Integrity
Serial File
• Records in a file are stored and accessed one after another.
• The records are not stored in any particular order on the storage medium;
this type of organization is mainly used on magnetic tapes.
• The records on a serial file are not in any particular sequence, and so
this type of organisation would not be used for a master file as there
would be no way to find a particular record except by reading
through the whole file, starting at the beginning, until the right
record was located. Serial files are used as temporary files to store
transaction data.
Advantages
• It is simple
• It is cheap
Disadvantages
• It is cumbersome to access because you have to access all
preceding records before retrieving the one being searched for.
• Space on the medium is wasted in the form of inter-record gaps.
• It cannot support modern high speed requirements for quick record
access.
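The cost of searching a serial file can be seen in this minimal Python sketch. A list of records in arrival order stands in for the file, and the transaction records are hypothetical. Because nothing is ordered, a search has to read from the beginning and, for a missing key, must read every record.

```python
from typing import Optional

# Hypothetical serial file: records appended in arrival order, no key ordering.
serial_file = [
    {"txn_id": 907, "amount": 250},
    {"txn_id": 112, "amount": 75},
    {"txn_id": 530, "amount": 1200},
]

def serial_search(records: list[dict], key: int) -> tuple[Optional[dict], int]:
    """Read records from the beginning until the key is found; return (record, records_read)."""
    reads = 0
    for record in records:
        reads += 1
        if record["txn_id"] == key:
            return record, reads
    return None, reads  # not found: every record had to be read

print(serial_search(serial_file, 530))  # ({'txn_id': 530, 'amount': 1200}, 3)
print(serial_search(serial_file, 999))  # (None, 3)
```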
Sequential File
• Storing records in sorted order in contiguous blocks within files on tape
or disk is called sequential access file organization.
• In sequential access file organization, all records are stored in a
sequential order. The records are arranged in the ascending or
descending order of a key field.
• Sequential file search starts from the beginning of the file and the
records can be added at the end of the file.
• In sequential file, it is not possible to add a record in the middle of
the file without rewriting the file.
• Searching of a record requires, on average, access to half the records
in the file.
Advantages
• It is simple to program and easy to design.
• A sequential file makes the best use of storage space.
Disadvantages
• Processing a sequential file is time-consuming.
• It has high data redundancy.
• Random (direct) access to a record is not possible.
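A minimal Python sketch of a sequential file, using a hypothetical list of `(key, value)` records kept in ascending key order. A search still reads from the start, but it can stop as soon as it passes the wanted key. Adding a record in the middle means writing out a new copy of the whole file.

```python
# Hypothetical sequential file: records kept in ascending order of the key field.
seq_file = [(101, "Asha"), (104, "Dev"), (110, "Mira"), (115, "Ravi")]

def sequential_search(records, key):
    """Scan from the start; because keys are sorted, stop early once a larger key appears."""
    reads = 0
    for k, value in records:
        reads += 1
        if k == key:
            return value, reads
        if k > key:
            break
    return None, reads

def insert_record(records, new_record):
    """Inserting in the middle means writing out a new copy of the whole file."""
    new_file = []
    placed = False
    for rec in records:
        if not placed and new_record[0] < rec[0]:
            new_file.append(new_record)
            placed = True
        new_file.append(rec)
    if not placed:
        new_file.append(new_record)  # larger than every existing key: goes at the end
    return new_file

print(sequential_search(seq_file, 110))  # ('Mira', 3)
print(sequential_search(seq_file, 105))  # (None, 3)
seq_file = insert_record(seq_file, (107, "Kiran"))
print([k for k, _ in seq_file])          # [101, 104, 107, 110, 115]
```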
Direct File or Random Access File
• Direct access file is also known as random access or relative file
organization.
• In a direct access file, all records are stored on a direct access storage device
(DASD), such as a hard disk. The records are randomly placed throughout
the file.
• The records do not need to be in sequence because they are updated
directly and rewritten back in the same location.
• This file organization is useful for immediate access to large amounts of
information. It is used in accessing large databases.
• It is also called hashing, because the address of each record is usually
computed from its key by a hash function.
Advantages
• Direct access file helps in online transaction processing system (OLTP)
like online railway reservation system.
• In a direct access file, sorting of the records is not required.
• It accesses the desired records immediately.
• It updates several files quickly.
• It has better control over record allocation.
Disadvantages
• A direct access file does not provide a backup facility.
• It is expensive.
• It uses storage space less efficiently than a sequential file.
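The hashing idea behind a direct file can be sketched as follows. This is a minimal Python sketch under assumptions: the bucket count of 7, the division-method hash `key % NUM_BUCKETS` and chaining for collisions are illustrative choices, and the booking keys are hypothetical.

```python
# Hypothetical direct (relative) file with a fixed number of slots ("buckets").
NUM_BUCKETS = 7

def bucket_of(key: int) -> int:
    """Hash function: maps a key straight to a bucket address (division method)."""
    return key % NUM_BUCKETS

class DirectFile:
    def __init__(self) -> None:
        self.buckets: list[list[tuple[int, str]]] = [[] for _ in range(NUM_BUCKETS)]

    def write(self, key: int, data: str) -> None:
        bucket = self.buckets[bucket_of(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # update in place at the same address
                bucket[i] = (key, data)
                return
        bucket.append((key, data))        # collisions are chained within the bucket

    def read(self, key: int):
        for k, data in self.buckets[bucket_of(key)]:
            if k == key:
                return data
        return None

f = DirectFile()
f.write(12051, "Seat 14B")
f.write(12058, "Seat 3A")   # 12058 % 7 == 12051 % 7, so these two keys collide
f.write(12051, "Seat 15C")  # an update rewrites the record in its bucket
print(f.read(12051), f.read(12058), f.read(99))  # Seat 15C Seat 3A None
```

No sorting is needed, and a read touches only one bucket, which is why direct files suit online transaction processing.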
Indexed Sequential File
An indexed sequential file consists of records that can be accessed
sequentially. Direct access is also possible. It consists of two parts:
The data file contains the records in a sequential scheme.
The index file contains the primary key of each record and its address in the data file.
Following are the key attributes of indexed sequential file organization:
Records can be read in sequential order, just like in sequential file
organization.
Records can be accessed randomly if the primary key is known. The index file
is used to get the address of a record, and then the record is fetched from
the data file.
A sorted index is maintained in this file system, which relates each key value
to the position of its record in the file.
Alternate indexes can also be created to fetch the records.
Advantages
• Both sequential and random access are possible.
• It accesses the records very fast if the index table is properly organized.
• It provides quick access for sequential and direct processing.
• It reduces the degree of the sequential search.
Disadvantages
• Indexed sequential access file requires unique keys and periodic
reorganization.
• It requires more storage space.
• It is expensive because it requires special software.
• It is less efficient in the use of storage space as compared to other file
organizations.
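A minimal Python sketch of an indexed sequential file, assuming a dense index with one entry per record. The records are hypothetical. The data file stays in key order for sequential reads, while a sorted index of `(key, position)` pairs, searched with binary search from the standard `bisect` module, gives direct access.

```python
from bisect import bisect_left

# Hypothetical data file: records stored sequentially in ascending key order.
data_file = [(201, "Anil"), (205, "Bela"), (212, "Chandra"), (230, "Divya")]

# Index file: sorted primary keys and the position of each record in the data file.
index_keys = [key for key, _ in data_file]
index_pos = list(range(len(data_file)))

def direct_lookup(key):
    """Search the sorted index, then fetch the record from its address in the data file."""
    i = bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return data_file[index_pos[i]]
    return None

def sequential_read():
    """The same data file can still be read in key order, one record after another."""
    return [name for _, name in data_file]

print(direct_lookup(212))  # (212, 'Chandra')
print(direct_lookup(213))  # None
print(sequential_read())   # ['Anil', 'Bela', 'Chandra', 'Divya']
```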
Conceptual level:
This is the next higher level than internal level of data abstraction.
It describes what data are stored in the database and what relationships exist
among those data.
It is also known as the logical level.
It hides the low-level complexities of physical storage.
Database administrators and designers work at this level to determine what data to
keep in the database.
Application developers also work at this level.
External Level:
This is the highest level of data abstraction.
It describes only the part of the entire database that concerns an end user.
It is also known as the view level.
End users need to access only part of the database rather than the entire database.
Different users need different views of the database, so there can be many view
level abstractions of the same database.
Data Independence
A database stores data about data, i.e. metadata. Metadata itself follows a layered
architecture, so that when we change data at one level, it does not affect the data at
another level. The levels are independent of each other but mapped to each other.
Logical data independence is used to separate the external level from the conceptual
view.
If we make any changes in the conceptual view of the data, the user view of the
data is not affected.
Logical data here means, for example, a table (relation) stored in the database together
with all the constraints applied to that relation.
Logical data independence is a mechanism that frees the logical structure from the actual
data stored on the disk: if we make changes to a table's format, the data residing on the
disk should not have to change.
Physical Data Independence
All the schemas are logical, and the actual data is stored in bit format on the disk.
Physical data independence is the power to change the physical data without
impacting the schema or logical data.
If we change the storage size of the database system server, the
conceptual structure of the database is not affected.
Physical data independence is used to separate the conceptual level from the internal
level.
For example, if we change or upgrade the storage system itself, say by
replacing hard disks with SSDs, this should not have any impact on
the logical data or schemas.
Data Abstraction
Database systems are made up of complex data structures. To ease user
interaction with the database, developers hide internal, irrelevant details from users.
This process of hiding irrelevant details from the user is called data abstraction.
We have three levels of abstraction:
Physical level: This is the lowest level of data abstraction. It describes how data is
actually stored in database. You can get the complex data structure details at this
level.
Logical level: This is the middle level of 3-level data abstraction architecture. It
describes what data is stored in database.
View level: Highest level of data abstraction. This level describes the user interaction
with database system.
Mapping
Mapping is a process of converting one level to another level. In this process, the
data in one level is related to the data at another level.
Database Languages
A DBMS must provide appropriate languages and interfaces for each category of
users to express database queries and updates. Database languages are used to
read, update and store data in a database. Many DBMS products, such as Oracle,
MySQL, MS Access, dBase and FoxPro, provide such languages. SQL (Structured Query
Language), which is used in Oracle, MS Access and most other relational DBMSs, can be
categorized into data definition language (DDL), data control language (DCL) and data
manipulation language (DML).
Data Definition Language (DDL)
DDL is used for specifying the database schema. It is used for creating tables,
schemas, indexes, constraints, etc. in the database. Let's see the operations that we can
perform on a database using DDL:
To create the database and its objects (tables, indexes, etc.): CREATE
To alter the structure of the database: ALTER
To drop (delete) database objects such as tables: DROP
To remove all records from a table while keeping its structure: TRUNCATE
To rename database objects: RENAME
To add comments to the data dictionary: COMMENT
All of these commands either define or update the database schema; that is why they
come under the data definition language.
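Several of these DDL commands can be tried with SQLite through Python's sqlite3 module, as in the minimal sketch below; the `course`/`subject` table is hypothetical. SQLite's dialect differs from Oracle's in some respects. It has no `TRUNCATE` (a `DELETE` without a `WHERE` clause empties the table instead), renaming is done with `ALTER TABLE ... RENAME TO`, and there is no `COMMENT` statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def tables(conn: sqlite3.Connection) -> list[str]:
    """List user tables recorded in SQLite's catalog."""
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")  # CREATE
conn.execute("ALTER TABLE course ADD COLUMN credits INTEGER")             # ALTER
conn.execute("ALTER TABLE course RENAME TO subject")                      # RENAME
conn.execute("INSERT INTO subject VALUES ('CS501', 'DBMS', 4)")
conn.execute("DELETE FROM subject")  # SQLite stand-in for TRUNCATE: empties the table, keeps its structure
print(tables(conn))                  # ['subject']
conn.execute("DROP TABLE subject")                                        # DROP
print(tables(conn))                  # []
```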
Data Control language (DCL)
DCL is used for granting and revoking user access on a database:
To grant access to a user: GRANT
To revoke access from a user: REVOKE
In practice, the data definition, data manipulation and data control
languages are not separate languages; rather, they are parts of a single database
language such as SQL.
Sophisticated Users
They interact with the system without writing programs.
They form requests by writing queries in a database query language. These are
submitted to a query processor that breaks a DML statement down into
instructions for the database manager module.
They directly interact with the database by means of query language like SQL.
These users are typically scientists, engineers and analysts who thoroughly study SQL
and DBMS concepts to apply them to their requirements. In short, this
category includes designers and developers of DBMS and SQL.
Naïve Users
These are users who use existing applications to interact with the database.
Examples include online library systems, ticket booking systems and ATMs, which provide
existing applications that users use to interact with the database to fulfil their
requests.
System Analyst
He/she analyses the requirements of the end users, especially naïve users and
parametric end users. System analysts are responsible for the design, structure and
properties of the database.
The main concerns of the system analyst are feasibility, economic aspects and
technical aspects.
Analysts are among the sophisticated users. They use tools such as the following to
perform their tasks:
1. Online analytical processing (OLAP): It helps analysts view summaries of the
data in different ways.
2. Data mining tools: They help analysts find certain kinds of patterns in the
given data.