Data Versus Information
Data - Data is a raw and unorganized fact that must be processed to make it meaningful. Data
can be simple, and it remains unorganized until it is organized. Generally, data comprises facts,
observations, perceptions, numbers, characters, symbols, images, etc.
Data is always interpreted, by a human or a machine, to derive meaning. So, on its own, data is
meaningless. Data contains numbers, statements and characters in a raw form.
Information - Information is a set of data which is processed in a meaningful way according to the
given requirement. Information is processed, structured or presented in a given context to make it
meaningful and useful. It is processed data that possesses context, relevance and purpose. It
involves manipulation of raw data.
Information assigns meaning and improves the reliability of data. It helps to reduce uncertainty.
So, when data is transformed into information, it should not carry any useless details.
Data Vs Information
Parameter                     Data                                  Information
Meaning                       Data does not have any specific       It carries meaning that has been
                              purpose.                              assigned by interpreting data.
Support for Decision Making   It can’t be used for decision         It is widely used for decision
                              making.                               making.
Record, File
A record is a collection of data items and is the unit for data storage at the logical or file level.
A record may consist of different fields, and each field corresponds to an attribute ( column ) of the
record. For example, a student record has fields such as student name, courses, class, roll no., and
grades.
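Spanned and Unspanned Records - The records may be of fixed size or may be variable in length.
One block ( also sometimes called a physical record ) may contain multiple records.
Unspanned Records - When records are small enough that each one fits entirely within a single block,
so that no record crosses a block boundary, such records are called Unspanned Records.
Spanned Records - When a record is allowed to continue from one block into the next, typically
because it does not fit in the space left in a block, it is called a Spanned Record.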
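The difference between the two layouts can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the sample sizes ( 1000 records of 120 bytes, 512-byte blocks ) are assumptions, and per-block overhead such as headers or pointers is ignored.

```python
def unspanned_blocks(num_records: int, record_size: int, block_size: int) -> int:
    """Blocks needed when no record may cross a block boundary."""
    records_per_block = block_size // record_size  # whole records that fit in one block
    if records_per_block == 0:
        raise ValueError("record is larger than a block, so it would have to be spanned")
    return -(-num_records // records_per_block)  # ceiling division


def spanned_blocks(num_records: int, record_size: int, block_size: int) -> int:
    """Blocks needed when records are packed back to back across block boundaries."""
    total_bytes = num_records * record_size
    return -(-total_bytes // block_size)  # ceiling division


# Hypothetical sizes chosen only for illustration.
print(unspanned_blocks(1000, 120, 512))  # 4 records per block, 32 bytes wasted in each
print(spanned_blocks(1000, 120, 512))    # no wasted space except in the last block
```

With these numbers the unspanned layout needs more blocks, because the unused tail of every block is left empty.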
File - Related data and information is stored collectively in file formats. A file is a sequence of
records stored in binary format. A disk drive is formatted into several blocks that can store records.
File records are mapped onto those disk blocks.
File Organization - File organization defines how file records are mapped onto disk blocks. We have
four types of file organization to organize file records:
1. Heap File Organization - When a file is created using Heap File Organization, the operating system
allocates a memory area to that file without any further accounting details. File records can be
placed anywhere in that memory area. It is the responsibility of the software to manage the records.
A Heap File does not support any ordering, sequencing or indexing on its own.
2. Sequential File Organization - Every file record contains a data field ( attribute ) to uniquely
identify that record. In Sequential File Organization, records are placed in the file in some sequential
order based on the unique key field or search key. Practically, it is not possible to store all the records
sequentially in physical form.
3. Hash File Organization - Hash File Organization uses Hash function computation on some fields of
the records. The output of the Hash function determines the location of the disk block where the
records are to be placed.
4. Clustered File Organization - Clustered File Organization is not considered good for large
databases. In this mechanism, related records from one or more relations are kept in the same disk
block, that is, the ordering of records is not based on Primary key or Search key.
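Three of these organizations can be sketched with in-memory lists standing in for disk blocks. This is a toy model under stated assumptions: the function names, the sample records and the choice of `key % num_blocks` as the hash function are all hypothetical, not taken from any particular DBMS.

```python
import bisect

NUM_BLOCKS = 4  # hypothetical number of disk blocks used by the hash file


def heap_insert(heap_file: list, record: tuple) -> None:
    """Heap organization: put the record wherever there is room (here, at the end)."""
    heap_file.append(record)


def sequential_insert(seq_file: list, record: tuple) -> None:
    """Sequential organization: keep records ordered by the key in the first field."""
    bisect.insort(seq_file, record)


def hash_block(key: int, num_blocks: int = NUM_BLOCKS) -> int:
    """Hash organization: the hash of the key decides which block holds the record."""
    return key % num_blocks


def hash_insert(blocks: list, record: tuple) -> None:
    blocks[hash_block(record[0], len(blocks))].append(record)


records = [(42, "Asha"), (7, "Ravi"), (19, "Meena"), (8, "John")]
heap_file, seq_file = [], []
blocks = [[] for _ in range(NUM_BLOCKS)]
for r in records:
    heap_insert(heap_file, r)
    sequential_insert(seq_file, r)
    hash_insert(blocks, r)

print(heap_file)  # arrival order
print(seq_file)   # key order
print(blocks)     # grouped by hash value
```

Note how the same four records end up in arrival order, key order, or grouped by hash value depending only on the organization chosen.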
File Operations
B) Locate - Every file has a file pointer, which tells the current location where the data is to be read or
written.
C) Read - By default, when a file is opened in read mode, the file pointer points to the beginning
of the file. There are options where the user can tell the operating system where to position the file
pointer at the time of opening a file. The data immediately following the file pointer is read.
D) Write - A user can choose to open a file in write mode, which enables them to edit its contents.
The edit can be an insertion, deletion or modification.
E) Close - This is the most important operation from the operating system’s point of view. When a
request to close a file is generated, the operating system:
- Removes all the locks ( if in shared mode ).
- Saves the data ( if altered ) to the secondary media.
- Releases all the buffers and file handlers associated with the file.
The organization of data inside a file plays a major role here. The process of moving the file pointer
to a desired record inside a file varies based on whether the records are arranged sequentially or
clustered.
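The operations above map closely onto the file API of most languages. The following is a minimal sketch using Python's built-in file objects; the file name, the 16-byte fixed record length and the sample names are assumptions made for the example.

```python
import os
import tempfile

RECORD_SIZE = 16  # hypothetical fixed record length in bytes


def make_record(text: str) -> bytes:
    """Pad (or cut) the text to exactly RECORD_SIZE bytes."""
    return text.encode("ascii").ljust(RECORD_SIZE)[:RECORD_SIZE]


path = os.path.join(tempfile.mkdtemp(), "students.dat")

# Write: open in write mode and store three fixed-length records.
with open(path, "wb") as f:
    for name in ("Asha", "Ravi", "Meena"):
        f.write(make_record(name))

# Read mode: the file pointer starts at the beginning of the file.
f = open(path, "rb")
start = f.tell()

# Locate: move the file pointer to the third record (index 2).
f.seek(2 * RECORD_SIZE)

# Read: the data immediately following the file pointer is returned.
third = f.read(RECORD_SIZE).decode("ascii").rstrip()
after = f.tell()

# Close: buffers and the file handle are released.
f.close()

print(start, third, after, f.closed)
```

Because the records are fixed-length and stored one after another, locating record `i` is a single multiplication; other organizations need a different way to compute the offset.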
Data Dictionary
A data dictionary contains metadata, i.e. data about the database. The data dictionary is very
important as it contains information such as what is in the database, who is allowed to access it, and
where the database is physically stored. The users of the database normally don’t interact with the
data dictionary; it is only handled by the database administrators.
Field Name        Data Type   Field Size For Display   Description                  Example
Employee Number   Integer     10                       Unique ID of each employee   22BBA001
Active Data Dictionary - If the structure of the database or its specifications change at any point of
time, it should be reflected in the data dictionary. This is the responsibility of the database
management system in which the data dictionary resides. So, the data dictionary is automatically
updated by the database management system when any changes are made in the database. This is
known as an active data dictionary as it is self updating.
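As an illustration of a self-updating dictionary, the sketch below uses SQLite through Python's `sqlite3` module. SQLite is only a stand-in chosen for the example, and the `employee` table and its columns are hypothetical. The catalog is queried before and after a schema change, with no manual update in between.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT)")


def columns(table: str) -> list:
    """Read column names and types from SQLite's own catalog."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


before = columns("employee")

# Change the structure of the database; the catalog is not touched by hand.
conn.execute("ALTER TABLE employee ADD COLUMN dept TEXT")

after = columns("employee")
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

print(before)
print(after)
print(tables)
```

The new `dept` column shows up in the catalog immediately, which is the behaviour described for an active data dictionary.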
Passive Data Dictionary - This is not as useful or easy to handle as an active data dictionary. A
passive data dictionary is maintained separately from the database whose contents are stored in the
dictionary. That means that if the database is modified, the data dictionary is not automatically
updated as in the case of an active data dictionary. So, the passive data dictionary has to be manually
updated to match the database. This needs careful handling, or else the database and data dictionary
fall out of sync.
Database Administrator: Functions and Responsibilities
A database administrator ( DBA ) is a person or a group of persons responsible for managing all
the activities related to the database system. This job requires a high level of expertise from that
person or group. A DBA is the controller of everything related to the database system.
Main roles and duties of a DBA:
1. Installing and Configuring the Database - The DBA is responsible for installing the database
software. The DBA configures the database software and upgrades it if needed.
There are many database software products in the industry, such as Oracle, Microsoft SQL Server,
MySQL and MS Access, so the DBA decides how the installation and configuration of this software
will take place.
2. Deciding the Hardware Device - Depending upon the cost, performance and efficiency of the
hardware, it is the DBA who has the duty of deciding which hardware device will suit the company’s
requirements.
3. Managing Data Integrity - Data integrity should be managed accurately because it protects the data
from unauthorized use.
4. Decides Data Recovery and Backup Method - If a company has a big database, it is likely that the
database may fail at some point. The DBA is required to take a backup of the entire database at
regular intervals. The DBA also recovers the database if it has been lost.
5. Tuning Database Performance - Database performance plays an important role for any business. If
users are not able to fetch data quickly, the company may lose business. So, by tuning and modifying
SQL commands, a DBA can improve the performance of the database.
6. Capacity Issues - Every database has a limit on how much data it can store, and the physical
memory also has limitations. The DBA has to decide the limits and the capacity of the database and
handle all the issues related to it.
7. Database Design - The logical design of the database is designed by the DBA. Also a DBA is
responsible for physical design, external model design and integrity control.
8. Database Accessibility - The DBA writes sub schemas to decide the accessibility of the database.
The DBA decides who the users of the database are and which data is to be used by which user. No
user has the power to access the entire database without the permission of the DBA.
9. Decides Validation Checks on Data - The DBA has to decide which data should be used and what
kind of data is accurate for the company. So the DBA always puts validation checks on data to make it
more accurate and consistent.
10. Monitoring Performance - DBA has to monitor the performance of the database. A DBA monitors
the CPU and memory usage.
11. Decides Content of the Database - A database system holds many kinds of content. The DBA
decides the fields, the types of the fields and the range of values of the content in the database
system. One can say that the DBA decides the structure of the database files.
12. Provides Help and Support to User - If any user needs help at any time then it is the duty of DBA to
help that user.
13. Database Implementation - Database has to be implemented before anyone can start using it. So
DBA implements the database system. DBA has to supervise the database loading at the time of its
implementation.
14. Improve Query Processing Performance - Queries made by the users should be executed
quickly. As discussed above, users need fast retrieval of answers, so the DBA improves query
processing, for example by tuning the queries and the structures they rely on.
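Two of the duties listed above, validation checks ( item 9 ) and query performance ( items 5 and 14 ), can be sketched in a few lines. The example uses SQLite through Python's `sqlite3` module as a stand-in DBMS; the `employee` table, its columns, the rule that a salary cannot be negative and the index name are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_no INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary INTEGER CHECK (salary >= 0)  -- validation check on the data
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", "HR", 30000), (2, "Ravi", "IT", 45000)],
)

# Validation: a row that violates the CHECK constraint is rejected by the DBMS.
try:
    conn.execute("INSERT INTO employee VALUES (3, 'Meena', 'IT', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True


def plan(sql: str) -> str:
    """Return SQLite's description of how it would execute the query."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT name FROM employee WHERE dept = 'IT'"
plan_before = plan(query)  # a scan over the whole table

# Tuning: an index on the filtered column lets the DBMS search instead of scan.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
plan_after = plan(query)

print(rejected)
print(plan_before)
print(plan_after)
```

The exact wording of the plan text differs between SQLite versions, so it is better read as a diagnostic than relied on as a fixed format.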
The file system is the oldest and still the most popular way to keep data files organized.
When it comes to security and appropriate management of data based on constraints and the other
aspects discussed below, the first choice of many experts is a DBMS.
Earlier, people used to keep records and maintain data in registers, and any alteration or retrieval
of this data was difficult.
When computers arrived, the same approach was followed for storing data on drives. It is an easy
way to store data in general files like images, text, videos, audio, etc. But security is lower, because
the only options available for these files are those given by the operating system, such as locks,
hidden files and sharing. These files are hard to maintain when they change frequently.
Data redundancy is higher and can’t be controlled easily. Data integration is hard to achieve, and
data consistency is not guaranteed.
Database Management System - A DBMS is an effective way to store data when constraints are
high and data maintenance and security are the primary concerns of the user.
A DBMS stores data in the form of interrelated tables and files. Users can easily access the database
without worrying about the underlying schema of the database. Data redundancy is minimized due to
the interrelation of data entities, and the centralization of data in the database also provides a means
of data integration. Security of data is also maximized using password protection,
encryption/decryption, granting authorized access and other measures.
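The point about reduced redundancy can be illustrated with two interrelated tables. This is a sketch using SQLite through Python's `sqlite3` module; the `department` and `employee` tables and their contents are hypothetical. The department name is stored once and referenced by key, so a change is made in one place only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department (dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales');
    INSERT INTO employee VALUES (101, 'Asha', 1), (102, 'Ravi', 1);
""")


def employees_with_department() -> list:
    """Combine the two related tables into one result."""
    return conn.execute("""
        SELECT e.name, d.dept_name
        FROM employee AS e
        JOIN department AS d ON d.dept_id = e.dept_id
        ORDER BY e.emp_no
    """).fetchall()


before = employees_with_department()

# One update in one row; every employee of that department sees the new name.
conn.execute("UPDATE department SET dept_name = 'Marketing' WHERE dept_id = 1")
after = employees_with_department()

print(before)
print(after)
```

In a set of isolated files, the same rename would have to be repeated in every file that copies the department name, which is where inconsistency tends to creep in.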
File System                                                     DBMS
It is used for general files which require less security.       It is used when security constraints are high.
It stores unstructured data as isolated data files/entities.    It stores structured data which has well-defined
                                                                constraints and interrelations.
User locates the physical address of the file to access data.   User is unaware of the physical address where
                                                                data is stored.
DBMS Architecture
Types of DBMS architecture: 1-Tier Architecture, 2-Tier Architecture and 3-Tier Architecture.
1-Tier Architecture - In this architecture, the database is directly available to the user, which
means the user can work on the database directly. Examples - SQL Server, MS Access. Changes can be
made directly on the database. It doesn’t provide handy tools for end users.
It is used for the development of local applications, where programmers can communicate directly
with the database for a quick response.
2-Tier Architecture - It is also known as client-server architecture. In this architecture the user
connects with the database via an application. For this interaction, APIs like ODBC and JDBC are used.
The user interfaces and application programs run on the client side. The server side is responsible
for providing functionalities like query processing and transaction management.
Figure: 2-Tier Architecture ( User, Client Application )
3-Tier Architecture - In this architecture, the client can’t communicate directly with the server.
The application on the client end interacts with an application server, which in turn communicates
with the DBMS. It is used in the case of large web applications.
Figure: 3-Tier Architecture ( User, Application Client, Application Server )
1. Physical Database Schema - It can be defined as the design of a database at its physical level.
This level expresses how data is stored in blocks of storage.
2. Logical Database Schema - It can be defined as the design of the database at its logical level. At
this level, both the programmer and the Database Administrator ( DBA ) can work. The internal
details remain hidden at this level.
3. View Schema - It can be defined as the design of the database at the view level, which generally
describes end-user interaction with database systems.
For Example - Suppose you are storing student information in a student table.
- At the physical level, these records are described as chunks of storage ( measured in bytes,
gigabytes, terabytes, etc. ) in memory, and these details remain hidden.
- At the logical level, these records can be described as fields and attributes along with their data
types, and their relationships with each other can be logically implemented.
- At the view level, a user is able to interact with the system with the help of a GUI and enter the
details on the screen.
Figure: View Schema, Logical Schema ( Stu-ID, Stu-Name, Proj-ID ), Physical Schema
Sub Schema - It is a subset of the schema and inherits the same properties that a schema has.
The plan for a view is often called a sub schema.
A sub schema refers to an application programmer’s view of the data item types and record types. In
it, the user can view only the part of the database that they want or are interested in. Therefore,
different application programs can have different views of the data.
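One common way to realise a sub schema is a database view that exposes only some columns of a table. The sketch below uses SQLite through Python's `sqlite3` module; the field names follow the Stu-ID, Stu-Name and Proj-ID fields mentioned earlier, while the `grade` column, the view names and the sample rows are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical schema: the full student table.
    CREATE TABLE student (
        stu_id   INTEGER PRIMARY KEY,
        stu_name TEXT NOT NULL,
        proj_id  INTEGER,
        grade    TEXT
    );
    INSERT INTO student VALUES (1, 'Asha', 10, 'A'), (2, 'Ravi', 11, 'B');

    -- Sub schema for a project-tracking program: no grades.
    CREATE VIEW project_view AS
        SELECT stu_id, stu_name, proj_id FROM student;

    -- Sub schema for a grading program: no names or projects.
    CREATE VIEW grade_view AS
        SELECT stu_id, grade FROM student;
""")


def column_names(relation: str) -> list:
    """List the columns that a table or view exposes."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [description[0] for description in cursor.description]


print(column_names("student"))
print(column_names("project_view"))
print(column_names("grade_view"))
```

Each program works against its own view and never sees the columns outside its part of the schema.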
Database Architecture
An early proposal for a standard terminology and general architecture for database systems was
produced in 1971 by the DBTG ( Data Base Task Group ) appointed by the Conference on Data Systems
Languages ( CODASYL, 1971 ). The DBTG recognized the need for a two-level approach with a
system view called the schema and user views called sub schemas. The American National Standards
Institute ( ANSI ) Standards Planning and Requirements Committee ( SPARC ) produced a similar
terminology and architecture in 1975 ( ANSI, 1975 ). ANSI-SPARC recognized the need for a
three-level approach with a system catalog.
- Each user should be able to access the same data, but have a different customized view of the
data. Each user should be able to change the way he or she views the data, and this change should
not affect other users.
- Users should not have to deal directly with physical database storage details, such as indexing or
hashing. In other words, a user’s interaction with the database should be independent of storage
considerations.
- The Database Administrator ( DBA ) should be able to change the database storage structures
without affecting the users’ views.
- The internal structure of the database should be unaffected by changes to the physical aspects
of storage, such as the changeover to a new storage device.
- The DBA should be able to change the conceptual structure of the database without affecting all
users.
External Level or View Level - It is the user’s view of the database. This level describes that part of
the database that is relevant to each user. The external level is the one closest to the end users. This
level deals with the way in which individual users view data. Individual users are given different views
according to their requirements.
A view involves only those portions of a database which are of concern to a user. Therefore the same
database can have different views for different users. The external view insulates users from the
details of the internal and conceptual levels. The external level is also known as the view level. In
addition, different views may have different representations of the same data. For example, one user
may view dates in the form ( day, month, year ), while another may view dates as
( year, month, day ).
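The date example can be sketched with two views over one stored value. SQLite, accessed through Python's `sqlite3` module, is used here only as a convenient stand-in; the `event` table, the view names and the sample date are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The date is stored once, in a single format.
    CREATE TABLE event (event_id INTEGER PRIMARY KEY, held_on TEXT NOT NULL);
    INSERT INTO event VALUES (1, '2024-03-09');

    -- External view for users who read dates as ( day, month, year ).
    CREATE VIEW event_dmy AS
        SELECT event_id, strftime('%d/%m/%Y', held_on) AS held_on FROM event;

    -- External view for users who read dates as ( year, month, day ).
    CREATE VIEW event_ymd AS
        SELECT event_id, strftime('%Y/%m/%d', held_on) AS held_on FROM event;
""")

dmy = conn.execute("SELECT held_on FROM event_dmy").fetchone()[0]
ymd = conn.execute("SELECT held_on FROM event_ymd").fetchone()[0]

print(dmy)
print(ymd)
```

Both users read the same underlying row; only the representation offered by their external view differs.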
Conceptual Level or Logical Level - It is the community view of the database. This level describes what
data is stored in the database and the relationships among the data. The middle level in the
three-level architecture is the conceptual level. This level contains the logical structure of the entire
database as seen by the DBA. It is a complete view of the data requirements of the organization that
is independent of any storage considerations. The conceptual level represents -
All entities, their attributes and their relationships. An entity is an object whose information is
stored in the database. For example, in a student database the entity is student. An attribute is
a characteristic of interest about an entity.
For example, in the case of a student database, Roll No, Name, Class, Address, etc. are attributes of
the entity student.
The conceptual level supports each external view, in that any data available to a user must be
contained in, or derivable from, the conceptual level. However, this level must not contain any
storage-dependent details. For instance, the description of an entity should contain only the data
types of attributes ( for example, integer, real, character ) and their length ( such as the maximum
number of digits or characters ), but not any storage considerations, such as the number of bytes
occupied. The conceptual level is also known as the logical level.