IMTC - Lecture - Notes - DBMS
May 2013
G. K. Sawaisarje
Scientist-C
National Data Center
O/o ADGM(R), Shivajinagar, Pune-411 005
File based systems (Pre-database systems)
In the old days (the period 1970-1990), data processing activity was mainly associated
with data held in files. The data were stored in files with specific file formats describing the
record structure, composed of related or unrelated data fields with varying or fixed data
formats. The processing was done with C, C++, FORTRAN or COBOL and was tied to the
data file formats: an application designed or written by a programmer for one type of file
cannot be applied to other types of files, so it is essentially a file-dependent data processing
system. For example, in processing surface observation data, which are stored in the data
file formats named TAB II (day's summary) and TAB III (synoptic hour), the data processing
procedures are not the same.
FORMAT FOR DAILY SURFACE TABLE II DATA
FORMAT FOR DAILY SURFACE TABLE III DATA
12-13   Date
14-18   Station Level Pressure (in 0.1 hPa)
19-23   Mean Sea Level Pressure (in 0.1 hPa), or Height in geo-potential metres from the
        nearest standard level. Height is reported by stations whose station height is above
        800 GPM; other stations report Mean Sea Level Pressure.
61-62   Height of individual layer of cloud (in code)
        i)   Code 00 means cloud height is < 30 meters.
        ii)  Codes 01 to 50: cloud height (in meters) = code x 30
             e.g. if code is 43 then cloud height = 43 x 30 = 1290 meters
        iii) Codes 51 to 55 are not used.
        iv)  Codes 56 to 80: cloud height (in meters) = (code - 50) x 300
             e.g. if code is 77 then cloud height = (77 - 50) x 300 = 8100 meters
        v)   Codes 81 to 88: cloud height (in meters) = 9000 + (code - 80) x 1500
             e.g. if code is 87 then cloud height = 9000 + (87 - 80) x 1500 = 19500 meters
        vi)  Code 89 means cloud height is > 21000 meters.
        vii) Codes 90 to 99:
             90: < 50 m            91: 50 to 100 m
             92: 100 to 200 m      93: 200 to 300 m
             94: 300 to 600 m      95: 600 to 1000 m
             96: 1000 to 1500 m    97: 1500 to 2000 m
             98: 2000 to 2500 m    99: 2500 m or more, or no clouds
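Although in the file-based era this decoding had to be programmed in languages such as
FORTRAN or C, the rule itself is compact. As a minimal sketch, assuming the coded layer
height has been loaded into a hypothetical column CLD_HT_CODE of a hypothetical table TB3,
the decode can be expressed as a single SQL CASE expression (TO_CHAR is the Oracle-style
number-to-text function):

SELECT CLD_HT_CODE,
       CASE
         WHEN CLD_HT_CODE = 0               THEN 'below 30 meters'
         WHEN CLD_HT_CODE BETWEEN 1 AND 50  THEN TO_CHAR(CLD_HT_CODE * 30) || ' meters'
         WHEN CLD_HT_CODE BETWEEN 51 AND 55 THEN 'code not used'
         WHEN CLD_HT_CODE BETWEEN 56 AND 80 THEN TO_CHAR((CLD_HT_CODE - 50) * 300) || ' meters'
         WHEN CLD_HT_CODE BETWEEN 81 AND 88 THEN TO_CHAR(9000 + (CLD_HT_CODE - 80) * 1500) || ' meters'
         WHEN CLD_HT_CODE = 89              THEN 'above 21000 meters'
         ELSE 'height range coded directly (codes 90 to 99)'
       END AS CLOUD_HEIGHT
  FROM TB3;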
1. File systems provide a useful historical perspective (or viewpoint) on the way we
handle data.
RADIATION DATA
(Hourly Values)
1980 onwards

Fixed-column record layout (columns 1-82) covering: Index No., Century, Year, Month, Date,
Data Type, the hourly values for hours 5 to 20, the Daily Total, and the Commencement time,
Cessation time and Type fields.

1 watt/m² = 5.99 × 10⁻⁵ MJ/m²
RADIATION DATA FORMAT
(Hourly Values)
1980 Onwards

Hourly radiation value in Mega Joules/m² (to the second place of decimal) for the hours
given below:

Col. Nos.   Hour      Col. Nos.   Hour
14 – 16       5       38 – 40      13
17 – 19       6       41 – 43      14
20 – 22       7       44 – 46      15
23 – 25       8       47 – 49      16
26 – 28       9       50 – 52      17
29 – 31      10       53 – 55      18
32 – 34      11       56 – 58      19
35 – 37      12       59 – 61      20

Col No.     Element            Explanation
62 – 65     Total              Total radiation for the day in whole Mega Joules/m²
73 – 74     Cessation time
79 – 80     Cessation time
05 Rain     10 Thunder

Conversion Factors:
1 Mega Joule = 23.88 cal/cm²/min
1 cal/cm² = 0.04187 MJ/m²
1 watt/m² = 5.99 × 10⁻⁵ MJ/m²
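As a worked example of the conversion factors above (the value 29.9 cal/cm² is purely
illustrative): 29.9 cal/cm² × 0.04187 ≈ 1.25 MJ/m², which would be recorded in the format
above, to the second place of decimal, as 1.25.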
3. The relatively simple characteristics of file systems help to make the complexity of
database design easier to understand.
4. To convert an obsolete file system to a database system, knowledge of the file system's
basic limitations is useful.
1. A file folder in a doctor's office might contain patient data, one file folder for each
patient. However, finding and using data in a growing collection of file folders becomes
such a time-consuming and cumbersome task that it becomes less and less likely that the
data will ever generate useful information.
2. Consider, for instance, a data center faced with questions such as the following, a list
which tends to be long and growing:
a. Which climate products sold well during the past week, month, quarter or year?
c. What is the current daily, weekly, monthly, quarterly or yearly sales volume in rupees?
d. How do the current period's sales of data requests compare to those of last week, last
month, or last quarter?
e. Were the various data requests increasing, decreasing or remaining stable during the
past week, month, quarter, or year?
f. Did sales show trends that could change the inventory requirements?
Data processing systems provide quick access to the data needed to answer such questions
from higher authorities. The retrieved data can be used to produce reports for information
exchange between offices and organizations. Data processing specialists in the data center
create the necessary computer file structures, often write the software that manages the data
within those structures, and design the application programs that produce reports based on
the file data. The description of computer files requires a specialized vocabulary, summarized
below.
1. DATA: "Raw facts", such as telephone numbers, maximum temperature and so on. Data
have to be organized in some logical manner to derive useful meaning from them. The
smallest piece of data "recognized" by the computer is a single character, which requires
one byte of computer storage.
2. FIELD: A character or group of characters that has a specific meaning. A field is used to
define and store data.
3. RECORD: A logically connected set of one or more fields that describes a person, place or
thing.
When a file system grows larger, it requires more complex computers to handle it, and data
processing managers spend more and more time managing technical and human resources.
1. Understanding the shortcomings of the file system helps us understand the reasons
for the development of databases. For example, if a file is a sequence of ASCII codes, you
can dump it to display the record structure. A record may appear as:
The word JONES leads you to suspect that it is a personnel record, but the rest of the record
does not supply much more information. The data are therefore not self-describing, and
proper documentation is required to describe the fields in the file. The data are meaningful
only in the context of the program written by the programmer.
2. Each data retrieval task requires extensive programming in a third generation language
such as FORTRAN (FORmula TRANslation), COBOL (COmmon Business Oriented
Language) or BASIC (Beginners All-purpose Symbolic Instruction Code). The
programmers must be familiar with the physical file structure, i.e. how and where the files
are stored in the computer. It becomes difficult and time consuming to reference every file
in a program in order to establish the precise location of the various files, system
components and data characteristics, so complex coding of access paths is required.
As files become more complex, the access paths become difficult to manage, and ad-hoc
queries written in a third generation language fail to produce the necessary reports. Each
file must have its own file management system so that the file can be accessed. The
access steps might be any one of the following:
Then the original file is deleted, and finally all programs using the data file must be
modified to fit the revised structure. Thus any file structure change, however minor, forces
modifications in all the programs that use data in the file.
Structural dependence means that any change in the data file's structure requires modification
of all programs that use that file. Data dependence means that the programs are also subject to
change when any of the file's data characteristics change; for example, changing a field from
integer to decimal requires changes in all programs that access the file. The program accessing
the data must tell the computer both "what" to do and "how" to do it, bridging the differences
between:
How human beings view the data (the logical data format)
How the computer must work with the data (the physical data format)
Each program must therefore contain the specification for opening a specific file type, the
record specification, and its field definitions. From the programming and management point
of view, this makes the file system extremely troublesome.
Daily surface TAB III data
For example, in the case of daily surface TAB II data, there could be several records for a
specific index (cols. 1-5). If a field (cols. 6-11), i.e. the date, is added to the specific index,
it is appropriate to construct a unique record identifier from both (cols. 1-11). From the
user's point of view this gives a much better and more flexible record definition, which
fulfils reporting requirements by breaking fields into their component parts.
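A minimal sketch of such a unique record identifier in SQL, assuming a hypothetical table TB2
whose column names follow the queries given later in these notes (the data types are
assumptions):

CREATE TABLE TB2 (
  TB2_INDEX    NUMBER(5)    NOT NULL,  -- station index number (cols. 1-5)
  TB2_DATE     DATE         NOT NULL,  -- observation date (cols. 6-11)
  TB2_MAXT     NUMBER(4,1),            -- maximum temperature
  TB2_RF       NUMBER(5,1),            -- rainfall
  TB2_AVG_WSP  NUMBER(4,1),            -- average wind speed
  CONSTRAINT TB2_PK PRIMARY KEY (TB2_INDEX, TB2_DATE)  -- unique record identifier
);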
Data Redundancy
It means that the same data are stored in many different locations. This leads to data
inconsistency and data anomalies. Data inconsistency means different and conflicting versions
of the same data, or data that lack integrity. Data entry errors occur when complex entries are
made in several different files and/or recur frequently in one or more files. The dictionary
defines an anomaly as an abnormality: data redundancy creates abnormal conditions by forcing
field value changes in many different locations. To preserve data integrity, any change to a
field value must be made correctly in every place it is stored; when it is not, data anomalies
exist.
If we modify any value of a parameter, the change may have a cascading effect on its related
fields. If new fields are added to data records, they may conflict with existing programs, and
adding new values to empty fields as corrections may lead to inconsistencies. If we delete any
value in the records, it may give rise to anomalies in the existing data.
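A minimal illustration, assuming a hypothetical layout in which the station name is stored
redundantly in every daily TB2 record instead of once in a separate station table (the column
STATION_NAME and the index value are purely illustrative):

UPDATE TB2
   SET STATION_NAME = 'PUNE'     -- hypothetical redundantly stored column
 WHERE TB2_INDEX = 43063;        -- hypothetical station index
-- If even one of the many rows for this station is missed, or is updated with a
-- different spelling, the table now holds conflicting versions of the same fact.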
Database and Database Management System
5) Data integrity can be maintained. Data integrity refers to the problem of ensuring that the
database contains only accurate data.
7) Data independence can be achieved, i.e. the data and the programs that manipulate the data
are two different entities.
Management of data also involves both defining structures for the storage of information and
providing mechanisms for the manipulation of that information. In addition, the database
system must ensure the safety of the stored information, despite system crashes or attempts at
unauthorized access. If data are shared among several users, the system must avoid possible
anomalous results. For example, consider station Pune, with daily temperature data for the
month of May. If two observers retrieve temperatures (say the maximum and the minimum
respectively) for station Pune from the daily temperature table at about the same time, the
result of the concurrent executions may leave the temperature table in an incorrect (or
inconsistent) state. In particular, the retrieval may return either the maximum temperature or
the minimum temperature, rather than both. To guard against this possibility, some form of
supervision must be maintained in the system.
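A minimal sketch of such supervision, assuming an Oracle-style RDBMS and a hypothetical
daily temperature table TEMP_DAILY (the table, column names and values are illustrative):
the row is locked while it is read, so an update from another session cannot interleave between
reading the maximum and the minimum.

-- Lock the Pune row for the chosen date until this transaction ends
SELECT MAX_TEMP, MIN_TEMP
  FROM TEMP_DAILY
 WHERE STN_INDEX = 43063
   AND OBS_DATE  = DATE '2013-05-01'
   FOR UPDATE;   -- other sessions updating this row must wait

-- ... both values are read together ...
COMMIT;          -- release the lock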
Data processing is the term generally used to describe what was done by large mainframe
computers from the late 1940s until the early 1980s (and which continues to be done in most
large organizations, to a greater or lesser extent, even today): large volumes of raw transaction
data were fed into programs that updated a master file, with fixed-format reports written to
paper. In this context "raw" indicates facts which have not yet been processed to reveal their
meaning; examples are minimum temperature, dry bulb temperature and so on. Transaction
data are data describing an event (the change resulting from a transaction) and are usually
described with verbs. Transaction data always have a time dimension and a numerical value,
and refer to one or more objects (i.e. the reference data), e.g. a day with the weather
phenomenon thunderstorm, together with its time of occurrence and duration.
The term data management refers to an expansion of this concept: the raw data, previously
copied manually from paper to punched cards, were later keyed in at data entry terminals. The
master file concept has been largely displaced by database management systems, and static
reporting has been replaced or augmented by ad-hoc reporting and direct inquiry, including
downloading of data by users or customers. Nowadays the simultaneous presence of the
internet and of personal computers at a large number of places has been a driving force in the
transformation of data processing into the more global concept of data management systems.
Characteristics of database
1. Concurrent use
A database system allows several users to access the database concurrently. Answering
different questions from different users with the same (base) data is a central aspect of an
information system. Such concurrent use of data increases the economy of a system.
2. Structured and described data
A fundamental feature of the database approach is that the database system does not only
contain the data but also their complete definition and description. These descriptions are
basically static information, from the geographical characteristics of the station to the sensor
information, the observing programme at station level and so on. This is commonly known as
"the data about the data" (metadata), i.e. all the key information concerning the origin of the
data in the widest sense.
3. Separation of data and applications
The structure of the database is described through metadata, which are also stored in the
database. Application software does not need any knowledge about the physical data storage
(encoding, format, storage place and so on); it communicates with the DBMS via a
standardized interface, using a standardized language such as SQL. Access to the data and the
metadata is done entirely through the DBMS. In this way all applications can be completely
separated from the data, so database-internal reorganizations or efficiency improvements do
not influence the application software.
4. Data integrity
Data integrity is a byword for the quality and reliability of the data in the DBMS. In a broader
sense it also includes the protection of the database from unauthorized access and unauthorized
changes. The data reflect facts of the real world; for example, the data tables storing climate
data have an observation structure, an observation being a group of meteorological elements
associated with a particular station at a given time. Therefore, there is one table for each data
type.
5. Data persistence
In a DBMS, all data are maintained as long as they are not deleted explicitly. The life span of
the data has to be determined, directly or indirectly, by the user and must not depend on system
features. Additionally, data once stored in a database must not be lost.
Functions of DBMS
The DBMS provides functions to define the structure of the data in the application. These
include defining and modifying the record structure, the type and size of fields and the various
constraints to be satisfied by the data in the field.
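For example, assuming an Oracle-style SQL dialect and the hypothetical TB2 table used
elsewhere in these notes, defining and modifying the structure and its constraints might look
like the following sketch:

-- add a field and a validation constraint to an existing table
ALTER TABLE TB2 ADD (TB2_MINT NUMBER(4,1));    -- hypothetical minimum temperature field
ALTER TABLE TB2 ADD CONSTRAINT TB2_MAXT_CHK
  CHECK (TB2_MAXT BETWEEN -50 AND 60);         -- reject physically impossible values

-- change the size of an existing field
ALTER TABLE TB2 MODIFY (TB2_RF NUMBER(6,1));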
Once the data structure is defined, data need to be inserted, modified and deleted. The
functions which perform these operations are also part of the DBMS, and they can handle both
planned and unplanned data manipulation needs. Queries that form part of the application are
called planned queries; ad-hoc queries performed on a need basis are called unplanned
queries.
The DBMS contains modules which handle the security and integrity of the data in the
application. Recovery of the data after a system failure and concurrent access to records by
multiple users are also handled by the DBMS.
Maintenance of the data dictionary, which contains the data definitions of the application, is
another function of the DBMS.
Optimizing the performance of queries is one of the important functions of the DBMS.
DBMS Architecture
To help achieve and visualize these characteristics, a three-schema architecture was proposed.
The goal is to separate the user applications from the physical database. In this architecture,
schemas can be defined at the following three levels.
I) The internal level has the internal schema, which describes the physical storage structure
of the database. The internal schema uses a physical data model and describes the complete
details of data storage and the access paths of the database.
II) The conceptual level has the conceptual schema, which describes the structure of the
whole database for a community of users. The conceptual schema hides the details of
physical storage structures and concentrates on describing entities (things with a distinct or
independent existence), data types, relationships, user operations and constraints. A high-level
data model or an implementation data model can be used at this level.
III) The external or view level includes a number of external schemas or user views.
The ANSI (American National Standards Institute)/SPARC three-level architecture shows that
a data model can be an external model (or view), a conceptual model, or a physical model.
This is not the only way to look at data models, but it is a useful way, particularly when
comparing models. (Matthew West and Julian Fowler (1999), Developing High Quality Data
Models, The European Process Industries STEP Technical Liaison Executive (EPISTLE).)
Data Dictionary
The data dictionary is a DBMS component that stores the definitions of data characteristics
and relationships. In other words, it is the set of tables a database uses to maintain information
about its own databases; it contains information about tables, indexes, clusters and so on.
There are two types of data dictionary: integrated and stand-alone. An integrated data
dictionary is included with the DBMS; for example, all relational DBMSs include a built-in
data dictionary that is frequently accessed and updated by the RDBMS. A database
administrator may use a stand-alone data dictionary system, especially in the case of an older
DBMS.
The data dictionary may also be classified as active or passive. An active data dictionary is
automatically updated by the DBMS with every database access, thereby keeping its access
information up to date, whereas a passive data dictionary requires a batch process to be run.
Data dictionary access information is normally used by the DBMS for query optimization
purposes.
The main function of the data dictionary is to store descriptions of all objects that interact with
the database. The data dictionary typically stores descriptions of all of the following:
1. Data elements that are defined in all tables of all databases. Specifically, the data dictionary
stores the names, display formats, internal storage formats and validation rules. It tells
where an element is used, by whom it is used and so on.
2. Tables defined in all databases. The data dictionary is likely to store the name of the table
creator, the date of creation, access authorizations, the number of columns and so on.
3. Indexes defined for each database table. For each index the DBMS stores at least the
index name, the attributes used, the location, specific index characteristics and the
creation date.
4. Defined databases: who created each database, the date of creation, where the database is
located, who the DBA is and so on.
5. Programs that access the database, including screen formats, report formats, application
formats, SQL queries and so on.
7. Relationships among data elements: which elements are involved, whether the
relationships are mandatory or optional, the connectivity and cardinality (i.e. the number of
elements in a set) and so on.
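As an illustration, in an Oracle-style RDBMS part of this information can be queried from the
integrated data dictionary itself; the table name TB2 below is the hypothetical table used in the
query examples later in these notes:

SELECT COLUMN_NAME, DATA_TYPE, DATA_LENGTH, NULLABLE
  FROM USER_TAB_COLUMNS   -- dictionary view describing the columns of the user's tables
 WHERE TABLE_NAME = 'TB2'
 ORDER BY COLUMN_ID;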
The most commonly used DBMS is the relational database management system (RDBMS),
which is based on the relational model introduced by Dr. Edgar F. Codd. In an RDBMS the
relationship between two tables or files can be specified at the time of table creation. There
are multiple levels of security:
3. Object level
In an RDBMS, many tables are grouped into one database. Structured Query Language (SQL)
is a language that provides an interface to relational database systems. In common usage SQL
also encompasses DML (Data Manipulation Language), used for INSERTs, UPDATEs and
DELETEs, and DDL (Data Definition Language), used for creating and modifying tables and
other database structures.
Examples of DDL
4. TRUNCATE: Removes all records from a table, including all the space allocated for the
records.
Examples of DML
3. DELETE: Deletes records from a table; the space for the records remains.
2. SAVEPOINT: Identifies a point in a transaction to which you can later roll back.
4. SET TRANSACTION: Changes transaction options, such as which rollback segment to use.
5. GRANT / REVOKE: Grants permissions to, or takes them back from, Oracle users.
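For instance (the user name CLIMATE_USER and the table TB2 are hypothetical):

GRANT SELECT, INSERT ON TB2 TO CLIMATE_USER;   -- allow the user to read and add records
REVOKE INSERT ON TB2 FROM CLIMATE_USER;        -- later take back the insert permission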
Examples of Data Query Language statements
Examples of SQL Queries

-- Retrieve the station index, date and rainfall for days with maximum
-- temperature above 30.1 and average wind speed above 10
SELECT TB2_INDEX, TB2_DATE, TB2_RF
FROM TB2
WHERE TB2_MAXT > 30.1
AND TB2_AVG_WSP > 10;

-- The same query, with the result sorted by station index and date
SELECT TB2_INDEX, TB2_DATE, TB2_RF
FROM TB2
WHERE TB2_MAXT > 30.1
AND TB2_AVG_WSP > 10
ORDER BY TB2_INDEX, TB2_DATE;
The simplest command retrieves everything from a table, but use of this command is
deprecated on large databases as it will slow the database down.
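An illustrative form of this simplest command, using the hypothetical TB2 table from the
examples above:

SELECT * FROM TB2;   -- returns every column of every row in the table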