
Database Concepts Notes

This document provides an overview of database concepts, including definitions of databases, data, and information, as well as applications and advantages of databases in various fields such as banking and telecommunications. It discusses the evolution of data processing, data processing cycles, and key database terms like entities, attributes, and relationships. Additionally, it covers database management systems (DBMS), their architecture, types of data models, and the significance of data abstraction and independence.

Uploaded by

Shamith Rai
Copyright
© © All Rights Reserved

Chapter-13

DATABASE CONCEPTS

DATABASE
A Database is a collection of logically related data
organized in a way that data can be easily accessed,
managed and updated.
DATA
Data is a collection of facts, numbers, letters or symbols that the computer processes into meaningful information.

INFORMATION
• Information is processed data that can be stored or transmitted by a computer.
APPLICATIONS OF DATABASE.

• Banking: For customer information, accounts and loans, and banking transactions.
• Colleges: For student information, course registrations and grades.
• Credit card transactions: For purchases on credit cards and generation of monthly statements.
• Finance: For storing information about holdings, sales and purchases of financial instruments such as stocks and bonds.
• Sales: For customer, product, and purchase information.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards, and storing information about the communication networks.
ADVANTAGES OF DATABASE.

• Redundancy can be minimized or controlled: In a DBMS environment, if redundancy is present, it can be controlled by propagating updates to all the places where the redundant data is present.

• Data Integrity: Data integrity refers to the correctness of the data in the database. In other words, the data available in the database is reliable data.

• Data Sharing: In a DBMS, data is stored in a centralized database and all permitted users can access the same piece of information at the same time.

• Database Security: A DBMS provides a variety of security mechanisms for users to protect their data stored in the database.

• Supports Concurrent Access: A DBMS supports concurrent access to the same data stored in the database by applying locking and time stamp mechanisms.
EVOLUTION OF DATABASE
MANUAL DATA PROCESSING AND ELECTRONIC DATA PROCESSING

No.  Manual Data Processing                                Computerized Data Processing
1    The volume of data that can be processed is limited.  The volume of data that can be processed is large.
2    Requires a large quantity of paper.                   Requires less paper.
3    Speed and accuracy are limited.                       Faster and more accurate.
4    Labour cost is high.                                  Labour cost is low.
5    Storage medium is paper.                              Storage medium is hard disk, etc.
DATA PROCESSING CYCLE.
Data Collection: It is the process of systematically gathering data from various sources, where the data has been observed, recorded and organized.

Data Input: The raw data is put into the computer using a keyboard, mouse or other devices such as a scanner, microphone or digital camera.

Data Processing: Processing is the series of actions or operations performed on the input data to generate outputs.

Data Storage: Data and information should be stored in memory so that they can be accessed later.

Output: The result obtained after processing the data must be presented to the user in a form the user can understand. The output can be generated as a report, in hard copy or soft copy.

Communication: Computers nowadays have communication ability, which increases their power. With wired or wireless communication connections, data may be input from a far place, processed in a remote area, stored in several different places and then transmitted by modem as an e-mail or posted to the website where the
File: A file is a basic unit of storage in a computer system.

Database: A database is a collection of logically related data organized in a way that data can be easily accessed, managed or updated.

FIELD
Each column is identified by a distinct header called an attribute or field.

RECORD/TUPLE
A single entry in a table is called a record or row.
A record in a table represents a set of related data.
Records are also called tuples.
DATABASE TERMS

ENTITY
An entity can be any object, place, person or class.
In an E-R diagram, an entity is represented using a rectangle.

INSTANCE
The collection of information stored in the database at a particular moment is called an instance of the database.

ATTRIBUTE/FIELD
It is defined as a named column of a relation. Example: in the STUDENT table, Regno, Name, Age, Class, Combination and Marks are attributes.
RELATION
A relation is defined as a table with columns and rows. Data can be stored in the form of a two-dimensional table.

DOMAIN
It is defined as a set of allowed values for one or more attributes.

TABLE
A table is a collection of data elements organized in terms of rows and columns. A table is the simplest form of data storage.

KEY
It is a column, or a set of columns, that identifies each row or tuple.
DATA TYPES OF DBMS

• Integer

• Logical data type/Boolean

• Characters

• Strings

• Date fields

• Text fields

• Memo data type


DATABASE USERS.
Many people are involved in designing, using and maintaining a database. The people who work with the database include:
End Users, System Analysts, Application Programmers, Database Administrators (DBA)

End Users (Database Users)
Database users are those who interact with the database in order to query and update the database, and generate reports.

System Analysts
System analysts determine the requirements of end users (especially naïve users) in order to create a solution for their business needs, and focus on both non-technical and technical aspects.

Application Programmers
These are the computer professionals who implement the specifications given by the system analysts and develop the application programs.

Database Administrators (DBA)
A DBA is a person who has central control over both data and applications.
Some of the responsibilities of a DBA are access authorization, schema definition and modification, new software installation, and security enforcement and administration.
DBMS – DATABASE MANAGEMENT SYSTEM

• A DBMS is software that allows creation, definition and manipulation of a database.

• A DBMS is a tool used to perform any kind of operation on data in a database.

• A DBMS also provides protection and security to the database.
FEATURES OF DATABASE SYSTEM.
• Controlled Data Redundancy: In a DBMS environment, if redundancy is present, it can be controlled by propagating updates to all the places where the redundant data is present.

• Enforcing Data Integrity: Data integrity refers to the correctness of the data in the database. In other words, the data available in the database is reliable data.

• Data Sharing: In a DBMS, data is stored in a centralized database and all permitted users can access the same piece of information at the same time.

• Database Security: A DBMS provides a variety of security mechanisms for users to protect their data stored in the database.

• Supports Concurrent Access: A DBMS supports concurrent access to the same data stored in the database by applying locking and time stamp mechanisms.

• Multiple User Interfaces: In order to meet the needs of various users with different levels of technical knowledge, a DBMS provides different types of interfaces such as query languages, application program interfaces, and graphical user interfaces.

• Backup and Recovery : This RDBMS provides backup and recovery subsystems that is
DATA ABSTRACTION
A major purpose of a database system is to provide users with an abstract view of the data. That is, the system hides certain details of how the data are stored and maintained.

There are three levels of data abstraction:
• Physical Level (Internal level)
• Conceptual Level (Logical level)
• View Level (External level)

Physical Level:
It is the lowest level of abstraction and describes how the data are actually stored.
The physical level describes complex low-level data structures in detail.
It contains the definition of the stored records, the method of representing the data fields, and the access aids used.

Conceptual Level:
It is the next higher level of abstraction and describes what data are stored in the database and what relationships exist among those data.
It also contains the method of deriving the objects in the conceptual view from the objects in the internal view.

External / View Level:
It is the highest level of abstraction and describes only part of the entire database.
It also contains the method of deriving the objects in the external view from the objects in the conceptual view.
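
The view level can be illustrated with a small sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in DBMS; the table student, the view student_public and the sample rows are hypothetical and do not come from these notes. The view exposes only part of the conceptual-level table.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: the full (hypothetical) student table.
    CREATE TABLE student (
        regno INTEGER PRIMARY KEY,
        name  TEXT,
        age   INTEGER,
        marks INTEGER
    );
    INSERT INTO student VALUES (1, 'Asha', 17, 82), (2, 'Binu', 18, 67);

    -- View level: derived from the conceptual table, hides age and marks.
    CREATE VIEW student_public AS SELECT regno, name FROM student;
""")

view_rows = conn.execute("SELECT * FROM student_public ORDER BY regno").fetchall()
print(view_rows)
```

A user who is given only student_public never sees the age or marks columns, which is the sense in which the external level "describes only part of the entire database".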
Data Independence

The capacity to change data at one layer without affecting the data at another layer is called data independence.

Two types of data independence are:
o Physical Data Independence
o Logical Data Independence

Physical data independence.
It is the capacity to change the internal level without having to change the schemas at either the conceptual or the external level.
Changes to the internal schema may be needed because some physical files have to be reorganized.
Physical data independence refers to the insulation of an application from the physical storage structure only; it is easier to achieve than logical data independence.
The physical data independence are:
o File Organization
o Database Architecture
o Database Models
DIFFERENCE BETWEEN SERIAL AND DIRECT ACCESS FILE ORGANIZATION.

• Serial File Organization:
  o Organization is continuous and simple.
  o Data processing that requires the use of all records is best suited to this method.
• Direct Access File Organization:
  o The type of storage device used is comparatively expensive.
  o It is less efficient in the usage of storage space compared to the sequential organization.
ISAM with example.
• The indexed sequential file organization is a combination of sequential file organization and an index file.
• It is also referred to as ISAM (indexed sequential access method).
• Data is stored physically in adjacent storage locations, and a logical relationship exists among the stored data through the ordering field.
• An additional file, called the index file, is created; it contains n records.
• Each record of the index file has two fields:
  o The first field is of the same data type as the ordering key field, and
  o The second field is a pointer to a disk block (a block address).
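
The two-field index record described above can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the "disk blocks" are Python lists, the block address is a list position, and the records, names and the function isam_lookup are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of records kept sorted on the ordering key.
DATA_BLOCKS = [
    [(101, "Asha"), (104, "Binu"), (109, "Chen")],
    [(112, "Dev"), (118, "Esha"), (121, "Farid")],
    [(130, "Gita"), (135, "Hari"), (142, "Indu")],
]

# Index file: one entry per block = (first key in the block, block address).
INDEX = [(block[0][0], address) for address, block in enumerate(DATA_BLOCKS)]
INDEX_KEYS = [key for key, _ in INDEX]


def isam_lookup(key):
    """Return the record with the given key, or None if it is absent."""
    # Step 1: search the index for the last entry whose key is <= the search key.
    position = bisect_right(INDEX_KEYS, key) - 1
    if position < 0:
        return None  # the key is smaller than every key in the file
    # Step 2: follow the block pointer and scan that one block sequentially.
    _, address = INDEX[position]
    for record in DATA_BLOCKS[address]:
        if record[0] == key:
            return record
    return None


print(isam_lookup(118))
```

The index holds one entry per block rather than one per record, so it stays much smaller than the data file, and every lookup goes through the index before touching a data block.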
ADVANTAGES AND DISADVANTAGES OF ISAM.

• Advantages
  o Search time is less.
  o There are fewer index entries than there are records in the data file.
  o Quick access to the records even when the volume of records is high.

• Disadvantages
  o An additional file (index file) has to be created.
  o Storage space is used up in creating and maintaining the index file.
  o Retrieval of data is always indirect, because the search begins in the index file and then moves to the data file (no direct retrieval).
DBMS ARCHITECTURE.

The design of a Database Management System highly depends on its architecture. It can be centralized, decentralized or hierarchical.

Database architecture is logically divided into three types:
• Logical one-tier (1-tier) Architecture
• Logical two-tier Client/Server Architecture
• Logical three-tier Client/Server Architecture
LOGICAL ONE-TIER IN 1-TIER ARCHITECTURE:

The DBMS is the only entity: the user sits directly on the DBMS and uses it.

Any changes made here are applied directly to the DBMS itself.

It does not provide handy tools for end users, so single-tier architecture is mostly used by database designers and programmers.
TWO-TIER CLIENT / SERVER ARCHITECTURE:

In the two-tier client/server architecture, the user interface programs and application programs run on the client side.

An interface called ODBC (Open Database Connectivity) provides an API that allows client-side programs to call the DBMS.

Most DBMS vendors provide ODBC drivers. A client program may connect to several DBMSs.

Some variation in the client is also possible in this architecture. For example, in some DBMSs more functionality is transferred to the client, including the data dictionary, optimization, etc.
THREE-TIER CLIENT / SERVER ARCHITECTURE:

Three-tier client/server database architecture is a commonly used architecture for web applications.

An intermediate layer, called the application server or web server, stores the web connectivity software and the business logic (constraints) part of the application, which is used to access the right amount of data from the database server.

This layer acts as a medium for sending partially processed data between the database server and the client.
Database Model.

A data model is a collection of conceptual tools for describing data, data relationships, data semantics and constraints.

A data model generally consists of:
• Data model theory, which is a formal description of how data may be structured and used.
• Data model instance, which is a practical data model designed for a particular application.

The process of applying data model theory to create a data model instance is known as data modelling.

In the history of database design, three models have been in use:
• Hierarchical Model
• Network Model
• Relational Model
Hierarchical data model.
The hierarchical data model organizes data in a tree structure.
In this data model, data is represented by a collection of records and the relationships are represented by links.
In this model each entity has only one parent but can have several children. At the top of the hierarchy there is only one entity, which is called the root node.

Advantages:
Simplicity: The relationship between the various layers is logically simple.
Data Security: Data security is provided by the DBMS.
Data Integrity: There is always a link between the parent segment and the child segment under it.
Efficiency: It is very efficient when the database contains a large number of one-to-many relationships and when the user requires a large number of transactions.

Disadvantages:
Implementation complexity
Database management problems
Lack of structural independence
Operational anomalies
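
The one-parent, many-children rule of the hierarchical model can be sketched as a small tree. This is an illustrative sketch only: the class Record, the function path_from_root and the sample names (College, Science, Commerce, Physics) are hypothetical and are not taken from any real hierarchical DBMS.

```python
class Record:
    """A node in a hierarchy: at most one parent, any number of children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        # The hierarchical model allows only one parent per record.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)
        return child


def path_from_root(record):
    """Follow parent links upward, then reverse to get the root-to-record path."""
    path = []
    while record is not None:
        path.append(record.name)
        record = record.parent
    return list(reversed(path))


college = Record("College")  # the single root node
science = college.add_child(Record("Science"))
commerce = college.add_child(Record("Commerce"))
physics = science.add_child(Record("Physics"))

print(path_from_root(physics))
```

Because every record is reached by a single path from the root, a record that logically belongs under two parents cannot be represented directly; that restriction is what the network model relaxes.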
Network data model.
In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the network model.
In this model, data is represented by a collection of records and the relationships are represented by links.
Each record is a collection of fields, each of which contains only one data value. A link is an association between two records.
In the network model, entities are organized in a graph, in which some entities can be accessed through several paths.

Advantages:
It is simple and easy to implement.
It can handle many relationships within the organization.
It has better data independence compared to the hierarchical model.

Disadvantages:
More complex system of database structure
Lack of structural independence
Relational Data Model.
The relational data model was developed by E. F. Codd in 1970.
Unlike the hierarchical and network models, it has no physical links.
All data is maintained in the form of tables consisting of rows and columns.
Each row (record) represents an entity and each column (field) represents an attribute of the entity.
In this model, data is organized in two-dimensional tables called relations. The tables or relations are related to each other.
Relational Model Concepts
1. Attribute: Each column in a table. Attributes are the properties which define a relation, e.g., Student_Rollno, NAME, etc.
2. Tables: In the relational model, relations are saved in table format, stored along with their entities. A table has two properties: rows and columns. Rows represent records and columns represent attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Relation schema: A relation schema represents the name of the relation with its attributes.
5. Degree: The total number of attributes in the relation is called the degree of the relation.
6. Cardinality: The total number of rows present in the table.
7. Column: The column represents the set of values for a specific attribute.
8. Relation instance: A relation instance is a finite set of tuples in the RDBMS. Relation instances never have duplicate tuples.
9. Relation key: Every row has one or more attributes that identify it; these are called the relation key.
10. Attribute domain: Every attribute has a pre-defined set of values and scope, which is known as its attribute domain.
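
Degree and cardinality from the list above can be read off a concrete table. The sketch below assumes SQLite through Python's standard sqlite3 module; the student table and its four rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 17), (2, "Binu", 18), (3, "Chen", 17), (4, "Dev", 18)],
)

# Degree: the number of attributes (columns) of the relation.
cursor = conn.execute("SELECT * FROM student")
degree = len(cursor.description)

# Cardinality: the number of tuples (rows) currently in the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(degree, cardinality)
```

Inserting or deleting a row changes the cardinality, while the degree changes only when the relation schema itself is altered.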
E-R diagram.
Entity: An entity is represented using a rectangle.
Attribute: Attributes are represented by means of ellipses.
Relationship: A relationship is represented using a diamond-shaped box.
Three components of E-R model.

An E-R diagram is a visual representation of data that describes how data items are related to each other.

Entity:
An entity can be any object, place, person or class.
In an E-R diagram, an entity is represented using a rectangle. Rectangles are named with the entity set they represent.

Attribute:
An attribute describes a property or characteristic of an entity. Attributes are represented by means of ellipses.
Every ellipse represents one attribute and is directly connected to its entity (rectangle).
For example, Roll_No, Name and Birth date can be attributes of a student.

Relationship:
A relationship type is a meaningful association between entity types.
A relationship is represented using a diamond-shaped box.
There are three types of relationship that exist between entities:
• Binary Relationship
• Recursive Relationship
• Ternary Relationship

Binary Relationship: It means a relation between two entities. This is further divided into three types.
1. One to One: This type of relationship is rarely seen in the real world.
Generalization:

In generalization, a number of entities are brought together into one generalized entity based on their similar characteristics. For example, pigeon, sparrow and crow can all be generalized as Birds.

Specialization:

Specialization is the opposite of generalization.
In specialization, a group of entities is divided into sub-groups based on their characteristics.
Take a group of persons, for example. A person has a name, date of birth, gender, etc.
Similarly, in a school database, persons can be specialized as teacher, student or staff, based on the role they play in the school as entities.
DATABASE KEYS
Types of Keys
A key is an attribute, or a set of attributes, of a table that is used to identify one or more tuples/records of the table.
Primary key: A primary key uniquely identifies a tuple/record in a table. A primary key value cannot be duplicated across different records in a table.
Ex: Student_id and Bank_accno are examples of primary keys.
Candidate key: There may be more than one unique field in a table that could be selected as the primary key. All such fields that are unique for every row of the table are known as candidate keys.
Alternate key: Candidate keys that are not selected as the primary key are known as alternate keys.
Foreign key: A field in a table that is the primary key of another table is known as a foreign key.
For example, a student table may have student_id as its primary key and bank_accno as a foreign key.
Composite key: A key that consists of two or more attributes to identify a record in a table is known as a composite key.
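
The key types above can be tried out in a small sketch. It assumes SQLite through Python's standard sqlite3 module, with foreign key enforcement switched on through PRAGMA foreign_keys. The tables bank_account and enrolment, the email column and the helper try_insert are hypothetical additions for illustration; only student_id and bank_accno come from the example in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE bank_account (
        bank_accno INTEGER PRIMARY KEY,            -- primary key
        balance    INTEGER
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,            -- primary key
        email      TEXT UNIQUE,                    -- candidate key not chosen: alternate key
        bank_accno INTEGER REFERENCES bank_account(bank_accno)  -- foreign key
    );
    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student(student_id),
        course     TEXT,
        PRIMARY KEY (student_id, course)           -- composite key
    );
    INSERT INTO bank_account VALUES (5001, 250);
    INSERT INTO student VALUES (1, 'asha@example.com', 5001);
""")


def try_insert(sql, params):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_pk = try_insert("INSERT INTO student VALUES (?, ?, ?)", (1, "x@example.com", 5001))
missing_fk = try_insert("INSERT INTO student VALUES (?, ?, ?)", (2, "binu@example.com", 9999))
valid_row = try_insert("INSERT INTO student VALUES (?, ?, ?)", (2, "binu@example.com", 5001))
first_enrol = try_insert("INSERT INTO enrolment VALUES (?, ?)", (1, "Math"))
second_enrol = try_insert("INSERT INTO enrolment VALUES (?, ?)", (1, "Physics"))
repeat_enrol = try_insert("INSERT INTO enrolment VALUES (?, ?)", (1, "Math"))

print(duplicate_pk, missing_fk, valid_row, first_enrol, second_enrol, repeat_enrol)
```

In the enrolment table neither student_id nor course is unique on its own; only the pair identifies a row, which is why the repeated (1, 'Math') pair is rejected while (1, 'Physics') is accepted.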
Data warehouse
A data warehouse is a repository of an organization's electronically stored data.
Data warehouses are designed to facilitate reporting and to support data analysis.
The concept of data warehouses was introduced in the late 1980s.

Components of Data warehouse.
The components of a data warehouse are:
o Data Source
o Data Transformation
o Reporting
o Metadata
Additional components are dependent data marts, logical data marts and the operational data store.
DATA MINING

Data mining is concerned with analysing data and picking out the relevant information.
CODD'S RULES AND NORMALIZATION

E. F. Codd was a computer scientist who invented the relational model for database management. Relational databases were created on the basis of the relational model.

Dr Edgar F. Codd, after extensive research on the relational model of database systems, came up with twelve rules which, according to him, a database must obey in order to be regarded as a true relational database.

Rule Zero:
This rule states that for a system to qualify as an RDBMS, it must be able to manage the database entirely through its relational capabilities.
The twelve rules can be applied to any database system that manages stored data using only its relational capabilities. This is the foundation rule, which acts as a base for all the other rules.

Rule 1: Information Rule
The data stored in a database, whether user data or metadata, must be the value of some table cell. Everything in a database must be stored in table format.

Rule 2: Guaranteed Access Rule


Every single data element (value) is guaranteed to be accessible
logically with a combination of table-name, primary-key (row value), and
attribute-name (column value). No other means, such as pointers, can be
used to access data.
Rule 3: Systematic Treatment of NULL Values
The NULL values in a database must be given a systematic and uniform treatment. This is a very important rule because a NULL can be interpreted as one of the following: data is missing, data is not known, or data is not applicable.
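
What "systematic treatment" looks like in practice can be sketched with SQLite through Python's standard sqlite3 module. The student table and its rows are hypothetical; the point is only that NULL is handled by uniform rules and is distinct from zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (regno INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Binu", None), (3, "Chen", 0)],  # None is stored as NULL
)

# A comparison with NULL is never true, so "= NULL" matches no rows.
equals_null = conn.execute("SELECT COUNT(*) FROM student WHERE marks = NULL").fetchone()[0]

# IS NULL is the uniform way to test for a missing value.
is_null = conn.execute("SELECT regno FROM student WHERE marks IS NULL").fetchall()

# Aggregates skip NULLs: the average is taken over 82 and 0 only.
average = conn.execute("SELECT AVG(marks) FROM student").fetchone()[0]

print(equals_null, is_null, average)
```

Note that the mark of 0 for regno 3 is an ordinary value and takes part in the average, whereas the NULL for regno 2 does not.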

Rule 4: Active Online Catalog


The structure description of the entire database must be stored in an online catalog, known as
data dictionary, which can be accessed by authorized users. Users can use the same query
language to access the catalog which they use to access the database itself.
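
As one illustration of Rule 4, SQLite keeps its schema description in a table named sqlite_master, which is queried with ordinary SQL. The sketch below uses Python's standard sqlite3 module; the student table and student_names view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (regno INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE VIEW student_names AS SELECT name FROM student")

# The catalog is itself a table, queried with the same language as user data.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

print(catalog)
```

Other database systems expose their catalogs under different names, so the table name used here is specific to SQLite.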

Rule 5: Comprehensive Data Sub-Language Rule


A database can only be accessed using a language having linear syntax that supports data
definition, data manipulation, and transaction management operations. This language can be used
directly or by means of some application. If the database allows access to data without any help of
this language, then it is considered as a violation.
Rule 6: View Updating Rule
All the views of a database, which can theoretically be updated, must also be updatable by the
system.

Rule 7: High-Level Insert, Update, and Delete Rule
A database must support high-level insert, update and delete operations. These must not be limited to a single row; that is, the database must also support union, intersection and minus operations to yield sets of data records.

Rule 8: Physical Data Independence


The data stored in a database must be independent of the applications that access the database.
Any change in the physical structure of a database must not have any impact on how the data is
being accessed by external applications.

Rule 9: Logical Data Independence
The logical data in a database must be independent of its user's view (application). Any change in logical data must not affect the applications using it. For example, if two tables are merged or one is split into two different tables, there should be no impact or change on the user application. This is one of the most difficult rules to apply.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity
constraints can be independently modified without the need of any change in
the application. This rule makes a database independent of the front-end
application and its interface.

Rule 11: Distribution Independence


The end-user must not be able to see that the data is distributed over various
locations. Users should always get the impression that the data is located at
one site only. This rule has been regarded as the foundation of distributed
database systems.

Rule 12: Non-Subversion Rule


If a system has an interface that provides access to low-level records, then the
interface must not be able to subvert the system and bypass security and
integrity constraints.
Normalization
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate undesirable characteristics such as insertion, update and deletion anomalies.
• Normalization divides a larger table into smaller tables and links them using relationships.
• The normal form is used to reduce redundancy in the database table.

Types of Normal Forms
There are four types of normal forms; 1NF, 2NF and 3NF are described in these notes.
First Normal Form (1NF)
• A relation is in 1NF if every attribute contains only atomic values.
• It states that an attribute of a table cannot hold multiple values. It must hold only single-valued attributes.
• First normal form disallows multi-valued attributes, composite attributes, and their combinations.
Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab

The decomposition of the EMPLOYEE table into 1NF is shown below:

EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab
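
The 1NF step for the EMPLOYEE example can be sketched in plain Python. The function name to_first_normal_form is hypothetical, and the sketch assumes the multi-valued EMP_PHONE field arrives as a comma-separated string.

```python
# EMPLOYEE rows as (EMP_ID, EMP_NAME, EMP_PHONE, EMP_STATE); EMP_PHONE is multi-valued.
employees = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]


def to_first_normal_form(rows):
    """Emit one row per phone number so that every EMP_PHONE value is atomic."""
    normalized = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            normalized.append((emp_id, name, phone.strip(), state))
    return normalized


employee_1nf = to_first_normal_form(employees)
for row in employee_1nf:
    print(row)
```

After this step EMP_ID alone no longer identifies a row, since John and Sam each appear twice; the pair (EMP_ID, EMP_PHONE) does.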
Second Normal Form (2NF)
• To be in 2NF, a relation must first be in 1NF.
• In the second normal form, all non-key attributes are fully functionally dependent on the primary key.
Example: Let's assume a school stores data about teachers and the subjects they teach. In a school, a teacher can teach more than one subject.

TEACHER table:

TEACHER_ID  SUBJECT    TEACHER_AGE
25          Chemistry  30
25          Biology    30
47          English    35
83          Math       38
83          Computer   38

In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That is why it violates the rule for 2NF. To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID  TEACHER_AGE
25          30
47          35
83          38

TEACHER_SUBJECT table:

TEACHER_ID  SUBJECT
25          Chemistry
25          Biology
47          English
83          Math
83          Computer
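
The partial dependency in the TEACHER example, and the decomposition that removes it, can be checked with a short Python sketch. The helper determines is hypothetical; it only tests whether a dependency holds in the sample rows shown, which is weaker than proving it for the schema in general.

```python
# TEACHER rows as (TEACHER_ID, SUBJECT, TEACHER_AGE).
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]


def determines(rows, lhs, rhs):
    """True if, in these rows, equal values in column lhs always give equal values in column rhs."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# TEACHER_ID -> TEACHER_AGE holds, so TEACHER_AGE depends on only part of the key.
age_depends_on_id = determines(teacher, 0, 2)
# TEACHER_ID -> SUBJECT does not hold: one teacher can teach several subjects.
subject_depends_on_id = determines(teacher, 0, 1)

# Decomposition into the two 2NF tables.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})
teacher_subject = [(tid, subject) for tid, subject, _ in teacher]

# Joining the two tables on TEACHER_ID gives back the original rows.
age_of = dict(teacher_detail)
rejoined = [(tid, subject, age_of[tid]) for tid, subject in teacher_subject]

print(teacher_detail)
```

Each teacher's age is now stored once in teacher_detail instead of once per subject taught.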
Third Normal Form (3NF)
• A relation is in 3NF if it is in 2NF and does not contain any transitive dependency of a non-prime attribute.
• 3NF is used to reduce data duplication. It is also used to achieve data integrity.
• If there is no transitive dependency for non-prime attributes, then the relation is in third normal form.
A relation is in third normal form if it satisfies at least one of the following conditions for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example: EMPLOYEE_DETAIL table:

EMP_ID  EMP_NAME   EMP_ZIP  EMP_STATE  EMP_CITY
222     Harry      201010   UP         Noida
333     Stephan    02228    US         Boston
444     Lan        60007    US         Chicago
555     Katharine  06389    UK         Norwich
666     John       462007   MP         Bhopal

Super keys in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID.
The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
That is why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.

EMPLOYEE table:

EMP_ID  EMP_NAME   EMP_ZIP
222     Harry      201010
333     Stephan    02228
444     Lan        60007
555     Katharine  06389
666     John       462007

EMPLOYEE_ZIP table:

EMP_ZIP  EMP_STATE  EMP_CITY
201010   UP         Noida
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal
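
The 3NF decomposition of the EMPLOYEE_DETAIL example can be sketched in plain Python. ZIP codes are kept as strings here (an assumption of this sketch) so that leading zeros such as 02228 survive; the variable names are hypothetical.

```python
# EMPLOYEE_DETAIL rows as (EMP_ID, EMP_NAME, EMP_ZIP, EMP_STATE, EMP_CITY).
employee_detail = [
    (222, "Harry", "201010", "UP", "Noida"),
    (333, "Stephan", "02228", "US", "Boston"),
    (444, "Lan", "60007", "US", "Chicago"),
    (555, "Katharine", "06389", "UK", "Norwich"),
    (666, "John", "462007", "MP", "Bhopal"),
]

# EMPLOYEE keeps the attributes that depend directly on EMP_ID.
employee = [(emp_id, name, zip_code) for emp_id, name, zip_code, _, _ in employee_detail]

# EMPLOYEE_ZIP holds the attributes that depend on EMP_ZIP, one row per ZIP code.
employee_zip = sorted({(zip_code, state, city) for _, _, zip_code, state, city in employee_detail})

# Joining the two tables on EMP_ZIP reconstructs the original rows.
zip_lookup = {zip_code: (state, city) for zip_code, state, city in employee_zip}
rejoined = [(emp_id, name, zip_code, *zip_lookup[zip_code]) for emp_id, name, zip_code in employee]

print(rejoined == employee_detail)
```

If two employees shared a ZIP code, its state and city would be stored once in employee_zip rather than once per employee, which is the duplication that 3NF is meant to remove.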
