Database Concepts Notes
DATABASE CONCEPTS
DATABASE
A Database is a collection of logically related data
organized in a way that data can be easily accessed,
managed and updated.
DATA
INFORMATION
• Information is data that has been processed, stored, or transmitted by a
computer.
APPLICATIONS OF DATABASE.
No.  Manual data processing                 Electronic data processing
1    The volume of data that can be         The volume of data that can be
     processed is limited.                  processed is large.
2    Requires a large quantity of paper.    Requires less paper.
3    Speed and accuracy are limited.        Faster and more accurate.
4    Labour cost is high.                   Labour cost is low.
5    Storage medium is paper.               Storage medium is hard disk, etc.
DATA PROCESSING CYCLE.
Data Collection: The systematic gathering of data from various sources; the
data is observed, recorded and organized.
Data Input: The raw data is entered into the computer using a keyboard, mouse or
other devices such as a scanner, microphone or digital camera.
Output: The result obtained after processing the data must be presented
to the user in an understandable form. The output can be generated as a
report, in hard copy or soft copy.
FIELD
RECORD/TUPLE
A single entry in a table is called a record or row.
A record in a table represents a set of related data.
A record is also called a tuple.
DATABASE TERMS
ENTITY
An entity can be any object, place, person or class.
In an E-R diagram, an entity is represented using a rectangle.
INSTANCE
The collection of information stored in the
database at a particular moment is called an
instance of the database.
ATTRIBUTE/FIELD
DOMAIN
It is defined as a set of allowed values for one or
more attributes.
TABLE
A table is a collection of data elements organized
in terms of rows and columns. A table is the
simplest form of data storage.
KEY
A key is a column, or a set of columns, that identifies each
row (tuple).
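The terms above (table, field, record, key) can be seen together in a minimal sketch using Python's built-in sqlite3 module. The student table, its column names and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "roll_no INTEGER PRIMARY KEY, "  # key: identifies each row
    "name TEXT, "                    # field (attribute)
    "birth_date TEXT)"               # field (attribute)
)

# Each inserted row is one record (tuple).
conn.execute("INSERT INTO student VALUES (1, 'Asha', '2004-05-17')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi', '2003-11-02')")

# A second record with an existing key value is rejected.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Duplicate', '2000-01-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

record_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Because the key must identify each row, the database refuses the third insert and the table still holds two records.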
DATA TYPES OF DBMS
• Integer
• Characters
• Strings
• Date fields
• Text fields
System Analysts
System analysts determine the requirements of end users (especially naïve users)
to create a solution for their business needs, and focus on both non-technical and
technical aspects.
Application programmers
These are the computer professionals who implement the specifications given by the
system analysts and develop the application programs.
• Enforcing Data Integrity: Data Integrity refers to the correctness of the data in the database.
In other words, the data available in the database is reliable data.
• Data Sharing: In DBMS, data is stored in the centralized database and all the permitted users
can access the same piece of information required at the same time.
• Database Security: DBMS provides a variety of security mechanisms for users to protect
their data stored in the database.
• Supports Concurrent access: DBMS supports concurrent access to the same data stored in
the database by applying locking and timestamp mechanisms.
• Multiple user interfaces: To meet the needs of various users with different levels of
technical knowledge, DBMS provides different types of interfaces such as query languages,
application program interfaces, and graphical user interfaces.
• Backup and Recovery : This RDBMS provides backup and recovery subsystems that is
DATA ABSTRACTION
A major purpose of a database system is to provide users with an abstract view of
the data. That is, the system hides certain details of how the data are stored
and maintained.
It also contains the method of deriving the objects in the conceptual view from the
objects in the internal view.
The capacity to change the schema at one layer without affecting the schema at another layer is called data independence.
Physical data independence is the capacity to change the internal level without having to change the schemas at the
conceptual or external level.
Changes to the internal schema may be needed because some physical files had to be reorganized.
Physical data independence insulates an application from the physical storage structure only, so it is easier to
achieve than logical data independence.
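Physical data independence can be illustrated with a minimal sketch, assuming Python's built-in sqlite3 module. Adding an index is used here as an example of an internal-level change; the employee table and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(14, "John", "UP"), (20, "Harry", "Bihar"), (12, "Sam", "Punjab")],
)

query = "SELECT name FROM employee WHERE state = 'Bihar'"
before = conn.execute(query).fetchall()

# Internal-level change: add an index. The table definition and the
# query text (conceptual and external levels) stay exactly the same.
conn.execute("CREATE INDEX idx_employee_state ON employee (state)")
after = conn.execute(query).fetchall()
```

The same query text returns the same rows before and after the storage-level change, which is the property the paragraph above describes.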
The physical data independence are:
o File Organization
o Database Architecture
o Database Models
DIFFERENCE BETWEEN SERIAL AND DIRECT ACCESS FILE ORGANIZATION.
Advantages
o Search time is less.
o There are fewer index entries than there are records in the data file.
o Quick access to the records even when the volume of records is high.
Disadvantages
o Additional file (index file) has to be created.
o Wastage of storage space by creating and maintaining the index file.
o Always indirect retrieval of data, because the search first begins in the index
file and then moves to the data file (no direct retrieval).
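The index-then-data lookup described in the points above can be sketched as follows. This is only an illustration in Python: the block size, the keys, and the names records, index and lookup are assumptions, and a real file organization would keep the index and data on disk rather than in lists.

```python
from bisect import bisect_right

# Data file: records sorted by key, grouped into fixed-size blocks.
records = [(k, f"record-{k}") for k in range(10, 110, 10)]  # keys 10..100
BLOCK_SIZE = 3
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Index file: one entry per block (the first key in that block),
# so there are fewer index entries than data records.
index = [block[0][0] for block in blocks]

def lookup(key):
    """Search the index first, then scan only the matching data block."""
    pos = bisect_right(index, key) - 1
    if pos < 0:
        return None  # key is smaller than every indexed key
    for k, value in blocks[pos]:
        if k == key:
            return value
    return None
```

The index is an additional structure that has to be built and stored, and every retrieval goes through it first, which matches the disadvantages listed above.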
DBMS ARCHITECTURE.
In single-tier architecture, the DBMS is the only entity: the user sits
directly on the DBMS and uses it.
It does not provide handy tools for end users; database designers and
programmers prefer to use single-tier architecture.
TWO-TIER CLIENT / SERVER ARCHITECTURE:
Advantages:
Simplicity: The relationship between the various layers is logically simple.
Data Security: The data security is provided by the DBMS.
Data Integrity: There is always a link between the parent segment and the child
segment under it.
Disadvantages:
Implementation complexity
Database management problem
Lack of structural independence
Operational anomalies
Network data model.
In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the
network model.
In this model, data is represented by a collection of records and the relationships
are represented by links.
Each record is a collection of fields, each of which contains only one data value.
A link is an association between two records.
In the network model, entities are organized in a graph, in which some entities can
be accessed through several paths.
Advantages:
It is simple and easy to implement.
It can handle many relationships within the organization.
It has better data independence compared to the hierarchical model.
Disadvantages:
More complex system of database structure.
Lack of structural independence.
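The idea of records connected by links, with some records reachable through several paths, can be sketched in Python. The record names, field values and the paths helper are all hypothetical; this is an illustration of the graph structure, not of any CODASYL implementation.

```python
from collections import defaultdict

# Records are collections of fields; links associate pairs of records.
records = {
    "supplier_1": {"name": "Acme"},
    "supplier_2": {"name": "Globex"},
    "part_1": {"name": "Bolt"},
    "order_1": {"quantity": 100},
}
links = [
    ("supplier_1", "part_1"),
    ("supplier_2", "part_1"),
    ("supplier_1", "order_1"),
    ("part_1", "order_1"),
]

# Build the graph: each record maps to the records it links to.
graph = defaultdict(list)
for owner, member in links:
    graph[owner].append(member)

def paths(start, goal, trail=()):
    """Return every link path from start to goal (the graph is acyclic here)."""
    trail = trail + (start,)
    if start == goal:
        return [trail]
    found = []
    for nxt in graph[start]:
        found.extend(paths(nxt, goal, trail))
    return found

routes = paths("supplier_1", "order_1")
```

Here order_1 can be reached from supplier_1 either directly or through part_1, which a strict tree (hierarchical) structure would not allow.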
Relational Data Model.
The relational data model was developed by E. F. Codd in 1970.
Unlike the hierarchical and network models, there are no physical links.
All data is maintained in the form of tables consisting of rows and columns.
Each row (record) represents an entity and each column (field) represents an attribute of the entity.
In this model, data is organized in two-dimensional tables called relations. The tables, or relations, are
related to each other.
Relational Model Concepts
1. Attribute: Each column in a table. Attributes are the properties which define a
relation, e.g., Student_Rollno, NAME, etc.
2. Tables: In the relational model, relations are saved in table format, along with
their entities. A table has two properties: rows and columns. Rows
represent records and columns represent attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Relation schema: The name of the relation together with its attributes.
5. Degree: The total number of attributes in the relation.
6. Cardinality: The total number of rows present in the table.
7. Column: The set of values for a specific attribute.
8. Relation instance: A finite set of tuples in the RDBMS.
Relation instances never have duplicate tuples.
9. Relation key: One or more attributes of a row that identify it are called the
relation key.
10. Attribute domain: Every attribute has a pre-defined set and scope of values, which
is known as its attribute domain.
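Degree and cardinality from the list above can be read off a table directly. The following is a minimal sketch using Python's built-in sqlite3 module; the student table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_rollno INTEGER PRIMARY KEY, name TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi"), (3, "Meena")],
)

cursor = conn.execute("SELECT * FROM student")
rows = cursor.fetchall()

attributes = [col[0] for col in cursor.description]  # column names
degree = len(attributes)    # number of attributes (columns)
cardinality = len(rows)     # number of tuples (rows)
```

For this sample relation the degree is 2 and the cardinality is 3; adding a row changes the cardinality, while adding a column changes the degree.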
E-R diagram.
Entity: An entity is represented using rectangles.
Attribute: Attributes are represented by means of ellipses.
Relationship: A relationship is represented using a diamond-shaped box.
Three components of E-R model.
An ER diagram is a visual representation of data that describes how data is related to each
other.
Entity:
An entity can be any object, place, person or class.
In an E-R diagram, an entity is represented using rectangles. Rectangles are named with the
entity set they represent.
Attribute:
An attribute describes a property or characteristic of an entity. Attributes are represented
by means of ellipses.
Every ellipse represents one attribute and is directly connected to its entity (rectangle).
For example, Roll_No, Name and Birth date can be attributes of a student.
Relationship:
A relationship type is a meaningful association between entity types.
A relationship is represented using a diamond-shaped box.
There are three types of relationship that exist between entities:
Binary Relationship
Recursive Relationship
Ternary Relationship
Specification :
Data mining is concerned with the analysis and picking out relevant information.
E. F. Codd was a computer scientist who invented the relational
model for database management.
Rule Zero:
This rule states that for a system to qualify as an RDBMS, it
must be able to manage the database entirely through its
relational capabilities.
CODD’s Rule AND Normalization
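Table with more than one phone number in a row (not in 1NF):
ID   NAME    PHONE                     STATE
14   John    7272826385, 9064738238    UP
20   Harry   8574783832                Bihar
12   Sam     7390372389, 8589830302    Punjab
The same data with one phone number per row (in 1NF):
ID   NAME    PHONE         STATE
14   John    7272826385    UP
14   John    9064738238    UP
20   Harry   8574783832    Bihar
12   Sam     7390372389    Punjab
12   Sam     8589830302    Punjab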
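The step from the first set of rows to the second can be sketched in Python. The function name to_1nf and the tuple layout (id, name, phones, state) are assumptions made for this illustration; the values are the ones shown in the rows above.

```python
# Rows as in the first table: the phone field may hold several values.
unnormalized = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]

def to_1nf(rows):
    """Emit one row per phone number so that every field is atomic."""
    result = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            result.append((emp_id, name, phone.strip(), state))
    return result

normalized = to_1nf(unnormalized)
```

Each multi-valued phone entry becomes several rows that repeat the other fields, which gives the five rows of the second table.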
Second Normal Form (2NF)
• For 2NF, the relation must be in 1NF.
• In second normal form, all non-key attributes are fully functionally dependent on the primary key.
Example: Assume a school stores data about teachers and the subjects they teach. In a school, a
teacher can teach more than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, the non-prime attribute TEACHER_AGE depends on TEACHER_ID, which is a proper subset of
the candidate key {TEACHER_ID, SUBJECT}. That is why the table violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID   TEACHER_AGE
25           30
47           35
83           38
TEACHER_SUBJECT table:
TEACHER_ID   SUBJECT
25           Chemistry
25           Biology
47           English
83           Math
83           Computer
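The decomposition of the TEACHER table can be reproduced with a minimal sketch using Python's built-in sqlite3 module. The lower-case table and column names are an assumption of this sketch; the data is the TEACHER table from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE teacher (teacher_id INTEGER, subject TEXT, teacher_age INTEGER)"
)
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?)",
    [(25, "Chemistry", 30), (25, "Biology", 30), (47, "English", 35),
     (83, "Math", 38), (83, "Computer", 38)],
)

# TEACHER_AGE depends only on TEACHER_ID, so it moves to its own table.
conn.execute(
    "CREATE TABLE teacher_detail AS "
    "SELECT DISTINCT teacher_id, teacher_age FROM teacher"
)
conn.execute(
    "CREATE TABLE teacher_subject AS "
    "SELECT teacher_id, subject FROM teacher"
)

detail = conn.execute(
    "SELECT teacher_id, teacher_age FROM teacher_detail ORDER BY teacher_id"
).fetchall()

# Joining the two tables gives back the original rows.
rejoined = conn.execute(
    "SELECT s.teacher_id, s.subject, d.teacher_age "
    "FROM teacher_subject s JOIN teacher_detail d "
    "ON s.teacher_id = d.teacher_id "
    "ORDER BY s.teacher_id, s.subject"
).fetchall()
original = conn.execute(
    "SELECT teacher_id, subject, teacher_age FROM teacher "
    "ORDER BY teacher_id, subject"
).fetchall()
```

Each teacher's age is now stored once, and the join shows that no information was lost by splitting the table.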
Third Normal Form (3NF)
• A relation is in 3NF if it is in 2NF and does not contain any
transitive dependency of a non-prime attribute on a candidate key.
• 3NF is used to reduce data duplication. It is also used
to achieve data integrity.
• If there is no transitive dependency for non-prime
attributes, then the relation is in third normal form.
A relation is in third normal form if it holds at least one of
the following conditions for every non-trivial functional
dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of
some candidate key.
Example: EMPLOYEE_DETAIL table:
Super keys in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends
on EMP_ID.
The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on
the super key (EMP_ID). This violates the rule of third normal form.
That is why we need to move EMP_CITY and EMP_STATE to a new
EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.
EMPLOYEE table: EMPLOYEE_ZIP table:
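The transitive dependency and its removal can be checked on sample data with a short Python sketch. The rows below are hypothetical placeholders, not the contents of the EMPLOYEE_DETAIL table, and the helper fd_holds is an assumed name. A dependency that holds on sample rows is only consistent with the data; it is not a proof about the schema.

```python
def fd_holds(rows, lhs, rhs):
    """True if equal values of lhs always go with equal values of rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical sample rows (placeholder values).
employee_detail = [
    {"EMP_ID": 1, "EMP_NAME": "A", "EMP_ZIP": "11111",
     "EMP_STATE": "S1", "EMP_CITY": "C1"},
    {"EMP_ID": 2, "EMP_NAME": "B", "EMP_ZIP": "11111",
     "EMP_STATE": "S1", "EMP_CITY": "C1"},
    {"EMP_ID": 3, "EMP_NAME": "C", "EMP_ZIP": "22222",
     "EMP_STATE": "S2", "EMP_CITY": "C2"},
]

# EMP_ID -> EMP_ZIP and EMP_ZIP -> (EMP_STATE, EMP_CITY): a transitive chain.
id_to_zip = fd_holds(employee_detail, ["EMP_ID"], ["EMP_ZIP"])
zip_to_place = fd_holds(employee_detail, ["EMP_ZIP"], ["EMP_STATE", "EMP_CITY"])
# EMP_ZIP does not determine EMP_ID, so EMP_ZIP is not a super key.
zip_is_key = fd_holds(employee_detail, ["EMP_ZIP"], ["EMP_ID"])

# Decomposition: move the ZIP-dependent attributes to EMPLOYEE_ZIP.
employee = [
    {k: r[k] for k in ("EMP_ID", "EMP_NAME", "EMP_ZIP")} for r in employee_detail
]
employee_zip = {
    r["EMP_ZIP"]: (r["EMP_STATE"], r["EMP_CITY"]) for r in employee_detail
}
```

After the split, each ZIP code's state and city are stored once in employee_zip, keyed by EMP_ZIP, instead of being repeated for every employee.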