LM Dbmsnotes 2
lead to unwanted duplicated files. This is wasteful, since file duplication uses up
extra space on the hard drives which could otherwise be put to better use. For example, the
address and telephone number of a particular customer may appear both in a file
generated by the order-filling system and in a file generated by the invoicing system.
This redundancy leads to higher storage and access cost, and also leads to inconsistency.
2. Data Inconsistency
There are different copies of the same data, but their contents do not match as they
should. This problem occurs when data is updated but some copies of the data
remain old, i.e. not updated. For example, a changed customer address may be reflected in the
customer master file generated by the order-filling system but not in the customer master
file generated by the invoicing system.
4. Data Isolation
Data are scattered in various files, and the files may be in different formats, so writing new
application programs to retrieve data is difficult.
5. Integrity Problems
The data values may need to satisfy some integrity constraints. For example, a phone
number must have a minimum of 6 digits. In file processing systems we have to handle
this through program code, but in a database we can declare the integrity constraints
along with the schema definition itself.
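As a sketch of declaring a constraint together with the definition itself, here is a minimal example using SQLite through Python's built-in sqlite3 module; the customer table, its columns and the sample values are illustrative, not taken from the notes:

```python
import sqlite3

# In-memory database; the customer table and its columns are illustrative.
conn = sqlite3.connect(":memory:")

# The phone-number rule is declared once, with the table definition,
# instead of being re-checked in every application program.
conn.execute("""
    CREATE TABLE customer (
        name  TEXT,
        phone TEXT CHECK (length(phone) >= 6)
    )
""")

conn.execute("INSERT INTO customer VALUES ('Asha', '0265123456')")  # accepted
try:
    conn.execute("INSERT INTO customer VALUES ('Ravi', '123')")     # too short
except sqlite3.IntegrityError as err:
    print("rejected:", err)   # the DBMS itself enforces the rule
conn.close()
```

Because the DBMS checks every INSERT and UPDATE against the declared constraint, no application program can slip an invalid phone number past it.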
6. Atomicity Problem
Atomicity means that a transaction must happen in its entirety or not at all. It is difficult to ensure
atomicity in a file processing system. For example, consider transferring Rs.100 from Account A to
Account B. If a failure occurs during execution, Rs.100 might be deducted from
Account A but not credited to Account B, leaving the database in an inconsistent
state. It is essential that either both the credit and the debit occur, or that
neither occurs. That is, the funds transfer must be atomic.
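The funds-transfer example above can be sketched in a database system, again using SQLite via Python's sqlite3 module; the account table and amounts are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (no TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst in one transaction: both happen or neither."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE no = ?",
                     (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE no = ?",
                     (amount, dst))
        conn.commit()          # make both updates permanent together
    except Exception:
        conn.rollback()        # a failure undoes the debit as well
        raise

transfer(conn, "A", "B", 100)
print(conn.execute("SELECT no, balance FROM account ORDER BY no").fetchall())
# → [('A', 400), ('B', 300)]
```

If anything fails between the debit and the credit, the rollback restores the state before the transfer, so the Rs.100 can never simply disappear.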
8. Security Problems
Enforcing security constraints in a file processing system is very difficult because
application programs are added to the system in an ad-hoc manner.
Data is “raw, unanalysed facts, figures and events”. Data refers to the lowest level of
abstraction, a raw input which, when processed or arranged, produces meaningful output.
Information is “useful knowledge derived from the data”. Information is usually the
processed outcome of data; more specifically, it is derived from data.
For example:
Researchers who conduct a market research survey might ask members of the public to
complete questionnaires about a product or a service. These completed questionnaires
are data; they are processed and analyzed in order to prepare a report on the survey.
The resulting report is information.
1. Complexity
A database system creates additional complexity and requirements. The supply
and operation of a database management system with several users and databases is
quite costly and demanding.
2. Qualified personnel
The professional operation of a database system requires appropriately trained
staff. Without a qualified database administrator nothing will work for long.
3. Cost of hardware/software
A processor with high speed of data processing and memory of large size is
required to run the DBMS software. It means that you have to upgrade the
hardware used for file-based system. Similarly, DBMS software is also very
costly.
4. Lower efficiency
A database system is a multi use software which is often less efficient than
specialized software which is produced and optimized exactly for one problem.
5. Danger of overkill
For small and simple single-user applications, a database system is often not
advisable.
6. Database damage
In most of the organizations, all data is integrated into a single database. If
database is damaged due to electric failure or database is corrupted on the storage
media, then your valuable data may be lost forever.
7. Cost of data conversion
When a computer file-based system is replaced with a database system, the data
stored in data files must be converted to database files. Converting the data of data
files into a database is a difficult and costly process. You have to hire
database and system designers along with application programmers, or alternatively
take the services of a software house. So a lot of money has to be
paid for developing the software.
8. Complexity of backup and recovery
Backing up an integrated database and recovering it after a failure is more complex
than copying individual data files.
ACID Properties
ACID properties are an important concept for databases. The acronym stands for Atomicity,
Consistency, Isolation, and Durability.
The ACID properties of a DBMS allow safe sharing of data. Without these ACID properties,
everyday occurrences such as using computer systems to buy products would be difficult and
the potential for inaccuracy would be huge. Imagine more than one person trying to buy the
same size and color of a sweater at the same time -- a regular occurrence. The ACID
properties make it possible for the merchant to keep these sweater purchasing transactions
from overlapping each other -- saving the merchant from erroneous inventory and account
balances.
Atomicity
The phrase "all or nothing" succinctly describes the first ACID property of atomicity.
When an update occurs to a database, either all of the update or none of it becomes
available to anyone beyond the user or application performing the update. This update
to the database is called a transaction, and it either commits or aborts. This guarantees
that a mere fragment of the update can never be placed into the database, should a
problem occur with either the hardware or the software involved.
Consistency
It states that only valid data will be written to the database. If, for some reason, a
transaction is executed that violates the database’s consistency rules, the entire
transaction will be rolled back and the database will be restored to a state consistent
with those rules. On the other hand, if a transaction successfully executes, it will take
the database from one state that is consistent with the rules to another state that is also
consistent with the rules.
Isolation
It requires that multiple transactions occurring at the same time do not impact each
other's execution; one transaction does not interfere with another. If two
transactions run at the same time using the same resource, the system
may become inconsistent. The isolation property ensures that transactions do not
interfere with each other and execute as if each ran individually. With only one
transaction, isolation is not an issue.
Durability
A committed (saved) transaction will not be lost. Durability is ensured through the
use of database backups and transaction logs that facilitate the restoration of
committed transactions despite any subsequent software or hardware failures.
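Durability can be sketched with SQLite via Python's sqlite3 module: once a transaction commits to the on-disk file, the data survives closing and reopening the connection, which stands in here for a process restart. The file name and amounts are illustrative:

```python
import os
import sqlite3
import tempfile

# The file name is illustrative; the point is that the data lives on disk.
path = os.path.join(tempfile.mkdtemp(), "bank.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE payment (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO payment (amount) VALUES (100)")
conn.commit()                 # from this point the transaction is durable
conn.close()                  # stands in for the application going down

conn = sqlite3.connect(path)  # "restart" and reconnect
print(conn.execute("SELECT amount FROM payment").fetchall())  # → [(100,)]
conn.close()
```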
A computer reveals a minimum of its internal details when seen from outside. We
do not know what parts it is built from. This is the highest level of abstraction,
meaning very few details are visible. If we open the computer case and look inside at
the hard disc, motherboard, CD drive, CPU and RAM, we are at the middle level of
abstraction. If we move on to open the hard disc and examine its tracks, sectors and
read-write heads, we are at the lowest level of abstraction, where no details are
invisible.
In the same manner, the database can also be viewed from different levels of
abstraction to reveal different levels of details. From a bottom-up manner, we may
find that there are three levels of abstraction or views in the database. We discuss
them here.
The word schema means arrangement: how we want to arrange the things that we have
to store.
Physical level
Logical level
External (view) level
The highest level of abstraction is the user view. This is targeted at the end
users. An end user does not need to know everything about the structure of
the entire database, only the amount of detail he/she needs to work with.
We may not want the end user to be confused by the astounding amount of
detail involved in looking at the entire database, or we may not allow this for
reasons of security, where sensitive information must remain hidden from
unauthorized persons. The database administrator may want to create
custom-made tables, keeping in mind the specific needs of each user. These
tables are also known as virtual tables, because they have no separate physical
existence; they are created dynamically for the users at runtime. For example,
in the sample database we created earlier, suppose we have a special officer
whose responsibility is to keep in touch with the parents of any underage student
living in the hostels. That officer does not need to know any details except the
Roll, Name, Address and Age. The database administrator may create a virtual
table with only these four attributes, solely for the use of this officer.
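The officer's virtual table can be sketched as an SQL view, using SQLite via Python's sqlite3 module. The Roll, Name, Address and Age columns come from the text; the remaining Student columns and the sample rows are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Marks and Phone are assumed extra columns the officer should not see.
conn.execute("""
    CREATE TABLE Student (
        Roll INTEGER PRIMARY KEY, Name TEXT, Address TEXT,
        Age INTEGER, Marks INTEGER, Phone TEXT
    )
""")
conn.execute("INSERT INTO Student VALUES (1, 'Mehul', 'Baroda', 17, 82, '111222')")
conn.execute("INSERT INTO Student VALUES (2, 'Ami', 'Mumbai', 19, 90, '333444')")

# The virtual table exposes only the four attributes the officer needs;
# it has no separate physical existence and is evaluated at runtime.
conn.execute("""
    CREATE VIEW HostelOfficer AS
    SELECT Roll, Name, Address, Age FROM Student WHERE Age < 18
""")
print(conn.execute("SELECT * FROM HostelOfficer").fetchall())
# → [(1, 'Mehul', 'Baroda', 17)]
```

The officer queries HostelOfficer like an ordinary table, yet the marks and phone numbers stay hidden.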
Data Independence
Logical data independence: The ability to change the logical (conceptual) schema
without changing the External schema (User View) is called logical data
independence. For example, the addition or removal of new entities, attributes, or
relationships to the conceptual schema should be possible without having to change
existing external schemas or having to rewrite existing application programs.
Physical data independence: The ability to change the physical schema without
changing the logical schema is called physical data independence. For example, a
change to the internal schema, such as using different file organization or storage
structures, storage devices, or indexing strategy, should be possible without having to
change the conceptual or external schemas.
The view level itself needs no further data independence, because there is no other
level above the view level whose changes could affect it.
Databases change over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the
database. The term instance is also often used to describe a complete database
environment, including the RDBMS software, table structures, stored procedures and
other functionality.
Database Languages
A database system provides a data definition language to specify the database
schema and a data manipulation language to express database queries and updates.
Procedural DMLs require a user to specify what data are needed and
how to get those data. This means that the user must express all the
data access operations to be used, calling appropriate
procedures to obtain the required information. Such a procedural DML
retrieves a record, processes it and, based on the results of this
processing, retrieves another record to be processed similarly,
and so on. This process of retrieval continues until all the data requested
has been gathered. Typically, procedural DMLs are
embedded in a high-level programming language that contains constructs
to facilitate iteration and handle navigational logic. Network and
hierarchical databases use procedural DMLs.
Nonprocedural DMLs require a user to specify what data are needed
without specifying how to get those data. Non-procedural DMLs leave it to the
DBMS to determine how to obtain the data; relational databases typically use
non-procedural DMLs such as SQL.
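The contrast can be sketched with SQLite via Python's sqlite3 module: the first query iterates record by record in the host program (procedural style), while the second simply states what is wanted (non-procedural style). The emp table and its rows are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("Carol", "SL"), ("Gayle", "AC"), ("Daniel", "SL")])

# Procedural style: the host program navigates record by record and
# spells out *how* the answer is collected.
sales = []
for name, dept in conn.execute("SELECT name, dept FROM emp ORDER BY name"):
    if dept == "SL":
        sales.append(name)

# Non-procedural style: state *what* is wanted; the DBMS decides how.
sales2 = [row[0] for row in
          conn.execute("SELECT name FROM emp WHERE dept = 'SL' ORDER BY name")]

print(sales)   # → ['Carol', 'Daniel']
print(sales2)  # → ['Carol', 'Daniel']
```

Both produce the same answer; the difference is who does the navigation, the program or the DBMS.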
A data dictionary is a file that contains metadata, that is, data about data. This
file is consulted before actual data are read or modified in the database system.
The storage structure and access methods used by the database system are
specified by a set of definitions in a special type of DDL called a data storage
and definition language.
With its help a data schema can be defined and also changed later.
Typical DDL operations (with their respective keywords in the Structured Query
Language, SQL) include creating a table (CREATE TABLE), changing its definition
(ALTER TABLE) and deleting it (DROP TABLE).
Transaction Control Language (TCL) statements are used to manage the changes made
by DML statements. They allow statements to be grouped together into logical
transactions.
Database Users
There are four different types of database system users, differentiated by the way that
they expect to interact with the system. A primary goal of a database system is to
provide an environment for retrieving information from and storing new information
into the database.
Naive Users
Naive users are unsophisticated users who interact with the system by
invoking one of the application programs that have been written previously. For
example, a bank teller who needs to transfer $50 from account A to account B
invokes a program called transfer. This program asks the teller for the amount of
money to be transferred, the account from which the money is to be transferred,
and the account to which the money is to be transferred. As another example,
consider a user who wishes to find her account balance over the World Wide
Web. Such a user may access a form, where she enters her account number. An
application program at the Web server then retrieves the account balance, using
the given account number, and passes this information back to the user. The
typical user interface for naive users is a forms interface, where the user can fill in
appropriate fields of the form. Naive users may also simply read reports generated
from the database.
Application Programmers
Application programmers are computer professionals who write application
programs and interact with the system through DML calls, which are embedded in
a program written in a host language (for example COBOL, C). Since the DML
syntax is different from the host language syntax, DML calls are usually prefaced
by a special character so that the appropriate code can be generated. A special
preprocessor, called the DML precompiler, converts the DML statements to
normal procedure calls in the host language. There are special types of
programming languages that combine the control structures of Pascal-like languages
with constructs for the manipulation of database objects.
Sophisticated Users
Sophisticated users interact with the system without writing programs. Instead,
they formulate their requests in a database query language. Each such query is
submitted to a query processor, whose function is to break down DML statements
into instructions that the storage manager understands.
Specialized Users
Specialized users are sophisticated users who write specialized database
applications that do not fit into the traditional data-processing framework. Among
these applications are computer aided design systems, knowledge-base and expert
systems.
Database Administrator
The database administrator (DBA) is the person who controls the data in the
organization; the DBA's main function is to control the information of the particular
enterprise.
Functions of a DBA
The functions performed by a DBA are the following:
Schema Definition
Data Manager
The data manager is the central software component of the DBMS. It is
sometimes referred to as the database control system. One of the functions of
the data manager is to convert operations in the user’s queries, coming directly
via the query processor or indirectly via an application program, from the
user’s logical view to the physical file system. The data manager is responsible
for interfacing with the file system. In addition, the tasks of enforcing
constraints to maintain the consistency and integrity of the data, as well as its
security, are also performed by the data manager. Synchronizing the
simultaneous operations performed by concurrent users is under the control of
the data manager. It is also entrusted with backup and recovery operations.
File Manager
Responsibility for the structure of the files and managing the file space rests
with the file manager. It is also responsible for locating the block containing
the required record, requesting this block from the disk manager, and
transmitting the required record to the data manager. The file manager can be
implemented using an interface to the existing file subsystem provided by the
operating system of the host computer, or it can include a file
subsystem written especially for the DBMS.
Disk Manager
The disk manager is part of the operating system of the host computer, and all
physical input and output operations are performed by it. The disk manager
transfers the block or page requested by the file manager, so that the latter need
not be concerned with the physical characteristics of the underlying storage
media.
Query Processor
The database user retrieves data by formulating a query in the data
manipulation language provided with the database. The query processor
interprets the online user’s query and converts it into an efficient series
of operations in a form capable of being sent to the data manager for execution.
The query processor uses the data dictionary to find the structure of the
relevant portion of the database and uses this information in modifying the
query and preparing an optimal plan to access the database.
Data Files
Data files contain the data portion of the database.
Data Dictionary
Information pertaining to the structure and usage of data contained in the
database, the metadata, is maintained in a data dictionary. The term system
catalog also describes this metadata. The data dictionary, which is a database
itself, documents the data. Each database user can consult the data dictionary
to learn what each piece of data and various synonyms of the data field mean.
In an integrated system (i.e., in a system where the data dictionary is
part of the DBMS) the data dictionary stores information concerning the
external, conceptual and internal levels of the database. It contains the source
of each data-field value, the frequency of its use and an audit trail concerning
updates, including who and when of each update.
Currently, data dictionary systems are available as add-ons to the
DBMS. Standards have yet to evolve for integrating the data dictionary
facility with the DBMS so that the two databases, one for metadata and the
other for data, can be manipulated using a unified DDL/DML.
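As a concrete sketch of a data dictionary, SQLite keeps its metadata in a catalog table named sqlite_master, which is itself queried with ordinary SQL through Python's sqlite3 module; the table and index names are illustrative:

```python
import sqlite3

# SQLite's catalog (its data dictionary) records every table and index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student_Info (RollNo INTEGER, Name TEXT)")
conn.execute("CREATE INDEX idx_name ON Student_Info (Name)")

# Consulting the dictionary: metadata is read with the same query language
# as ordinary data.
for type_, name in conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY type"):
    print(type_, name)
# → index idx_name
# → table Student_Info
```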
Access Aids
To improve the performance of a DBMS, a set of access aids in the form of
indexes are usually provided in a database system. Commands are provided to
build and destroy additional temporary indexes.
Data Models
A data model is a collection of concepts that can be used to describe the structure of a
database. The way in which information is subdivided and managed within a database
is referred to as the data model used by the DBMS. Each DBMS is based on a
particular data model. A data model is a description of both a container for data and
a methodology for storing and retrieving data from that container. A user must choose
a DBMS whose data model is suitable for the project.
Data models can be classified into three major groups. They are:
These models are used to describe data at the logical and view levels. In the relational
group, data is stored in two-dimensional tables (rows and columns) and manipulated
according to relational theory. The E-R model will be covered in detail shortly.
This model is used to describe data at the logical and view levels. The database is
structured as fixed-format records of different types. Each record type has a fixed
number of fields, and each field is of fixed length. The following are the three
important record-based logical models:
Relational Model
Network Model
Hierarchical Model.
Relational Model:
Certain fields may be designated as keys, which means that searches for specific
values of that field will use indexing to speed them up. Where fields in two different
tables take values from the same set, a join operation can be performed to select
related records in the two tables by matching values in those fields. Often, but not
always, the fields will have the same name in both tables. For example, an "orders"
table might contain (customer-ID, product-code) pairs and a "products" table might
contain (product-code, price) pairs so to calculate a given customer's bill you would
sum the prices of all products ordered by that customer by joining on the product-code
fields of the two tables. This can be extended to joining multiple tables on multiple
fields. Because these relationships are only specified at retrieval time, relational
databases are classed as dynamic database management systems. The relational
database model is based on relational algebra.
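The orders/products join described above can be sketched with SQLite via Python's sqlite3 module; the table names follow the text, while the sample customers, product codes and prices are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, product_code TEXT)")
conn.execute("CREATE TABLE products (product_code TEXT, price INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("C1", "P1"), ("C1", "P2"), ("C2", "P1")])
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("P1", 100), ("P2", 250)])

# The relationship is specified only at retrieval time, by matching
# product_code values in the two tables.
bill = conn.execute("""
    SELECT SUM(p.price)
    FROM orders o JOIN products p ON o.product_code = p.product_code
    WHERE o.customer_id = 'C1'
""").fetchone()[0]
print(bill)  # → 350
```

Nothing in the stored tables links an order to a price; the join discovers the relationship when the query runs.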
Network Model:
The popularity of the network data model coincided with the popularity of the
hierarchical data model. Some data were more naturally modeled with more than one
parent per child. So, the network model permitted the modeling of many-to-many
relationships in data. The basic data modeling construct in the network model is the
set construct. A set consists of an owner record type, a set name, and a member record
type. A member record type can have that role in more than one set, hence the
multiparent concept is supported. An owner record type can also be a member or
owner in another set. The data model is a simple network, and link and intersection
record types (called junction records by IDMS) may exist, as well as sets between
them. Thus, the complete network of relationships is represented by several pairwise
sets; in each set some (one) record type is owner (at the tail of the network arrow) and
one or more record types are members (at the head of the relationship arrow). Usually,
a set defines a 1:M relationship, although 1:1 is permitted.
Hierarchical Model:
The hierarchical data model organizes data in a tree structure. There is a hierarchy of
parent and child data segments. This structure implies that a record can have repeating
information, generally in the child data segments. To create links between these
record types, the hierarchical model uses Parent Child Relationships. These are a 1:N
mapping between record types. For example, an organization might store information
about an employee, such as name, employee number, department, salary. The
organization might also store information about an employee's children, such as name
and date of birth. The employee and children data forms a hierarchy, where the
employee data represents the parent segment and the children data represents the child
segment. If an employee has three children, then there would be three child segments
associated with one employee segment. In a hierarchical database the parent-child
relationship is one to many. This restricts a child segment to having only one parent
segment. Hierarchical DBMSs were popular from the late 1960s, with the
introduction of IBM's Information Management System (IMS) DBMS, through the
1970s.
A physical data model is a representation of a data design which takes into account
the facilities and constraints of a given database management system. In the lifecycle
of a project it is typically derived from a logical data model. A physical database
model shows all table structures, including column name, column data type,
constraints definition, indexes, table partitioning, linking tables etc. The physical data
model can usually be used to calculate storage estimates and may include specific
storage allocation details for a given database system.
Entity Set : It is the collection of all entities of a particular entity type in the database.
The attributes of an entity type are possessed by all members of its entity set.
For example
A company has many employees, and these employees are defined as entities (e1,
e2, e3, ...). All these entities, having the same attributes, are defined under the
entity type Employee, and the set {e1, e2, ...} is called the entity set.
Domain
For each attribute there is a set of permitted values, called the domain of that attribute. For
example, the attribute RollNo may permit values such as 1, 2, 3, 4, … or 09BCA01, 09BCA02, …
(09 being the joining year of BCA and 01 the roll number). All these values together are
considered the domain of the attribute RollNo.
Types of Attributes
Simple attribute : A simple attribute consists of a single atomic value (a value that
cannot be divided); it cannot be subdivided. For example, the attributes age and sex
are simple attributes.
Composite attribute : A composite attribute is an attribute that can be further
subdivided. For example the attribute ADDRESS can be subdivided into street, city,
state, and zip code.
Single-valued attribute : A single-valued attribute can have only a single value. For
example, a person can have only one 'date of birth', one 'age', etc. A single-valued
attribute may be simple or composite: 'date of birth' is a composite attribute and
'age' is a simple attribute, but both are single-valued attributes.
Multi-valued attribute : A multi-valued attribute can have multiple values. For
instance, a person may have multiple phone numbers, multiple degrees, etc.
Stored and Derived Attributes : The value of a derived attribute is derived from a
stored attribute. For example, a person's 'date of birth' is a stored attribute. The
value of the attribute 'age' can be derived by subtracting the 'date of birth' (DOB)
from the current date. The stored attribute supplies the value for the related derived attribute.
Complex attribute : A complex attribute is one that is both composite and multi-valued.
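The derived-attribute idea above can be sketched with SQLite via Python's sqlite3 module: only DOB is stored, and AGE is computed at query time. The person table, the sample date of birth and the fixed reference date are illustrative (a real query would use the current date):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, dob TEXT)")
conn.execute("INSERT INTO person VALUES ('Mehul', '2000-06-15')")

# AGE is not stored anywhere; it is derived from the stored DOB.
# A fixed reference date keeps the result deterministic here.
rows = conn.execute("""
    SELECT name,
           CAST((julianday('2024-01-01') - julianday(dob)) / 365.25
                AS INTEGER) AS age
    FROM person
""").fetchall()
print(rows)  # → [('Mehul', 23)]
```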
The CREATE TABLE command defines each column of the table uniquely. Each column
has a minimum of three attributes, a name, datatype and size (i.e. column width). Each table
column definition is separated from the next by a comma. In a table name you can use
the letters A-Z and a-z, the digits 0-9 and the special character _ (no other special
characters are allowed). The condition is that the table name must start with a letter,
and SQL reserved words such as CREATE and SELECT are not allowed as table names.
Syntax
CREATE TABLE tablename (columnname datatype(size), columnname datatype(size), ...)
Example
CREATE TABLE Student_Info (RollNo INT, Name VARCHAR(20), Address VARCHAR(50))
Execution of the above DDL statement creates the table Student_Info, which contains
information about students: their RollNo, Name and Address. In this table we can
enter a maximum of 20 characters for the name and 50 characters for the address. In
addition, it updates the data dictionary.
Once a table is created, the most natural thing to do is fill table with data to be manipulated
later. When inserting a single row of data into the table, the insert operation creates a new
(empty) row in the database table and then fills the values passed by the SQL INSERT into
the columns specified.
Note : character expressions placed within the INSERT INTO statement must be enclosed in
quotes (single or double)
Syntax
INSERT INTO tablename (columnname, columnname, ...) VALUES (value, value, ...)
Example
INSERT INTO Student_Info VALUES (1, 'Raj', 'Ahmedabad')
In the INSERT INTO SQL sentence, table columns and values have a one-to-one relationship:
the first value described is inserted into the first column, the second value described is
inserted into the second column, and so on.
Hence, if an INSERT INTO SQL sentence has exactly as many values as there
are columns, and the values are sequenced exactly in accordance with the data types of the
table columns, there is no need to indicate the column names.
But in two situations it is compulsory to give the column names.
If fewer values are described than there are columns in the table, you must specify
the names of the columns and the corresponding values that fill those columns.
If the sequence of the columns in the table is not known, specify the column names
along with the values.
Example
If we know the sequence of the columns then
INSERT INTO Student_Info
VALUES (2, 'Ami', 'Mumbai')
If we do not know the sequence of the columns then
INSERT INTO Student_Info (Address, Name, RollNo)
VALUES ('Baroda', 'Mehul', 3)
If the RollNo column is an auto-number then
INSERT INTO Student_Info (Name, Address)
VALUES ('Mukesh', 'Surat')
Keys Constraints
A key is a single field or a combination of multiple fields. Its purpose is to access or
retrieve data rows from a table according to the requirement. Keys are defined in tables
to access or sequence the stored data quickly and smoothly. They are also used to create
links between different tables.
Super Key
A super key is a combination of one or more columns in a table which can be used to
identify a record in the table uniquely; a table can have any number of super keys. For example,
Employee(Empl_ID, Name, Address, Salary, Department_ID)
Super keys
1 Empl_ID
2 Empl_ID, Name
3 Empl_ID, Address
4 Empl_ID, Department_ID
5 Empl_ID, Salary
6 Name, Address
7 Name, Address, Department_ID … and so on,
since any combination which can identify the records uniquely is a super key.
Candidate Key
A Column (or) Combination of columns which can help uniquely identify a record in
a table without the need of any external data is called a Candidate Key. Depending on
the need and situation a Table may have one or more candidate keys and one of them
can be used as a Primary Key of the table.
Here, Empl_ID, Name and Address are prime attributes; attributes that are not part
of any candidate key are non-prime attributes. A super key may contain extraneous
attributes. A unique record can be retrieved through {Empl_ID} alone, or through
{Name, Address}; each of these is sufficient by itself, so both are candidate keys.
Primary Key
A Candidate Key that is used by the database designer for unique identification of
each row in a table and has the Constraint NOT NULL attached to it is known as a
Primary Key. It can either be part of the actual record itself, or it can be an artificial
field (one that has nothing to do with the actual record). A primary key can consist of
one or more fields on a table. When multiple fields are used as a primary key, they are
called a composite key.
In the above example the designer can select {Empl_ID} as the primary key. If the
designer instead selects {Name, Address} as the primary key, it becomes a composite
key, or composite primary key.
Syntax
ColumnName Datatype(size) PRIMARY KEY
Example
CREATE TABLE Employee (Empl_ID INT PRIMARY KEY, Name VARCHAR(20), Address VARCHAR(50), Salary INT, Department_ID INT)
Syntax
PRIMARY KEY(ColumnName, ColumnName)
Example
CREATE TABLE Employee (Empl_ID INT, Name VARCHAR(20), Address VARCHAR(50), Salary INT, Department_ID INT, PRIMARY KEY (Name, Address))
Alternate Key
An alternate key cannot be defined separately from a candidate key: if a table has
two or more candidate keys and one is chosen as the primary key, the other candidate
keys are known as the alternate keys of that table. Here, if the designer chooses
{Empl_ID} as the primary key, then {Name, Address} becomes an alternate key.
Unique Key
A unique key is a column (or combination of columns) which can be used to uniquely
identify a record in a table; unlike a primary key, it can have a NULL value. It is
otherwise similar to a primary key, but there are certain differences between a
primary key and a unique key:
1. A table may have more than one unique key. This depends on the design of the
table. But, a table can have only one primary key.
2. The values in the unique key columns may or may not be NULL. The values
in primary key columns, however, cannot be NULL.
Syntax
ColumnName Datatype(size) UNIQUE
Example
CREATE TABLE Student_Info (RollNo INT PRIMARY KEY, Name VARCHAR(20), Phone VARCHAR(12) UNIQUE)
Syntax
UNIQUE(ColumnName, ColumnName)
Example
CREATE TABLE Student_Info (RollNo INT PRIMARY KEY, Name VARCHAR(20), Address VARCHAR(50), UNIQUE (Name, Address))
Secondary key
The attributes that are not even super keys but can still be used for identification
of records (not uniquely) are known as secondary keys.
For example, Department_ID does not contain unique information, because more than
one employee works in a department. But the Department_ID field can be used to find
the employees who belong to a particular department, for example to list the employees
who work for the accounts department.
Foreign key
Foreign keys represent relationships between tables. A foreign key is a column or a
group of columns whose values are derived from the primary key or unique key of
some other table or same table. The table in which the foreign key is defined is called
a foreign table or detail table or child table. The table that defines the primary or
unique key and is referenced by the foreign key is called the primary table or master
table or parent table. For example,
DEPARTMENTS
DeptCode DeptName
AC Accounting
IS Information Systems
MK Marketing
RI Receiving & Inventory
SL Sales
EMPLOYEE
EmployeeID LastName FirstName DeptCode
EN1-10 Schaaf Carol SL
EN1-12 Murray Gayle AC
EN1-15 Baranco Steve MK
EN1-16 Racich Kristine RI
EN1-19 Zumbo Barbara IS
EN1-20 Gordon Daniel SL
EN1-22 Rivet Jacqueline MK
EN1-23 Rosyln Betsy RI
EN1-25 Strick Will IS
EN1-26 Shipe Susan MK
EN1-27 Fink Joseph SL
EN1-28 Rubinstein Sara AC
EN1-30 Coleman Michael RI
In the above example, the Departments table is the master table and the Employee
table is the detail table. Here the DeptCode attribute of the Departments table is used
as a foreign key in the Employee table.
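The master and detail relationship above can be sketched with Python's sqlite3 module, using trimmed versions of the two tables (note that SQLite enforces foreign keys only when the pragma is enabled):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only with this pragma
cur = con.cursor()

# Master (parent) table
cur.execute("CREATE TABLE Departments (DeptCode TEXT PRIMARY KEY, DeptName TEXT)")
# Detail (child) table: DeptCode derives its values from Departments
cur.execute("""
    CREATE TABLE Employee (
        EmployeeID TEXT PRIMARY KEY,
        LastName   TEXT,
        DeptCode   TEXT REFERENCES Departments(DeptCode)
    )
""")
cur.execute("INSERT INTO Departments VALUES ('AC', 'Accounting')")
cur.execute("INSERT INTO Employee VALUES ('EN1-12', 'Murray', 'AC')")  # 'AC' exists

try:
    # 'XX' is not in the master table, so the reference is rejected
    cur.execute("INSERT INTO Employee VALUES ('EN1-99', 'Doe', 'XX')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```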
The discriminator (or partial key) of a weak entity set is the set of attributes that
distinguishes among all the entities of the weak entity set.
The existence of a weak entity set depends on the existence of an identifying entity set.
It must relate to the identifying entity set via a total, one-to-many relationship set
from the identifying entity set to the weak entity set.
Partial Key
An attribute is a partial key if a key from a related entity type must be used in
conjunction with the attribute in question to uniquely identify instances of the
corresponding entity set. It specifies a key that is only partially unique, and it is used
for weak entities.
For example, suppose "Course Number" is an attribute of the Course entity type in our
design for a University database. Suppose Course Number alone cannot uniquely
identify courses. Rather, to identify a course we must include the Department Number
attribute as well. Course Number is a partial key.
Although a weak entity set does not have a primary key, we nevertheless need
a means of distinguishing among all those entities in the weak entity set that depend
on one particular strong entity. The discriminator or partial key of a weak entity set
is a set of attributes that allows this distinction to be made. For example, the
discriminator of the weak entity set payment is the attribute payment-number, since,
for each loan, a payment number uniquely identifies one single payment for that loan.
There are so many types of constraints. We have already seen the primary key constraint,
unique key constraint and foreign key constraint. Now see some other constraints.
Domain Constraint
A domain of possible values must be associated with every attribute. Declaring an
attribute to be of a particular domain acts as a constraint on the values that it can take.
For example, if we specify RollNo as an integer, then the domain of the attribute is
only 1, 2, 3, and so on (values such as 09BCA01 or 09BCA02 are no longer in the
domain of RollNo). This is a domain constraint. Domain constraints are
the most elementary form of integrity constraint. They are tested easily by the system
whenever a new data item is entered into the database.
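As a sketch, a domain can be narrowed with a CHECK constraint. Python's sqlite3 module is used here, and the typeof() test is a SQLite-specific device (an illustrative assumption, not from the notes) for restricting a column to integers:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# typeof() is SQLite-specific; the CHECK narrows the domain to integers only
cur.execute("""
    CREATE TABLE Student (
        RollNo INTEGER CHECK (typeof(RollNo) = 'integer')
    )
""")
cur.execute("INSERT INTO Student VALUES (1)")  # in the domain

try:
    cur.execute("INSERT INTO Student VALUES ('09BCA01')")  # outside the domain
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```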
Not Null Constraint
The NOT NULL constraint ensures that a column cannot store a NULL value. With
NOT NULL declared on CID and Name, those columns cannot contain NULL values,
while Address can contain a NULL value.
Inserting a row without a value for CID will result in an error, because that would
leave the CID column NULL, which violates the NOT NULL constraint on that column.
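This behaviour can be sketched with Python's sqlite3 module (the column names CID, Name, and Address follow the notes; the data values are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""
    CREATE TABLE Customer (
        CID     INTEGER NOT NULL,
        Name    TEXT    NOT NULL,
        Address TEXT               -- no constraint: NULL is allowed here
    )
""")
cur.execute("INSERT INTO Customer VALUES (1, 'Ravi', NULL)")  # Address may be NULL

try:
    # CID is omitted and would be NULL, violating NOT NULL
    cur.execute("INSERT INTO Customer (Name, Address) VALUES ('Kumar', 'Chennai')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```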
Default Constraint
The default constraint provides a default value for a column when the INSERT INTO
statement does not supply a specific value. For example,
Even though we didn't specify a value for the "Marks" column in the INSERT INTO
statement, it does get assigned the default value of 20 since we had already set 20 as
the default value for this column.
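The default of 20 on the Marks column can be demonstrated with Python's sqlite3 module (the table name and the student's name are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Result (Name TEXT, Marks INTEGER DEFAULT 20)")

# No value is supplied for Marks, so the default of 20 is used
cur.execute("INSERT INTO Result (Name) VALUES ('Anu')")
marks = cur.execute("SELECT Marks FROM Result WHERE Name = 'Anu'").fetchone()[0]
print(marks)  # 20
```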
Entity Integrity
Let us look at the effect of null values in prime attributes. A null value for an attribute
is a value that is either not known at the time or does not apply to a given instance of
the object. It may also be possible that a particular tuple does not have a value for an
attribute. This fact could be represented by a null value.
If any attribute of a primary key were permitted to have null values, then,
because the attributes in the key must be non-redundant, the key cannot be used for
unique identification of tuples. This contradicts the requirements for a primary key.
(a) Relation R without null values
Id    Name
101   Jones
103   Smith
104   Lalonde
107   Evan
110   Drew
112   Smith

(b) Relation S with null values
Id    Name
101   Jones
@     Smith
104   Lalonde
107   Evan
110   Drew
@     Lalonde
@     Smith
Consider the relation S. The attribute Id is the primary key. If null values, here
represented as @, were permitted, then the two tuples <@, Smith> are
indistinguishable, even though they may represent two different instances of the entity
type employee. Similarly, the tuples <@, Lalonde> and <104, Lalonde> are, for all
intents and purposes, also indistinguishable. To keep the instances of the entities
distinguishable, no prime attribute value may be null. This rule is referred to as the
entity integrity rule: if attribute A of relation R is a prime attribute of R, then A
cannot accept null values.
Referential Integrity
A relation R may contain references to another relation S. Relations R and S need not
be distinct. Suppose the reference in R is via a set of attributes that forms a primary
key of the relation S. This set of attributes in R is a foreign key. A valid relationship
between a tuple in R to one in S requires that the values of the attributes in the foreign
key of R correspond to the primary key of a tuple in S. This ensures that the reference
from a tuple of the relation R is made unambiguously to an existing tuple in the S
relation. The referencing attribute in the R relation can have null value; in this case, it
is not referencing any tuple in the S relation. However, if the value is not null, it must
exist as the primary key value of a tuple of the S relation. If the referencing attribute in
R has a value that is nonexistent in S, R is attempting to refer to a nonexistent tuple and
hence a nonexistent instance of the corresponding entity.
For example, consider the employees and their manager. Each employee has a
manager and as managers are also employees, we may represent managers by their
employee numbers, if the employee number is a key of the relation employee. The
manager attribute is a foreign key referring to the primary key of the same relation. An
employee can only have a manager who is also an employee. The CEO of the
company can have himself or herself as the manager or may take null values. Some
employees may also be temporarily without managers, and this can be represented by
the manager taking null values.
So, referential integrity states: given two relations R and S, suppose R refers to the
relation S via a set of attributes that forms the primary key of S, and this set of
attributes forms a foreign key in R. Then the value of the foreign key in a tuple in R
must either be equal to the primary key of a tuple of S or be entirely null.
If we delete a tuple that is target of a foreign key reference, then three explicit
possibilities exist to maintain database integrity.
1) All tuples that contain references to the deleted tuple should also be deleted.
This may cause, in turn, the deletion of other tuples. This option is referred to
as a domino or cascading deletion, since one deletion leads to another.
2) Only tuples that are not referenced by any other tuple can be deleted. A tuple
referred by other tuples in the database cannot be deleted.
3) The tuple is deleted. However, to avoid the domino effect, the pertinent
foreign key attributes of all referencing tuples are set to null.
The choice of the option to use during a tuple deletion depends on the application.
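The three deletion options map onto SQL referential actions; a sketch using Python's sqlite3 module (table names are illustrative) shows options 1 and 3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
cur = con.cursor()
cur.execute("CREATE TABLE Dept (Code TEXT PRIMARY KEY)")

# Option 1: cascading deletion of referencing tuples
cur.execute("""CREATE TABLE Emp1 (Id INTEGER PRIMARY KEY,
               Dept TEXT REFERENCES Dept(Code) ON DELETE CASCADE)""")
# Option 3: set the pertinent foreign key attributes to NULL
cur.execute("""CREATE TABLE Emp2 (Id INTEGER PRIMARY KEY,
               Dept TEXT REFERENCES Dept(Code) ON DELETE SET NULL)""")

cur.execute("INSERT INTO Dept VALUES ('AC')")
cur.execute("INSERT INTO Emp1 VALUES (1, 'AC')")
cur.execute("INSERT INTO Emp2 VALUES (1, 'AC')")

cur.execute("DELETE FROM Dept WHERE Code = 'AC'")
print(cur.execute("SELECT COUNT(*) FROM Emp1").fetchone()[0])  # row cascaded away
print(cur.execute("SELECT Dept FROM Emp2").fetchone()[0])      # key set to NULL (None)
```

Option 2 corresponds to the default referential action (NO ACTION / RESTRICT), under which the DELETE itself fails while any reference to the tuple exists.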
Relationship Set
If a relationship type is between entities in a single entity type, it is called a unary
relationship type. An employee works under and reports to a manager who is also an
employee; the Reports_to relationship involves only one entity set, so it is a unary
relationship set. The degree of the relationship is 1.
[ER diagram: Employee entity with a recursive Reports_to relationship; roles: manager, managed]
It is possible to model relationship types involving more than two entity types. A
relationship type involving three entity types is said to be a ternary relationship type.
After examining the patient, the doctor prescribes medicines to that patient according to
the diagnosis. Here three entity sets are involved (Doctor, Patient, and Medicine), so the
prescription relationship is a ternary relationship set. The Degree of relationship is 3.
[ER diagram: Doctor, Patient, and Medicine entities connected by a ternary Prescribes relationship]
Mapping Cardinality
Mapping cardinality is also known as cardinality ratio. It expresses the number of entities to
which another entity can be associated via a relationship set. Mapping Cardinalities are useful
in describing the binary relationship sets. For a binary relationship between two entity sets X
and Y we have the following mapping cardinalities.
One-to-one mapping:
An entity in X is associated with at most one entity in Y, and an entity in Y is associated
with at most one entity in X. Example: husband and wife. For each entity there is one and
only one matching entity.
One-to-many mapping:
An entity in X is associated with any number (zero or more) of entities in Y, but an entity
in Y is associated with at most one entity in X. Example: father and child. A father may
have more than one child, but each child has one father.
Many-to-one mapping:
An entity in X is associated with at most one entity in Y, but an entity in Y can be
associated with any number (zero or more) of entities in X. Example: vendor and
goods/product. One product may be supplied by more than one vendor.
Many-to-many mapping:
An entity in X is associated with any number of entities in Y, and an entity in Y is
associated with any number of entities in X. Example: vendor and client. A vendor may
have more than one client, and a client may have more than one vendor.
An E-R diagram can express the overall logical structure of a database graphically.
Joins
An SQL join clause combines records from two or more tables in a database. It creates a set
that can be saved as a table or used as is. A JOIN is a means for combining fields from two
tables by using values common to each. The ANSI SQL standard specifies several types of
JOIN: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER, and CROSS. As a special
case, a table (base table, view, or joined table) can JOIN to itself in a self-join.
Customers:
Sales:
CustomerID   Date        SaleAmount
2            5/6/2004    $100.22
1            5/7/2004    $99.95
3            5/7/2004    $122.95
3            5/13/2004   $100.00
4            5/22/2004   $555.55
As you can see those 2 tables have common field called CustomerID and thanks to that we
can extract information from both tables by matching their CustomerID columns.
The SQL expression above will select all distinct customers (their first and last names) and
the total respective amount of dollars they have spent.
The SQL JOIN condition has been specified after the SQL WHERE clause and says that the
2 tables have to be matched by their respective CustomerID columns.
The SQL statement above can be re-written using the SQL JOIN clause like this:
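The two equivalent forms can be sketched with Python's sqlite3 module. The sample customer names and the SalesPerCustomer alias here are illustrative assumptions, not the original tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Customers (CustomerID INTEGER, FirstName TEXT, LastName TEXT)")
cur.execute("CREATE TABLE Sales (CustomerID INTEGER, Date TEXT, SaleAmount REAL)")
cur.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
                [(1, 'John', 'Smith'), (2, 'Steven', 'Goldfish')])
cur.executemany("INSERT INTO Sales VALUES (?, ?, ?)",
                [(2, '5/6/2004', 100.22), (1, '5/7/2004', 99.95)])

# Join condition written in the WHERE clause
q1 = """SELECT Customers.FirstName, Customers.LastName,
               SUM(Sales.SaleAmount) AS SalesPerCustomer
        FROM Customers, Sales
        WHERE Customers.CustomerID = Sales.CustomerID
        GROUP BY Customers.CustomerID"""

# The same query re-written with an explicit JOIN clause
q2 = """SELECT Customers.FirstName, Customers.LastName,
               SUM(Sales.SaleAmount) AS SalesPerCustomer
        FROM Customers JOIN Sales
          ON Customers.CustomerID = Sales.CustomerID
        GROUP BY Customers.CustomerID"""

print(cur.execute(q1).fetchall())
print(cur.execute(q2).fetchall())  # same rows as q1
```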
There are 2 types of SQL JOINS – INNER JOINS and OUTER JOINS. If you don't put
INNER or OUTER keywords in front of the SQL JOIN keyword, then INNER JOIN is
used. In short "INNER JOIN" = "JOIN" (note that different databases have different syntax
for their JOIN clauses).
The INNER JOIN will select all rows from both tables as long as there is a match between
the columns we are matching on. In case we have a customer in the Customers table, which
still hasn't made any orders (there are no entries for this customer in the Sales table), this
customer will not be listed in the result of our SQL query above.
CustomerID   Date       SaleAmount
2            5/6/2004   $100.22
1            5/7/2004   $99.95
Even though Paula and James are listed as customers in the Customers table they won't be
displayed because they haven't purchased anything yet.
But what if you want to display all the customers and their sales, no matter if they have
ordered something or not? We’ll do that with the help of SQL OUTER JOIN clause.
The second type of SQL JOIN is called SQL OUTER JOIN and it has 2 sub-types called
LEFT OUTER JOIN and RIGHT OUTER JOIN.
The LEFT OUTER JOIN or simply LEFT JOIN (you can omit the OUTER keyword in
most databases), selects all the rows from the first table listed after the FROM clause, no
matter if they have matches in the second table.
CustomerID   Date       SaleAmount
2            5/6/2004   $100.22
1            5/7/2004   $99.95
As you can see we have selected everything from the Customers (first table). For all rows
from Customers, which don’t have a match in the Sales (second table), the SalesPerCustomer
column has amount NULL (NULL means a column contains nothing).
The RIGHT OUTER JOIN or just RIGHT JOIN behaves exactly as SQL LEFT JOIN,
except that it returns all rows from the second table (the right table in our SQL JOIN
statement).
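The difference between INNER JOIN and LEFT JOIN can be sketched with Python's sqlite3 module; the customer names are illustrative, and Brown stands in for a customer with no sales:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Customers (CustomerID INTEGER, LastName TEXT)")
cur.execute("CREATE TABLE Sales (CustomerID INTEGER, SaleAmount REAL)")
cur.executemany("INSERT INTO Customers VALUES (?, ?)",
                [(1, 'Smith'), (2, 'Goldfish'), (3, 'Brown')])
cur.executemany("INSERT INTO Sales VALUES (?, ?)", [(1, 99.95), (2, 100.22)])

# INNER JOIN: only customers with a matching sale
inner = cur.execute("""SELECT c.LastName, s.SaleAmount
                       FROM Customers c JOIN Sales s
                         ON c.CustomerID = s.CustomerID""").fetchall()
# LEFT JOIN: every customer, with NULL (None) where no sale matches
left = cur.execute("""SELECT c.LastName, s.SaleAmount
                      FROM Customers c LEFT JOIN Sales s
                        ON c.CustomerID = s.CustomerID""").fetchall()
print(inner)  # Brown is absent: no matching sale
print(left)   # Brown appears with SaleAmount None
```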
Functional Dependency
For example, if A and B are attributes of a table, B is functionally dependent on A, if each
value of A is associated with exactly one value of B (so, you can say, 'A functionally
determines B').
The attribute or group of attributes on the left-hand side of the arrow of a functional
dependency is referred to as the 'determinant'.
A simple example: StaffID functionally determines Position in the above tables.
StaffID---->BranchID
BranchID----->Br_Address
So, StaffID attribute functionally determines Br_Address via BranchID attribute.
A functional dependency A → B is trivial if B is a subset of A. Non-trivial FDs are, of course, those which are not trivial by this definition.
1. We need to consider all functional dependencies that hold. Given a set F of functional
dependencies, we can prove that certain other ones also hold. We say these ones are
logically implied by F.
2. Suppose we are given a relation scheme R = (A,B,C,G,H,I), and the set of functional
dependencies:
A→B
A→C
CG → H
CG → I
B→H
Suppose two tuples t1 and t2 agree on A, that is, t1[A] = t2[A]. Since A → B, it
follows that t1[B] = t2[B]; and since B → H, it follows that t1[H] = t2[H].
Thus, whenever two tuples have the same value on A, they must also have the same
value on H, and we can say that A → H.
6. To compute F+, we can use some rules of inference called Armstrong's Axioms:
Reflexivity: if B is a subset of A, then A → B.
Augmentation: if A → B, then AC → BC.
Transitivity: if A → B and B → C, then A → C.
7. These rules are sound because they do not generate any incorrect functional
dependencies. They are also complete as they generate all of F+.
8. To make life easier we can use some additional rules, derivable from Armstrong's
Axioms:
Union: if A → B and A → C, then A → BC.
Decomposition: if A → BC, then A → B and A → C.
Pseudotransitivity: if A → B and BC → D, then AC → D.
9. Applying these rules to the scheme and the set F mentioned above, we can derive,
for example:
A → H (transitivity, from A → B and B → H)
CG → HI (union, from CG → H and CG → I)
AG → I (augment A → C with G to get AG → CG, then transitivity with CG → I)
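Testing whether a dependency such as A → H follows from F is usually done by computing an attribute closure. A short Python sketch of the standard closure algorithm (not from the notes) applied to the scheme above:

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a set of functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already in the closure, add the right side
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F for the scheme R = (A, B, C, G, H, I) given above
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(sorted(closure("A", F)))   # H is in the closure of A, so A -> H holds
print(sorted(closure("AG", F)))  # every attribute of R: AG is a superkey
```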
Note that no two buildings on any of the university campuses have the same name, thus
room/bldg → campus. As the determinant is not a candidate key, this table is NOT in
Boyce-Codd normal form.
R2(room/bldg, campus)
Normalization Drawbacks
Student(Id,Name)
Transcript(StudId,CrsCode,Sem,Grade)
Normalization Example
Grade_report(StudNo,StudName,Major,Adviser,CourseNo,Ctitle,InstrucName,InstructLocn,
Grade)
Functional dependencies
Unnormalised
Grade_report(StudNo,StudName,Major,Advisor,CourseNo,Ctitle,InstrucName,InstructLocn,
Grade)
BCNF