database class12

The document provides an overview of database concepts, defining data, information, and databases, along with their applications and processing cycles. It outlines the differences between manual and computerized data processing, the steps involved in the data processing cycle, and various database terms and features. Additionally, it discusses data independence, file organization methods, and different types of database architectures.


Chapter 13

DATABASE CONCEPTS
What is data?
Data is a collection of facts, figures, statistics, which can be processed to produce meaningful information.
(OR)

Data consists of raw, unorganized facts that need to be processed. Data can seem simple, random, and useless until it is organized.

What is information?

When data is processed, organized, structured or presented in a given context so as to make it useful, it is called
information.

(OR)

Information is data that has been processed in such a way as to be meaningful to the person who receives it.
(OR)

Information is the processed data on which decisions and actions are based.

Example:

1. The history of temperature readings all over the world for the past 100 years is data. If this data is
organized and analyzed to find that global temperature is rising, then that is information.
2. Each student's test score is one piece of data. The average score of a class or of the entire school is
information that can be derived from the given data.
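The second example can be worked out in a few lines of code; the scores below are hypothetical, chosen only for illustration.

```python
# Each score is one piece of data; the class average derived from
# them is information (hypothetical scores).
scores = [72, 85, 90, 65, 88]          # raw data
average = sum(scores) / len(scores)    # processing turns it into information
print(f"Class average: {average}")     # → Class average: 80.0
```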

What is a database?

A database is a collection of information that is organized so that it can easily be accessed, managed, and
updated.

Examples:

1. College database

2. Library database

3. Phone book database etc…

Applications of database:

Some of the applications of database are as follows

1. Banking

2. Water meter billing

3. Rail and airlines

4. Colleges

5. Credit card transactions



6. Telecommunications

7. Finance

8. Sales

9. Manufacturing

10. Military

11. Medical

Difference between manual data processing and computerized data processing

Manual data processing                             Computerized data processing

1. The volume of data that can be processed        1. The volume of data that can be processed
   in a reasonable time is limited.                   can be very large.
2. Large quantities of paper are needed.           2. Less paper is needed.
3. Speed and accuracy are lower.                   3. Speed and accuracy are higher.
4. The storage medium is paper.                    4. The storage medium is secondary storage.
5. Labour cost is high.                            5. Labour cost is economical.

Data processing cycle:


What is Data Processing?

Data processing is simply the conversion of raw data to meaningful information through a process. Data is
manipulated to produce results that lead to a resolution of a problem or improvement of an existing situation.

The data processing cycle consists of six specific steps:



1. Collection of data
2. Preparation of data
3. Input of data
4. Processing of data
5. Output
6. Storage
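The six steps above can be sketched in code. This is a minimal illustration, not a real pipeline; the survey values and helper names are made up.

```python
# 1. Collection: raw data arrives, possibly dirty.
raw = ["  23 ", "45", "abc", "31 "]

# 2. Preparation: screen the data for accuracy, dropping bad entries.
def prepare(items):
    return [s.strip() for s in items if s.strip().isdigit()]

cleaned = prepare(raw)                  # ['23', '45', '31']

# 3. Input: convert verified data into machine-usable form.
numbers = [int(s) for s in cleaned]

# 4. Processing: manipulate the data to produce a result.
total = sum(numbers)

# 5. Output: present the processed information to the user.
print(f"Total of {len(numbers)} readings: {total}")   # → Total of 3 readings: 99

# 6. Storage: hold the data and results for future use.
archive = [(numbers, total)]
```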

1) Collection is the first stage of the cycle, and is very crucial, since the quality of data collected will impact heavily on the output. The collection process needs to ensure that the data gathered are both defined and accurate, so that subsequent decisions based on the findings are valid. This stage provides both the baseline from which to measure, and a target on what to improve.

Some types of data collection include census (data collection about everything in a group or statistical population), sample survey (a collection method that includes only part of the total population), and administrative by-product (data collection as a byproduct of an organization's day-to-day operations).

2) Preparation is the manipulation of data into a form suitable for further analysis and processing. Raw data cannot be processed directly and must be checked for accuracy. Preparation is about constructing a dataset from one or more data sources to be used for further exploration and processing. Analyzing data that has not been carefully screened for problems can produce highly misleading results; the results are heavily dependent on the quality of the data prepared.

3) Input is the task where verified data is coded or converted into machine-readable form so that it can be processed by a computer. Data entry is done through the use of a keyboard, digitizer, scanner, or data entry from an existing source. This time-consuming process requires speed and accuracy. Most data need to follow a formal and strict syntax, since a great deal of processing power is required to break down the complex data at this stage. Due to the costs, many businesses resort to outsourcing this stage.

4) Processing is when the data is subjected to various means and methods of manipulation: the point where a computer program is executed, containing the program code and its current activity. The process may be made up of multiple threads of execution that simultaneously execute instructions, depending on the operating system. A process is the actual execution of those instructions.

5) Output is the stage where the processed information is transmitted to the user. Output is presented to users in various report formats, such as a printed report, audio, video, or on a monitor. Output needs to be interpreted so that it can provide meaningful information that will guide future decisions of the company.

6) Storage is the last stage in the data processing cycle, where data, instructions and information are held for future use. The importance of this stage is that it allows quick access and retrieval of the processed information, which can be passed on to the next stage directly when needed. Every computer uses storage to hold system and application software.

Database terms:

File:
A file is a large collection of related data.
Tables:
A table is a collection of data elements organized in terms of rows and columns
Records:
A single entry in a table is called a record.
Tuple:
A record is also called a tuple.
Fields/attribute:
Each column is identified by a distinct header called an attribute or field.
Domain:
The set of permitted values for an attribute in a column.
An Entity:
An entity can be any object, place, person or class.

                 Attributes/fields
                 -----------------------------------
                 Emp-ID    NAME      AGE     SALARY
                 -----------------------------------
                 1         PRAMOD    26      45000
Row/tuple/
record     →     2         NAVEEN    30      36000
                 3         RAJU      24      55000

Domain: the set of values in one column, e.g. AGE = {26, 30, 24}
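The terms above map naturally onto ordinary data structures. The sketch below uses the employee table from the example; it is illustrative only.

```python
# A table is data organized into rows and columns.
fields = ("Emp-ID", "NAME", "AGE", "SALARY")   # fields / attributes
table = [
    (1, "PRAMOD", 26, 45000),                  # each row is a record (tuple)
    (2, "NAVEEN", 30, 36000),
    (3, "RAJU",   24, 55000),
]
record = table[0]                              # a single entry in the table
age_domain = {row[2] for row in table}         # set of values of the AGE attribute
print(record, age_domain)
```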


DBMS- Data Base Management System:

Database Management System, or DBMS in short, refers to the technology of storing and retrieving users' data with utmost efficiency, along with safety and security features. A DBMS allows its users to create databases relevant to the nature of the work they want to do.

The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient.

Features of database system:

Some of the features of database system are

1. Minimized Redundancy
2. Enforcing data integrity
3. Data sharing
4. Ease of application development
5. Data security
6. Multiple user interfaces
7. Backup and recovery

Minimized Redundancy:

Data redundancy in database means that some data fields are repeated in the database.

This data repetition may occur either if a field is repeated in two or more tables or if the field is repeated within
the table.

Data can appear multiple times in a database for a variety of reasons. For example, a shop may have the same
customer’s name appearing several times if that customer has bought several different products at different dates.

Disadvantages of data redundancy


1. Increases the size of the database unnecessarily.
2. Causes data inconsistency.
3. Decreases efficiency of database.
4. May cause data corruption.

Such data redundancy in DBMS can be prevented by database normalization.

Enforcing data integrity:

Data integrity refers to the overall completeness, accuracy and consistency of data. The integrity of the stored
data can be lost in different ways

 Human errors when data is entered


 Errors that occur when data is transmitted from one computer to another
 Software bugs or viruses
 Hardware malfunctions, such as disk crashes

Data sharing:

Data sharing is a primary feature of a database management system (DBMS).The DBMS helps create an
environment in which end users have better access to more and better-managed data.

Ease of application development:

It helps to develop the application programs according to the user’s needs.

Data security:

Data security is the protection of the database from unauthorized users. As the number of users accessing the data increases, the risk to data security increases, but the DBMS provides a framework for better enforcement of data privacy and security policies. Only authorized persons are allowed to access the database.

Multiple user interfaces:

In order to meet the needs of various users having different levels of technical knowledge, a DBMS provides different types of interfaces, such as a query language, an application program interface and a graphical user interface.

Backup and recovery:

Most of the DBMSs provide the 'backup and recovery' sub-systems that automatically create the backup of data
and restore data if required.

Data abstraction:

Data abstraction is the process of representing essential features without including implementation details. Since many database-system users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users' interactions with the system.

There are 3 levels of abstraction

1. Internal level
2. Conceptual level
3. External level

Internal level/ physical level:

The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail.

At this level various aspects are considered to achieve optimal runtime performance and storage space utilization. These aspects include storage space allocation techniques for data and indexes, access paths, data compression and encryption techniques, and record placement.

Conceptual level/Logical level:

The next-higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes the entire database in terms of a small number of relatively simple structures.

External level/View level:

The highest level of abstraction describes only part of the entire database. A large database stores a wide variety of information, and many users of the database system do not need all of it; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system.

(Figure: the different levels of database abstraction)

DBMS users:

Application programmers and system analysts:

Application programmers are responsible for writing application programs that use the database. These programs could be written in general-purpose programming languages such as Visual Basic, Developer, C, FORTRAN, COBOL etc. to manipulate the database. These application programs operate on the data to perform various operations such as retrieving information, creating new information, and deleting or changing existing information.

Database administrator:

DBA is responsible for authorizing access to the database, for coordinating and monitoring its use, and acquiring
software and hardware resources as needed.

Database designers:

Database designers identify the data to be stored in the database and choose appropriate structures to represent and store the data. Most of these functions are done before the database is implemented and populated with data. It is the responsibility of the database designers to communicate with all prospective users to understand their requirements and come up with a design that meets them. Database designers interact with all potential users and develop views of the database that meet the data and processing requirements of these groups. The final database must support the requirements of all user groups.

End users:

End Users are the people who interact with the database through applications or utilities. The various categories
of end users are:

Data independence:

The ability to modify a schema definition at one level without affecting the schema definition at the next higher level is called data independence.

Two types of data independence are

1. Logical data independence


2. Physical data independence

Logical data independence:

Logical data independence means the ability to change the conceptual schema without having to change the external schemas or application programs.

Physical data independence:

Physical data independence means the ability to change the internal (physical) schema without having to change the conceptual schema.

Logical data independence is more difficult to achieve than physical data independence.

Files and file organization:


The database manages files, and they are accessed using different methods. File organization is concerned with the way data must be organized in a logical manner, and with the relationships between the values of key fields and records.

Based on this, file organization may be of four types:

1. Serial organization
2. Sequential organization
3. Direct access or random organization
4. Indexed sequential access method (ISAM)
Serial file organization:

Serial file organization is the simplest file organization method. The data are collected in the file in the order in which they arrive; that is, the file is unordered. Serial files are primarily used as transaction files, in which the transactions are recorded in the order that they occur.
Sequential file organization:
In a sequential file organization, the records are arranged in a particular order, either ascending or descending, and accessed in the predetermined order of their keys.
The records are stored on media such as magnetic tape, punched cards or magnetic discs. To access records, the computer must read the file in sequence from the beginning: the first record is read and processed first, then the second record in the sequence, and so on.
Records in a sequential file can be stored in two ways.
Unsorted file: Records are placed one after another as they arrive (no sorting of any kind).
Sorted file: Records are placed in ascending or descending order of the values of the primary key.
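Sequential access can be sketched as follows; the records and keys are hypothetical. The point is that every lookup starts at the first record and reads forward.

```python
# A sorted sequential file: records arranged in ascending order of key.
records = [(101, "A"), (205, "B"), (307, "C")]

def sequential_search(records, key):
    reads = 0
    for k, payload in records:     # always start from the beginning
        reads += 1
        if k == key:
            return payload, reads  # found after `reads` record reads
    return None, reads             # key not present: whole file was read

print(sequential_search(records, 307))   # → ('C', 3): all 3 records read
```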

Direct/Random Access files organization:


Direct Access:
 Direct access organization allows immediate access to the individual records on the file.
 The records are stored and retrieved using a relative record number which gives the position of the record
in the file.
 In direct access, a record key is used as a relative address, so a record's address can be computed from the record key and the physical address of the first record.
Random Access:
 Records are stored on the disk using a hashing algorithm.
 The key field is fed through hashing algorithm and a relative address is created. This address gives the
position on the disk where the records are to be stored.
 The desired records can be accessed directly using a randomizing procedure, in which the records are stored in such a way that there is no relationship between the keys of adjacent records.
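A minimal sketch of the hashing idea: the key field is fed through a hashing algorithm to produce a relative address. The bucket count and keys here are assumptions for illustration; real systems also need collision handling, which is omitted.

```python
BUCKETS = 7                        # assumed size of the file, for illustration

def relative_address(key):
    return key % BUCKETS           # division-remainder hashing

disk = {}                          # stands in for positions on the disk
for key in (101, 205, 307):
    disk[relative_address(key)] = key

# Any record can be fetched directly from its computed position,
# without scanning the file:
print(disk[relative_address(205)])     # → 205
```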

Indexed Sequential Access Method (ISAM):


An indexed file contains records ordered by a record key. Each record contains a field that holds the record key. The record key uniquely identifies the record and determines the sequence in which it is accessed with respect to other records. Indexing permits access to selected records without searching the entire file.
The record transmission (access) modes allowed for indexed files are sequential, random, or dynamic. When indexed files are read or written sequentially, the sequence is that of the key values.

Different Types of Architecture:


The design of a database management system depends highly on its architecture, which can be centralized, decentralized or hierarchical.
There are 3 types of architectures
1. One tier architecture
2. Two tier architecture
3. Three tier architecture
One tier architecture:
One-tier architecture keeps all of the elements of an application, including the interface, middleware and
back-end data, in one place.
This is the simplest and cheapest of all the architectures, but also the least secure. Since users have direct access to the files, they could move, modify, or, even worse, delete a file by accident or on purpose. There is also usually an issue when multiple users access the same file at the same time: in many cases only one can edit the file while the others have read-only access.
Another issue is that 1-tier software packages are not very scalable, and if the amount of data gets too big, the software may become very slow or stop working.
So 1-tier architecture is simple and cheap, but usually insecure, and data can easily be lost if you are not careful.

Two tier architecture:


This architecture is also called Client-Server architecture because of the two components: The client that
runs the application and the server that handles the database back-end. The client handles the UI and the server
handles the DB. When the client starts, it establishes a connection to the server and communicates as needed with
the server while running the client. The client computer usually can’t see the database directly and can only access
the data by starting the client. This means that the data on the server is much more secure. Now users are unable
to change or delete data unless they have specific user rights to do so.
The client-server solution also allows multiple users to access the database at the same time. An interface called ODBC (Open Database Connectivity) provides an API that allows client-side programs to call the DBMS.
Three tier architecture:
In this architecture all three tiers are separated onto different computers. The UI runs on the client (what
the user is working with). The application layer is running on a separate server, called the business logic tier or
middle tier, or service tier. Finally the database is running on its own database server.
The three tiers in three-tier architecture are:
Presentation Tier: Occupies the top level and displays information related to services available on a website. This
tier communicates with other tiers by sending results to the browser and other tiers in the network.
Application Tier: Also called the middle tier, logic tier or business logic tier, this tier is decoupled from the presentation tier. It controls application functionality by performing detailed processing.
Data Tier: Houses database servers where information is stored and retrieved. Data in this tier is kept independent
of application servers or business logic.
Database model:
A database model is a type of data model that determines the logical structure of a database and fundamentally
determines in which manner data can be stored, organized, and manipulated.
The data models are classified as:
1. Hierarchical Model
2. Network Model
3. Relational Model
Hierarchical Model:
A hierarchical database model is a data model in which the data is organized into a tree-like structure. The
data is stored as records which are connected to one another through links. There is a hierarchy of parent and
child data segments.
The hierarchical database model specifies that each child record has only one parent, whereas each parent record can have one or more child records. The top of the tree structure consists of a single node that does not have any parent, called the root node.
In order to retrieve data from a hierarchical database, the whole tree needs to be traversed starting from the root node. This model is recognized as the first database model, created by IBM in the 1960s.

Advantages of Hierarchical model


1. Simplicity:
Since the database is based on the hierarchical structure, the relationship between the various layers is
logically simple.
2.Data Security:
Hierarchical model was the first database model that offered the data security that is provided by the
DBMS.
3. Data Integrity:
Since it is based on the parent child relationship, there is always a link between the parent segment and
the child segment under it.
4. Efficiency:
It is very efficient when the database contains a large number of 1:N relationships and when the users require a large number of transactions.
Disadvantages of Hierarchical model:
1. Implementation complexity:
Although it is simple and easy to design, it is quite complex to implement.
2. Database Management Problem:
If you make any changes to the database structure, then you need to make changes in all the application programs that access the database.
3. Lack of Structural Independence:
There is a lack of structural independence, because when we change the structure it becomes compulsory to change the application too.
4. Operational Anomalies:
Hierarchical model suffers from the insert, delete and update anomalies, also retrieval operation is difficult.
Network Model:
The network model is used to represent complex data relationships more effectively and improve database
performance and impose a database standard.

The first specification of network data model was represented by Conference on Data Systems Languages
(CODASYL) in 1969.
The network model is very similar to the hierarchical model. In fact, the hierarchical model is a subset of the
network model. However, instead of using a single-parent tree hierarchy, the network model uses set theory to
provide a tree-like hierarchy with the exception that child tables were allowed to have more than one parent. This
allowed the network model to support many-to-many relationships

Advantages of Network model

1. Conceptual simplicity

2. Handles more relationship types

3. Data access flexibility

4. Promotes database integrity

5. Data independence

Disadvantages of Network model:

1. System complexity

2. Lack of structural independence

Relational model:

The relational model was developed by E.F. Codd in 1970. He is also called the father of the RDBMS. In the relational model, unlike the hierarchical and network models, there are no physical links. All data is maintained in the form of tables consisting of rows and columns. Here each row represents an entity and each column represents an attribute. The relationship between two tables is implemented through a common attribute in the tables, and not by a physical link.
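The relational idea — two tables related through a common attribute rather than a physical link — can be demonstrated with Python's built-in sqlite3 module. The table and column names below are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")   # throwaway in-memory database
con.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dname TEXT)")
con.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, ename TEXT, dept_id INTEGER)")
con.execute("INSERT INTO dept VALUES (1, 'Sales')")
con.execute("INSERT INTO emp VALUES (10, 'PRAMOD', 1)")

# The relationship is expressed through the common attribute dept_id:
row = con.execute(
    "SELECT emp.ename, dept.dname FROM emp JOIN dept ON emp.dept_id = dept.dept_id"
).fetchone()
print(row)   # → ('PRAMOD', 'Sales')
```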

Advantages

1. The main advantage of this model is its ability to represent data in a simplified format.

2. The process of manipulating records is simplified with the use of certain key attributes used to retrieve data.

3. Representation of different types of relationships is possible with this model.

Codd’s Rule:

The relational model was developed by E.F. Codd in 1970. Based on the relational model, the relational database was created. Codd proposed 13 rules, popularly known as Codd's 12 rules (they are numbered 0 to 12), to test DBMS concepts. Codd's rules define what qualities a DBMS requires in order to become a relational database management system (RDBMS).

Rule zero:

This rule states that for a system to qualify as an RDBMS, it must be able to manage the database entirely through its relational capabilities.

Rule 1: Information rule:

All information (including metadata) is to be represented as stored data in cells of tables. The rows and columns
have to be strictly unordered.

Rule 2: Guaranteed Access:

Each unique piece of data (atomic value) should be accessible by: Table Name + Primary Key (row) + Attribute (column).

NOTE: The ability to access data directly via a POINTER is a violation of this rule.

Rule 3: Systematic treatment of NULL:

Null has several meanings; it can mean missing data, not applicable or no value. It should be handled consistently.
Primary key must not be null. Expression on NULL must give null.

Rule 4: Active Online Catalog:

Database dictionary (catalog) must have description of Database. Catalog to be governed by same rule as rest of
the database. The same query language to be used on catalog as on application database.

Rule 5: Powerful language:



One well-defined language must be there to provide all manners of access to data. Example: SQL. If a file supporting a table can be accessed by any manner except the SQL interface, then it is a violation of this rule.

Rule 6: View Updation rule:

All views that are theoretically updatable should be updatable by the system.

Rule 7: Relational Level Operation:

There must be Insert, Delete, and Update operations at each level of relations. Set operation like Union,
Intersection and minus should also be supported.

Rule 8: Physical Data Independence:

The physical storage of data should not matter to the system. If say, some file supporting table were renamed or
moved from one disk to another, it should not affect the application.

Rule 9: Logical Data Independence

If there is change in the logical structure (table structures) of the database the user view of data should not
change. Say, if a table is split into two tables, a new view should give result as the join of the two tables. This rule
is most difficult to satisfy.

Rule 10: Integrity Independence

The database should be able to enforce its own integrity rather than relying on other programs. Key and check constraints, triggers, etc. should be stored in the data dictionary. This also makes the RDBMS independent of the front end.

Rule 11: Distribution Independence

A database should work properly regardless of its distribution across a network. This lays foundation of distributed
database.

Rule 12: Non-subversion rule

If low-level access is allowed to a system, it should not be able to subvert or bypass integrity rules to change data. This can be achieved by some sort of locking or encryption.

Normalization of Database:

Database Normalization is a technique of organizing the data in the database. Normalization is a systematic
approach of decomposing tables to eliminate data redundancy and undesirable characteristics like Insertion,
Update and Deletion Anomalies.

Normalization rules are divided into the following normal forms.

1. First Normal Form

2. Second Normal Form

3. Third Normal Form

4. BCNF

First Normal Form (1NF):



As per First Normal Form, no row of data may contain a repeating group of information; each column must hold a single atomic value, so that multiple columns cannot be used to fetch the same row. Each table should be organized into rows, and each row should have a primary key that distinguishes it as unique.

Std_id    Std_name    Std_address    Subject_opted
401       Ramesh      Karnataka      Bio
402       Prajwal     Mumbai         Maths
403       Dinesh      Pune           Comp.sci

Second Normal Form (2NF):

Before 2NF, a table must meet all the requirements of 1NF. As per the Second Normal Form, there must not be any partial dependency of any column on the primary key. This means that, for a table that has a concatenated (composite) primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column depends only on one part of the concatenated key, then the table fails Second Normal Form.

Consider the following example:

This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is [Purchase Location]. In
this case, [Purchase Location] only depends on [Store ID], which is only part of the primary key. Therefore, this
table does not satisfy second normal form.

To bring this table to second normal form, we break the table into two tables, and now we have the following:
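The decomposition just described can be sketched with plain data structures; the customer and store values below are hypothetical.

```python
# Before 2NF: composite key (customer_id, store_id); purchase_location
# depends only on store_id — a partial dependency, so the location is
# repeated for every customer of the same store.
before = {
    (1, "S1"): "Bangalore",
    (2, "S1"): "Bangalore",    # redundant copy
    (3, "S2"): "Mumbai",
}

# After 2NF: split into a purchase table and a store table, so the
# location is stored exactly once per store.
purchases = [(1, "S1"), (2, "S1"), (3, "S2")]
stores = {"S1": "Bangalore", "S2": "Mumbai"}

# The location is now found through the store, not duplicated:
print(stores["S1"])   # → Bangalore
```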

Third Normal Form (3NF):

Third Normal Form requires that every non-prime attribute of a table be directly dependent on the primary key; transitive functional dependencies must be removed from the table. The table must also be in Second Normal Form.

Student_id Student_name DOB Street City State Zip

In this table Student_id is the primary key, but Street, City and State depend upon Zip. The dependency between Zip and the other fields is called a transitive dependency. If there are transitive dependencies, the fields involved are removed and stored in a separate table. Hence, to apply 3NF, we move Street, City and State to a new table, with Zip as the primary key.

Student table

Student_id Student_name DOB Zip

Address table

Zip Street City State

Boyce and Codd Normal Form (BCNF):

A table is in Boyce-Codd normal form (BCNF) if and only if it is in 3NF and every determinant is a candidate key.

The process of converting the table into BCNF is as follows:

1. Remove the non-trivial functional dependencies.


2. Make separate table for the determinants.

BCNF of below table is as follows:



Entity Relationship Model (ER model):

An ER diagram is a visual representation of data that describes how data are related to each other. The entity-relationship data model is based on a perception of the real world consisting of basic objects, called entities, and relationships among these objects.

The E-R diagram has three main components.

1. Entity:

An entity is an object in the real world that is distinguishable from other objects. An entity can be any object, place, person or class. In an E-R diagram, an entity is represented using a rectangle.

Weak Entity:

A weak entity is an entity that depends on another entity. A weak entity does not have a key attribute of its own. A double rectangle represents a weak entity.

2. Attribute:

An attribute describes a property or characteristic of an entity. The attributes are useful in describing the properties of each entity in the entity set. An attribute is represented using an ellipse.

Key Attribute:

A key attribute represents the main characteristic of an entity. It is used to represent the primary key. An ellipse with an underlined label represents a key attribute.

Composite Attribute:

An attribute can also have attributes of its own. These attributes are known as composite attributes.

3) Relationship:

A relationship describes relations between entities. A relationship is represented using a diamond.

There are three types of relationship that exist between Entities.



1. Binary Relationship

2. Recursive Relationship

3. Ternary Relationship

Binary Relationship:

Binary Relationship means relation between two Entities.

Recursive Relationship:

When an Entity is related with itself it is known as Recursive Relationship.

Ternary Relationship:

Relationship of degree three is called Ternary relationship.

SYMBOL                            MEANING

Rectangle                         Entity
Double rectangle                  Weak entity
Diamond                           Relation
Ellipse                           Attribute
Ellipse with underlined label     Key attribute
Ellipse connected to ellipses     Composite attribute
Line                              Links

Cardinality:

Cardinality is a very important concept in database design. Cardinalities are used when you are creating an E/R
diagram, and show the relationships between entities/ tables. Cardinality specifies how many instances of one
entity relate to one instance of another entity.

Cardinality should be one of the following

1. One to one
2. One to many
3. Many to one
4. Many to many

One to one:

An entity from one entity set is associated with at most one entity in another entity set and vice versa.

Example: each CUSTOMER is linked to at most one ACCOUNT through the Cust a/c relationship, and vice versa.

One to many:

An entity from one entity set is associated with one or more instances of another entity.

Example: one TEACHER is associated with many STUDENTs through the Class relationship.

Many to one:

Many instances of an entity from one entity set can be associated with a single entity from another entity set.

Example: many STUDENTs are associated with one TEACHER through the Class relationship.

Many to many:

Many instances of an entity from one entity set are associated with many instances from another entity set.

Example: many CUSTOMERs are associated with many ACCOUNTs through the Cust a/c relationship (a customer can hold several accounts, and an account can be held jointly by several customers).
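The one-to-many case can be checked mechanically on sample data. The sketch below (the names and data are illustrative assumptions, not from the text) models the Teacher–Student example as a mapping and verifies the cardinality:

```python
# Illustrative sketch: a one-to-many relationship between TEACHER and
# STUDENT modelled as a dictionary. Each student is linked to exactly
# one teacher (guaranteed by the dict), while a teacher may have many
# students.
student_teacher = {
    "Ravi": "Mr. Rao",
    "Sita": "Mr. Rao",
    "Kiran": "Ms. Devi",
}

# Invert the mapping: teacher -> list of students.
teacher_students = {}
for student, teacher in student_teacher.items():
    teacher_students.setdefault(teacher, []).append(student)

# One-to-many holds: at least one teacher has more than one student.
assert any(len(s) > 1 for s in teacher_students.values())
print(teacher_students)
```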

Keys:

A key is a column or attribute of a database table. Keys ensure that each record within a table can be
uniquely identified by one field or a combination of fields within the table.

There are various types of keys



1. Super key

2. Candidate key

3. Primary key

4. Foreign key

5. Composite key

6. Alternate key

Super key:

A Super key is any combination of fields within a table that uniquely identifies each record within that table.

Candidate key:

A candidate key is a subset of a super key: a single field or the least combination of fields that
uniquely identifies each record in the table. This minimality is what distinguishes a candidate key from a
super key. Every table must have at least one candidate key, but it may have several.

To be eligible as a candidate key, a field (or combination of fields) must pass certain criteria.

 It must contain unique values
 It must not contain null values
 It must contain the minimum number of fields needed to ensure uniqueness
 It must uniquely identify each record in the table

Once the candidate keys have been identified, one of them can be selected as the primary key.
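These criteria can be expressed as a small check. The sketch below (sample table and field names are assumptions for illustration) tests whether a single field qualifies as a candidate key:

```python
# A field qualifies as a candidate key only if its values are unique
# and contain no nulls (None), so each value identifies one record.
STUDENT = [
    {"roll_no": 1, "name": "Ravi"},
    {"roll_no": 2, "name": "Sita"},
    {"roll_no": 3, "name": "Ravi"},   # duplicate name
]

def is_candidate_key(relation, field):
    values = [row[field] for row in relation]
    return None not in values and len(values) == len(set(values))

print(is_candidate_key(STUDENT, "roll_no"))  # unique values -> True
print(is_candidate_key(STUDENT, "name"))     # duplicate values -> False
```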

Primary key:

A primary key is the candidate key which is selected as the principal unique identifier. Every relation must contain
a primary key. The primary key is usually the key selected to identify a row when the database is implemented.

As with any candidate key the primary key must contain unique values, must never be null and uniquely identify
each record in the table.

Foreign key:

A foreign key is generally a primary key from one table that appears as a field in another where the first table has a
relationship to the second.

Composite key:

A composite key consists of more than one field to uniquely identify a record.

Alternate key:

A table may have more than one choice for its primary key; collectively these are known as candidate keys. One is
selected as the primary key, and those not selected are known as alternate keys (or secondary keys).
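The key types above can be illustrated with a small SQLite schema in Python. This is a hypothetical sketch (the table and column names are assumptions, not from the text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# dept_id is the primary key; dept_name is also unique, so it is a
# candidate key that was not chosen as primary -- an alternate key.
conn.execute("""CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL UNIQUE)""")

# (student_id, course_id) together form a composite primary key;
# dept_id is a foreign key referencing department.
conn.execute("""CREATE TABLE enrolment (
    student_id INTEGER,
    course_id  INTEGER,
    dept_id    INTEGER REFERENCES department(dept_id),
    PRIMARY KEY (student_id, course_id))""")

conn.execute("INSERT INTO department VALUES (1, 'Science')")
conn.execute("INSERT INTO enrolment VALUES (101, 5, 1)")
try:
    # Duplicate (student_id, course_id) violates the composite key.
    conn.execute("INSERT INTO enrolment VALUES (101, 5, 1)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```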

Relational algebra:

Relational algebra is a procedural query language. It consists of a set of operations that are used to
manipulate relations in the relational model. These operations take one or more relations as input and produce a new
relation as their result.

A sequence of relational algebra operations forms a relational algebra expression.

The relational algebra is very important for several reasons

 It provides foundations for relational model operations.


 It is used for implementing and optimizing queries in an RDBMS.
 The core operations and functions of any relational system are based on relational algebra operations.

Relational algebra operations are

1. Select
2. Project
3. Cartesian product
4. Rename
5. Union
6. Set difference
7. Set intersection
8. Join
9. Division
10. Assignment

Select operation: (σ)

The select operation enables the user to specify a basic retrieval request; the result of the retrieval is a new
relation, which may have been formed from one or more relations.

The select operation is used to select a subset of the tuples from a relation that satisfy a selection condition.

The selection operation is denoted by

σ<selection condition>(R)

where σ denotes the select operation and R is a relation (table)

Example:

Select the subset of employees from EMPLOYEE whose salary is greater than 2500:



σ salary>2500 (EMPLOYEE)
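The select operation can be sketched in Python on a relation modelled as a list of dictionaries (the sample data below is illustrative, not from the text):

```python
# sigma_<condition>(R): keep only the tuples that satisfy the condition.
EMPLOYEE = [
    {"Empname": "Ravi",  "salary": 3000},
    {"Empname": "Sita",  "salary": 2000},
    {"Empname": "Kiran", "salary": 4000},
]

def select(relation, condition):
    return [row for row in relation if condition(row)]

# sigma salary>2500 (EMPLOYEE)
result = select(EMPLOYEE, lambda row: row["salary"] > 2500)
print([row["Empname"] for row in result])  # -> ['Ravi', 'Kiran']
```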
Project operation:(π)

The project operation selects certain attributes (columns) from a relation (table) and discards the other columns;
duplicate rows are eliminated from the result.

The general form of project operation is

π<attribute list>(R)

where π denotes the project operation, attribute list is a list of attributes from the relation R, and R is a relation (table)

Example:

List the name and salary of all employees:

πEmpname, salary(EMPLOYEE)
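A sketch of the project operation on the same list-of-dictionaries representation; note the duplicate elimination, which distinguishes π from simply dropping columns (sample data is illustrative):

```python
EMPLOYEE = [
    {"Empname": "Ravi", "dept": "Sales", "salary": 3000},
    {"Empname": "Sita", "dept": "Sales", "salary": 2000},
]

def project(relation, attributes):
    # pi_<attributes>(R): keep the listed columns, eliminate duplicates.
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

# Both employees are in Sales, so projecting on dept yields one row.
print(project(EMPLOYEE, ["dept"]))  # -> [{'dept': 'Sales'}]
print(project(EMPLOYEE, ["Empname", "salary"]))
```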

Cartesian product: (×)

This is a binary operation, also known as cross product or cross join. It is used to combine tuples from two
relations in a combinatorial fashion. In general, the result of R(A1, A2, …, An) × S(B1, B2, …, Bm) is a
relation Q with n + m attributes.

The Cartesian product operation is denoted by “X”
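A minimal sketch of the Cartesian product using `itertools.product`; with 2 tuples in R and 2 in S, the result has 2 × 2 = 4 rows, each with n + m attributes:

```python
from itertools import product

R = [("a1", "a2"), ("b1", "b2")]   # relation with attributes A1, A2
S = [("x1",), ("y1",)]             # relation with attribute B1

# R x S: concatenate every tuple of R with every tuple of S.
cross = [r + s for r, s in product(R, S)]
print(len(cross))   # -> 4
print(cross[0])     # -> ('a1', 'a2', 'x1')
```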

Rename operation: ( ρ)

The rename operation can rename either the relation name or the attribute names, or both as a unary operator.

The rename operation is denoted by “ρ”

Union operation:(U)

The result of this operation, denoted by R ∪ S, is the relation that includes all tuples that are in R, in S, or in
both R and S. Duplicate tuples are eliminated.

The union operation is denoted by “U”

Set difference:()

The result of this operation is denoted by R-S, is a relation that includes all tuples that are in R but not in S.

The set difference operation is denoted by “−”

Set intersection:(∩)

The result of this operation is denoted by R∩S, is a relation that includes all tuples that are in both R and S.

The set intersection operation is denoted by “∩”
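The three set operations can be sketched directly on relations modelled as lists of tuples (both relations must be union-compatible, i.e. have the same attributes; the data is illustrative):

```python
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "b"), (4, "d")]

# dict.fromkeys preserves order while removing duplicate tuples.
union        = list(dict.fromkeys(R + S))      # R U S
difference   = [t for t in R if t not in S]    # R - S
intersection = [t for t in R if t in S]        # R intersect S

print(union)         # -> [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
print(difference)    # -> [(1, 'a'), (3, 'c')]
print(intersection)  # -> [(2, 'b')]
```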

Join:(⋈)

Join is used to combine related tuples from two relations.

Natural join:

The natural join is a binary operation that allows us to combine certain selections and a Cartesian product into one
operation.

The natural join operation forms the Cartesian product of its two arguments, performs a selection forcing equality
on those attributes that appear in both relation schemes, and finally removes duplicate columns.

The join operation is denoted by “⋈”
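A sketch of the natural join on list-of-dictionaries relations: form the Cartesian product, keep the pairs that agree on the shared attribute names, and emit each shared column only once (names and data are illustrative assumptions):

```python
EMPLOYEE = [
    {"Empname": "Ravi", "dept_id": 1},
    {"Empname": "Sita", "dept_id": 2},
]
DEPARTMENT = [
    {"dept_id": 1, "dept_name": "Sales"},
    {"dept_id": 3, "dept_name": "HR"},
]

def natural_join(R, S):
    common = [a for a in R[0] if a in S[0]]   # shared attribute names
    result = []
    for r in R:
        for s in S:
            if all(r[a] == s[a] for a in common):
                merged = dict(r)
                merged.update(s)              # duplicate columns collapse
                result.append(merged)
    return result

# Only Ravi's dept_id matches a department row.
print(natural_join(EMPLOYEE, DEPARTMENT))
# -> [{'Empname': 'Ravi', 'dept_id': 1, 'dept_name': 'Sales'}]
```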

Outer joins:
An outer join does not require each record in the two joined tables to have a matching record. The joined
table retains each record—even if no other matching record exists.

Outer joins subdivide further into

1. Left outer joins


2. Right outer joins
3. Full outer joins

Left outer join:

The result contains all the rows from the first source and the corresponding values from the second source (or empty
values for non-matching keys)

Right outer join:

The result contains all the rows from the second source and the corresponding values from the first source (or
empty values for non-matching keys)

Full outer join:

The result contains all the rows from both sources (with empty values for non-matching keys)
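A sketch of the left outer join under the same representation (swapping the arguments gives a right outer join); rows from the first relation with no match are retained and padded with empty (None) values. Names and data are illustrative:

```python
EMPLOYEE = [
    {"Empname": "Ravi", "dept_id": 1},
    {"Empname": "Sita", "dept_id": 2},
]
DEPARTMENT = [
    {"dept_id": 1, "dept_name": "Sales"},
]

def left_outer_join(R, S, key):
    s_cols = [c for c in S[0] if c != key]
    result = []
    for r in R:
        matches = [s for s in S if s[key] == r[key]]
        if matches:
            for s in matches:
                result.append({**r, **s})
        else:
            # No match: keep the row, pad the other side with None.
            result.append({**r, **{c: None for c in s_cols}})
    return result

# Sita's dept_id has no matching department, so dept_name is None.
print(left_outer_join(EMPLOYEE, DEPARTMENT, "dept_id"))
```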


Data warehouse:

A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data that supports
management's decision-making process.

The key features of a data warehouse (subject-oriented, integrated, nonvolatile and time-variant) are
discussed below:

Subject Oriented - A data warehouse is subject oriented because it provides information organized around a
subject rather than around the organization's ongoing operations. These subjects can be products, customers, suppliers,
sales, revenue etc. The data warehouse does not focus on ongoing operations; rather, it focuses on modelling and
analysis of data for decision making.

Integrated - A data warehouse is constructed by integrating data from heterogeneous sources such as relational
databases, flat files etc. This integration enhances the effective analysis of data.

Time-Variant - The data in a data warehouse is identified with a particular time period, and it provides
information from a historical point of view.

Non Volatile - Nonvolatile means that previous data is not removed when new data is added. The data
warehouse is kept separate from the operational database, so frequent changes in the operational database
are not reflected in the data warehouse.

Advantages

A data warehouse provides many advantages to end users:

1. Improved end-user access to a wide variety of data
2. Increased data consistency
3. Additional documentation of the data
4. Potentially lower computing costs and increased productivity
5. A place to combine related data from separate sources
6. Creation of a computing infrastructure that can support changes in computer systems and business
structures

Data mining:

Data mining is defined as extracting information from a huge set of data; in other words, data mining is
mining knowledge from data.

Data mining is a logical process of searching a large amount of data to find important information, and it
proceeds in three stages.
Stage 1: Exploration:
In this stage the data is explored and prepared. The goal of the exploration stage is to find the important
variables and determine their nature.
Stage 2: Pattern identification:
The primary action in this stage is searching for patterns and choosing the one that allows the best
prediction.
Stage 3: Deployment:
This stage can be reached only once a consistent, highly predictive pattern has been found in Stage 2. That
pattern is then applied to new data to see whether the desired outcome is achieved.
