ET Unit 01

The document discusses the fundamentals of database systems, comparing file systems and Database Management Systems (DBMS), highlighting the advantages of DBMS such as reduced data redundancy, better security, and efficient data management. It covers key concepts including data models, transaction management, and the structure of a DBMS, along with primary terminologies used in database design. Additionally, it explains the Entity-Relationship (ER) model and its components for visualizing database structures and relationships.

EMERGING TECHNOLOGIES IN DATA PROCESSING

UNIT-1
DATABASE SYSTEM FUNDAMENTALS
File system versus DBMS
The file system is basically a way of arranging the files in a storage
medium like a hard disk. The file system organizes the files and helps in the
retrieval of files when they are required. File systems consist of different files which
are grouped into directories. The directories further contain other folders and files.
The file system performs basic operations like file management, file naming, and
setting access rules.

DBMS (Database Management System)


A Database Management System is software that manages a collection of related
data. It is used for storing data and retrieving it effectively when needed. It
also provides security measures to protect the data from unauthorized access.
In a DBMS, data can be fetched through SQL queries and relational algebra. It
also provides mechanisms for data recovery and data backup. Examples: Oracle,
MySQL, MS SQL Server.

Difference between File System and DBMS:


Structure
  File System: A way of arranging files in a storage medium within a computer.
  DBMS: Software for managing the database.

Data Redundancy
  File System: Redundant data can be present.
  DBMS: There is no redundant data.

Backup and Recovery
  File System: Does not provide an inbuilt mechanism for backup and recovery of data if it is lost.
  DBMS: Provides in-house tools for backup and recovery of data, even if it is lost.

Query Processing
  File System: There is no efficient query processing.
  DBMS: Efficient query processing is available.

Consistency
  File System: There is less data consistency.
  DBMS: There is more data consistency because of the process of normalization.

Complexity
  File System: Less complex compared to DBMS.
  DBMS: More complex to handle compared to the file system.

Security Constraints
  File System: Provides less security in comparison to DBMS.
  DBMS: Has more security mechanisms compared to file systems.

Advantages of Database Management System:


A Database Management System (DBMS) is a collection of interrelated data and a
set of software tools/programs that access, process, and manipulate that data.
It allows access, retrieval, and use of the data while applying appropriate
security measures. A DBMS is really useful for better data integration and
security.
Better Data Transferring: Database management creates a place where users
benefit from more and better-managed data, making it possible for end-users to
get a quick view of the data and respond quickly to any changes in their
environment.
Better Data Security: The more accessible and usable the database, the more
prone it is to security issues. As the number of users increases, the rate of
data transfer and data sharing also increases, raising the risk to data
security. A DBMS addresses this by providing a platform for enforcing data
privacy and security policies.
Data abstraction: The major purpose of a database system is to provide users
with an abstract view of the data. The complex algorithms that developers use
to increase the efficiency of databases are hidden from users through various
levels of data abstraction, allowing users to interact with the system easily.
Reduction in data Redundancy: When working with a structured database,
DBMS provides the feature to prevent the input of duplicate items in the database.
For example, if the same student is entered in two rows, one of the duplicate
records will be removed.
Application development: A DBMS provides a foundation for developing
applications that require access to large amounts of data, reducing development
time and costs.
Data sharing: A DBMS provides a platform for sharing data across multiple
applications and users, which can increase productivity and collaboration.
Data organization: A DBMS provides a systematic approach to organizing data in
a structured way, which makes it easier to retrieve and manage data efficiently.
Increased end-user productivity: The data which is available with the help of a
combination of tools that transform data into useful information, helps end-users to
make quick, informative, and better decisions that can make a difference between
success and failure in the global economy.
Simple: Database management system (DBMS) gives a simple and clear logical
view of data. Many operations like insertion, deletion, or creation of files or data
are easy to implement.
Describing and Storing Data in a DBMS
 A data model is a collection of high-level data description constructs that hide
many low-level storage details
 A semantic data model is a more abstract, high-level data model that makes it
easier for a user to come up with a good initial description of the data in an
enterprise.
 A database design in terms of a semantic model serves as a useful starting point
and is subsequently translated into a database design in terms of the data model
the DBMS actually supports.
 A widely used semantic data model called the entity-relationship (ER) model
allows us to pictorially denote entities and the relationships among them.

The Relational Model


 The central data description construct in this model is relation, which can be
thought of as a set of records.
 A description of data in terms of a data model is called a schema.
 The schema for a relation specifies its name, the name of each field
or attribute or column.


Example: student information in a university database may be stored in a
relation with the following schema (with 5 fields):

Students(sid: string, name: string, login: string, age: integer, gpa: real)

An example instance of the Students relation:

sid     name    login      age   gpa
53666   Jones   jones@cs   18    3.4
53588   Smith   smith@ee   18    3.2

Each row in the Students relation is a record that describes a student. Every
row follows the schema of the Students relation; the schema can therefore be
regarded as a template for describing a student.
We can make the description of a collection of students more precise by
specifying integrity constraints, which are conditions that the records in a
relation must satisfy.
Other notable models: the hierarchical model, network model, object-oriented
model, and the object-relational model.
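The schema and instance above can be sketched concretely. The snippet below is a minimal illustration using SQLite (an arbitrary choice; the text names Oracle, MySQL, and MS SQL Server as example systems), with a CHECK clause added as a sample integrity constraint. The constraint itself is hypothetical and not part of the schema given in the text.

```python
import sqlite3

# Sketch of the Students relation, using SQLite (illustrative choice of DBMS).
# The CHECK clause is a hypothetical integrity constraint added for
# demonstration; it is not part of the schema given in the text.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (
        sid   TEXT PRIMARY KEY,
        name  TEXT,
        login TEXT,
        age   INTEGER,
        gpa   REAL CHECK (gpa BETWEEN 0.0 AND 4.0)
    )
""")
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")
conn.execute("INSERT INTO Students VALUES ('53588', 'Smith', 'smith@ee', 18, 3.2)")

# A record violating the integrity constraint is rejected, not stored.
rejected = False
try:
    conn.execute("INSERT INTO Students VALUES ('53650', 'Guru', 'guru@ee', 18, 9.9)")
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT sid, name FROM Students ORDER BY sid").fetchall()
print(rejected, rows)
```

Rows that follow the schema template are stored; the record with an out-of-range GPA is refused by the DBMS itself, not by application code.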

Levels of Abstraction in a DBMS

 A data definition language (DDL) is used to define the external and
conceptual schemas.
 Information about conceptual, external, and physical schemas is stored in the
system catalogs.
 Any given database has exactly one conceptual schema and one physical
schema because it has just one set of stored relations, but it may have several
external schemas, each tailored to a particular group of users.
1. Conceptual Schema
2. Physical Schema
3. External Schema
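The idea that one conceptual schema can support several external schemas is often realized with SQL views. The sketch below assumes SQLite; the table, view, and column names are illustrative, not from the text.

```python
import sqlite3

# Sketch: one conceptual schema (the stored Students table) can support
# several external schemas, here expressed as SQL views. Names illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)")
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")

# External schema tailored to one user group: advisors see names and GPAs,
# but not logins or ages.
conn.execute("CREATE VIEW AdvisorView AS SELECT name, gpa FROM Students")
rows = conn.execute("SELECT * FROM AdvisorView").fetchall()
print(rows)
```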

Transaction Management
Transactions are a set of operations used to perform a logical set of work.
It is the bundle of all the instructions of a logical operation. A transaction usually
means that the data in the database has changed. One of the major uses of DBMS
is to protect the user’s data from system failures. It is done by ensuring that all the
data is restored to a consistent state when the computer is restarted after a crash.
The transaction is any one execution of the user program in a DBMS. One of the
important properties of the transaction is that it contains a finite number of steps.
Executing the same program multiple times will generate multiple transactions.
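The all-or-nothing behavior described above can be sketched in a few lines, assuming SQLite: if a "crash" interrupts a transaction partway, rolling back restores the consistent pre-transaction state. The account table and amounts are hypothetical.

```python
import sqlite3

# Sketch of a transaction as an all-or-nothing unit of work, using SQLite.
# Account names and amounts are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(amount):
    # First step of the transfer; sqlite3 opens a transaction implicitly here.
    conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,))
    raise RuntimeError("simulated crash before the matching credit to 'B'")

try:
    transfer(70)
    conn.commit()
except RuntimeError:
    conn.rollback()  # recovery: the database returns to a consistent state

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

Because the crash interrupted the transaction before commit, the partial debit is undone and both balances are unchanged.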
Structure of Database Management System
Database Management System (DBMS) is software that allows access to
data stored in a database and provides an easy and effective method of –
• Defining the information.
• Storing the information.
• Manipulating the information.
• Protecting the information from system crashes or data theft.
• Differentiating access permissions for different users.

1. Query Processor: It interprets the requests (queries) received from the end
user via an application program into instructions. It also executes the user
request received from the DML compiler. The Query Processor contains the
following components –


DML Compiler: It processes the DML statements into low level instruction
(machine language), so that they can be executed.
DDL Interpreter: It processes DDL statements into a set of tables containing
metadata (data about data).
Embedded DML Pre-compiler: It processes DML statements embedded
in an application program into procedural calls.
Query Optimizer: It chooses the most efficient plan for executing the
instructions generated by the DML compiler.
2. Storage Manager: The Storage Manager is a program that provides an interface
between the data stored in the database and the queries received. It is also
known as the Database Control System.
 Authorization Manager: It ensures role-based access control, i.e., it checks
whether a particular user is privileged to perform the requested operation.
 Integrity Manager: It checks the integrity constraints when the database is
modified.
 Transaction Manager: It controls concurrent access by scheduling the
operations of the transactions it receives. It thus ensures that the database
remains in a consistent state before and after the execution of a transaction.
 File Manager: It manages the file space and the data structure used to
represent information in the database.
 Buffer Manager: It is responsible for cache memory and the transfer of data
between the secondary storage and main memory.

3. Disk Storage: It contains the following components –


 Data Files: They store the data.
 Data Dictionary: It contains information about the structure of every
database object. It is the repository of the metadata.
 Indices: They provide faster retrieval of data items.

Database Design in DBMS


 Database Design can be defined as a set of procedures or collection of
tasks involving various steps taken to implement a database. Following
are some critical points to keep in mind to achieve a good database
design:
 Data consistency and integrity must be maintained.
 Low Redundancy
 Faster searching through indices

 Security measures should be taken by enforcing various integrity
constraints.
 Data should be stored in fragmented bits of information in the most
atomic format possible.

Primary Terminologies Used in Database Design


 Redundancy: Redundancy refers to duplication of data. There are specific use
cases where we do or do not want redundancy in our database.
 Schema: A schema is a logical container that defines the structure and
organization of the data stored in it: the tables, their columns, and the data
type of each column.
 Records/Tuples: A record (or tuple) is a single row of a table; it is where
our data is actually stored.
 Indexing: Indexing is a data structure technique to promote efficient retrieval
of the data stored in our database.
 Data Integrity & Consistency: Data integrity refers to the quality of the
information stored in our database and consistency refers to the correctness of
the data stored.
 Data Models: Data models provide us with visual modeling techniques to
visualize the data and the relationships that exist among the data. Ex: ER
Model, Network Model, Object-Oriented Model, Hierarchical Model, etc.
 Functional Dependency: Functional Dependency is a relationship between
two attributes of the table that represents that the value of one attribute can be
determined by another. Ex: {A -> B}, A & B are two attributes and attribute A
can uniquely determine the value of B.
 Transaction: Transaction is a single logical unit of work. It signifies that some
changes are made in the database.
 Schedule: A schedule defines the sequence in which the operations of one or
more transactions are executed.
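The functional dependency definition above can be checked mechanically: A -> B holds iff no value of A is paired with two different values of B. A small sketch follows; the attribute names and rows are made-up examples.

```python
# Sketch of checking a functional dependency A -> B over a set of rows:
# the dependency holds iff no value of A is paired with two different
# values of B. Attribute names and rows are made-up examples.
def holds_fd(rows, a, b):
    seen = {}
    for row in rows:
        if row[a] in seen and seen[row[a]] != row[b]:
            return False            # same A value, different B values
        seen[row[a]] = row[b]
    return True

students = [
    {"sid": "53666", "name": "Jones", "age": 18},
    {"sid": "53588", "name": "Smith", "age": 18},
]
print(holds_fd(students, "sid", "name"))   # sid uniquely determines name
print(holds_fd(students, "age", "name"))   # age 18 maps to two names
```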
Entity Relationship Diagram (ER Diagram) in DBMS

An Entity-Relationship model (ER model) describes the structure of a database
with the help of a diagram, known as an Entity Relationship Diagram
(ER Diagram). An ER model is a design or blueprint of a database that can later
be implemented as a database. The main components of the E-R model are: entity
sets and relationship sets.
A simple ER Diagram:
In the following diagram we have two entities Student and College and their
relationship. The relationship between Student and College is many to one as a
college can have many students however a student cannot study in multiple


colleges at the same time. Student entity has attributes such as Stu_Id, Stu_Name
& Stu_Addr and College entity has attributes such as Col_ID & Col_Name.

Here are the geometric shapes and their meaning in an E-R Diagram. We
will discuss these terms in detail in the next section (Components of a ER
Diagram) of this guide so don’t worry too much about these terms now, just go
through them once.
 Rectangle: Represents Entity sets.
 Ellipses: Attributes
 Diamonds: Relationship Set
 Lines: They link attributes to entity sets and entity sets to relationship
sets.
 Double Ellipses: Multivalued Attributes
 Dashed Ellipses: Derived Attributes
 Double Rectangles: Weak Entity Sets
 Double Lines: Total participation of an entity in a relationship set
Components of an ER Diagram

As shown in the above diagram, an ER diagram has three main components:
1. Entity
2. Attribute


3. Relationship

1. Entity
 An entity is an object or component of data. An entity is represented as a
rectangle in an ER diagram.
 For example: In the following ER diagram we have two entities, Student and
College, and these two entities have a many-to-one relationship, as many
students study in a single college. We will read more about relationships
later; for now, focus on entities.

Weak Entity:
 An entity that cannot be uniquely identified by its own attributes and
relies on a relationship with another entity is called a weak entity. A weak
entity does not have a key attribute of its own; it is identified using the
primary key of its identifying entity, and for that, the weak entity set must
have total participation in the identifying relationship. A weak entity is
represented by a double rectangle.
 For example, a bank account cannot be uniquely identified without knowing
the bank to which the account belongs, so a bank account is a weak entity.


2. Attribute
 An attribute describes a property of an entity. An attribute is represented
as an oval in an ER diagram. There are four types of attributes:

1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute

1. Key attribute: A key attribute can uniquely identify an entity from an
entity set. For example, a student roll number can uniquely identify a student.
A key attribute is represented by an oval with the attribute name underlined.

2. Composite attribute: An attribute that is a combination of other attributes
is known as a composite attribute. For example, in the Student entity, the
student address is a composite attribute, as an address is composed of other
attributes such as pin code, state, and country.

3. Multivalued attribute: An attribute that can hold multiple values is known
as a multivalued attribute. It is represented with double ovals in an ER
Diagram. For example, a person can have more than one phone number, so the
phone number attribute is multivalued.
4. Derived attribute: A derived attribute is one whose value is dynamic and
derived from another attribute. It is represented by a dashed oval in an ER
Diagram. For example, a person's age is a derived attribute, as it changes over
time and can be derived from another attribute (date of birth).

E-R diagram with multivalued and derived attributes:


3. Relationship
A relationship is represented by a diamond shape in an ER diagram; it shows an
association among entities.
Cardinality defines the numerical attributes of the relationship between two
entities or entity sets. There are four types of cardinal relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many

1. One to One Relationship


When a single instance of an entity is associated with a single instance of
another entity, it is called a one-to-one relationship. For example, a person
has only one passport, and a passport is given to only one person.

2. One to Many Relationship


When a single instance of an entity is associated with more than one instance
of another entity, it is called a one-to-many relationship. For example, a
customer can place many orders, but an order cannot be placed by many
customers.

3. Many to One Relationship


When more than one instance of an entity is associated with a single instance
of another entity, it is called a many-to-one relationship. For example, many
students can study in a single college, but a student cannot study in many
colleges at the same time.

4. Many to Many Relationship


When more than one instance of an entity is associated with more than one
instance of another entity, it is called a many-to-many relationship. For
example, a student can be assigned to many projects, and a project can be
assigned to many students.
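In a relational database, a many-to-many relationship such as the student-project example is typically realized with a junction table holding one row per pair. A sketch using SQLite follows; all table and column names are illustrative.

```python
import sqlite3

# Sketch: a many-to-many relationship (students <-> projects) realized with a
# junction table holding one row per pair. All names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Project (pid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE WorksOn (
        sid INTEGER REFERENCES Student(sid),
        pid INTEGER REFERENCES Project(pid),
        PRIMARY KEY (sid, pid)
    );
    INSERT INTO Student VALUES (1, 'Jones'), (2, 'Smith');
    INSERT INTO Project VALUES (10, 'DB Lab'), (20, 'Web App');
    -- Jones works on both projects; both students work on 'DB Lab'.
    INSERT INTO WorksOn VALUES (1, 10), (1, 20), (2, 10);
""")
pairs = conn.execute("""
    SELECT s.name, p.title
    FROM WorksOn w
    JOIN Student s ON s.sid = w.sid
    JOIN Project p ON p.pid = w.pid
    ORDER BY s.name, p.title
""").fetchall()
print(pairs)
```

The composite primary key (sid, pid) allows any number of pairings while preventing the same pairing from being recorded twice.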

RELATIONSHIP SETS
Types of Relationship Sets
On the basis of degree of a relationship set, a relationship set can be
classified into the following types

o Unary relationship set


o Binary relationship set
o Ternary relationship set
Unary Relationship Set
Unary relationship set is a relationship set where only one entity set
participates in a relationship set.

Binary Relationship Set


Binary relationship set is a relationship set where two entity sets participate in
a relationship set.

Ternary Relationship Set


Ternary relationship set is a relationship set where three entity sets
participate in a relationship set.


Additional Features of the ER Model in DBMS

Three new concepts were added to the existing ER Model:

• Generalization
• Specialization
• Aggregation

Superclass: An entity type that represents a general concept at a high level is
called a superclass.
Subclass: An entity type that represents a specific concept at a lower level is
called a subclass. The subclass is said to inherit from the superclass. When a
subclass inherits from one or more superclasses, it inherits all their
attributes; in addition to the inherited attributes, a subclass can also define
its own specific attributes.
Generalization
Generalization is a process of extracting common properties from a set of
entities and creating a generalized entity from it. It is a bottom-up approach, and it
helps to reduce the size and complexity of the schema.
Example: Take two low-level entities, Car and Bus; these two have many common
attributes and some specific attributes. We generalize and link the common
attributes to a newly formed high-level entity named Vehicle.

Specialization
Specialization is the opposite of generalization. In this, an entity is divided
into sub-entities based on their characteristics (distinguishing features). It
breaks an entity into multiple entities from a higher level to a lower level.
It is a top-down approach.


Aggregation
Aggregation refers to the process by which entities are combined to form a
single meaningful entity. The specific entities are combined because they do not
make sense on their own. To establish a single entity, aggregation creates a
relationship that combines these entities. The resulting entity makes sense because
it enables the system to function well.
KEY CONSTRAINTS
• Constraints are nothing but the rules that are to be followed while entering
data into the columns of a database table.
• Constraints ensure that data entered by the user into columns must be within
the criteria specified by the condition.


• For example, if you want to maintain only unique IDs in the employee
table or if you want to enter only age under 18 in the student table etc
• We have 5 types of key constraints in DBMS:
o UNIQUE: provides unique/distinct values for the specified columns.
o DEFAULT: provides a default value for a column if none is specified.
o CHECK: checks for predefined conditions before inserting the data into the
table.
o PRIMARY KEY: uniquely identifies a row in a table.


o NOT NULL: ensures that the specified column doesn’t contain a NULL value.

Not Null

• Null represents a record where data may be missing or where the data for
that record may be optional.
• Once NOT NULL is applied to a particular column, you cannot enter null
values into that column; the column is restricted to holding only a proper
value other than null.
• A not-null constraint cannot be applied at the table level.

Foreign Key

• A FOREIGN KEY constraint ensures referential integrity of the relationship.

Example
CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
);
Unique
• Sometimes we need to maintain only unique data in a column of a database
table; this is possible by using a unique constraint.
• A unique constraint ensures that all values in a column are unique.
Example
CREATE TABLE Persons (
    ID int UNIQUE,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int
);

DEFAULT
• The DEFAULT clause in SQL is used to give a column a default value.
• When a column is declared with a default value, rows that omit the column
automatically receive that value, so it need not be entered every time data is
inserted.
• The default value can be overridden, i.e. an explicit value supplied when
inserting a row replaces the default.
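A short sketch of this DEFAULT behavior, assuming SQLite; the table, names, and city values are illustrative.

```python
import sqlite3

# Sketch of the DEFAULT constraint: a row that omits the column receives the
# default, and an explicit value overrides it. Names and values illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT DEFAULT 'Chennai')")
conn.execute("INSERT INTO Persons (ID, Name) VALUES (1, 'Ravi')")                   # default used
conn.execute("INSERT INTO Persons (ID, Name, City) VALUES (2, 'Meena', 'Mumbai')")  # overridden
rows = conn.execute("SELECT ID, Name, City FROM Persons ORDER BY ID").fetchall()
print(rows)
```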

Primary Key

A primary key is a constraint that uniquely identifies each row in a database
table, by designating one or more of the table's columns as the primary key.

Creating a primary key


A particular column is made the primary key by using the PRIMARY KEY keyword
followed by the column name:

CREATE TABLE EMP (
    ID INT,
    NAME VARCHAR(20),
    AGE INT,
    COURSE VARCHAR(10),
    PRIMARY KEY (ID)
);

• Here we have used the primary key on the ID column, so the ID column must
contain unique values, i.e. one ID cannot be used for another student.
• If you try to enter a duplicate value while inserting a row, an error is
displayed.
• Hence, the primary key restricts you to maintaining unique, non-null values
in that particular column.

Foreign Key
• The foreign key constraint is a column or list of columns that points to the
primary key column of another table; it ensures referential integrity.
• The main purpose of the foreign key is that only those values are allowed in
the present table that match the primary key column of the other table.
Example: first create the parent table whose primary key the foreign key will
reference:

CREATE TABLE CUSTOMERS1 (
    ID INT,
    NAME VARCHAR(20),
    COURSE VARCHAR(10),
    PRIMARY KEY (ID)
);

A foreign key in another table can then reference CUSTOMERS1(ID), as in the
Orders example above.
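The referential-integrity rule can be exercised directly. The sketch below uses SQLite (where foreign-key enforcement must be switched on with a pragma); the table layout mirrors the earlier Orders/Persons example.

```python
import sqlite3

# Sketch of foreign-key enforcement, mirroring the Orders/Persons example:
# the child column may only hold values present in the referenced primary-key
# column. SQLite needs the pragma below to enforce foreign keys.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Persons (PersonID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        PersonID INTEGER,
        FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
    )
""")
conn.execute("INSERT INTO Persons VALUES (1, 'Jones')")
conn.execute("INSERT INTO Orders VALUES (100, 1)")       # allowed: person 1 exists

dangling_allowed = True
try:
    conn.execute("INSERT INTO Orders VALUES (101, 99)")  # no person with ID 99
except sqlite3.IntegrityError:
    dangling_allowed = False
print(dangling_allowed)
```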


Indexing in Database
Indexing improves database performance by minimizing the number of disc
visits required to fulfill a query. It is a data structure technique used to locate and
quickly access data in databases. Several database fields are used to generate
indexes. The main key or candidate key of the table is duplicated in the first
column, which is the Search key. To speed up data retrieval, the values are also
kept in sorted order. It should be highlighted that sorting the data is not required.
The second column is the Data Reference or Pointer which contains a set of
pointers holding the address of the disk block where that particular key value can
be found.

Structure of Index in Database

Attributes of Indexing
• Access Types: This refers to the type of access such as value-based
search, range access, etc.
• Access Time: It refers to the time needed to find a particular data element
or set of elements.
• Insertion Time: It refers to the time taken to find the appropriate space
and insert new data.
• Deletion Time: Time taken to find an item and delete it as well as update
the index structure.

• Space Overhead: It refers to the additional space required by the index.

Sequential File Organization or Ordered Index File


In this, the indices are based on a sorted ordering of the values. These are
generally fast and a more traditional type of storing mechanism. These
Ordered or Sequential file organizations might store the data in a dense or
sparse format.

Dense Index
• For every search key value in the data file, there is an index record.
• This record contains the search key and also a reference to the first data
record with that search key value.

Dense Index

Sparse Index


• The index record appears only for a few items in the data file; each item
points to a block.
• To locate a record, we find the index record with the largest search key
value less than or equal to the search key value we are looking for.
• We start at the record pointed to by that index record and proceed along the
pointers in the file (that is, sequentially) until we find the desired record.
• Number of accesses required = log₂(n) + 1, where n = the number of blocks
occupied by the index file.
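The sparse-index lookup procedure above can be sketched as a binary search over the index entries (one entry per block) followed by a scan of a single block. The block contents below are illustrative.

```python
import bisect
import math

# Sketch of a sparse-index lookup: the index holds one (first key, block)
# entry per block. Binary-search the index for the largest key <= the search
# key, then scan that block sequentially. Key values are illustrative.
blocks = [[2, 5, 8], [11, 14, 17], [21, 24, 29], [31, 35, 38]]
index_keys = [blk[0] for blk in blocks]           # sparse index: [2, 11, 21, 31]

def lookup(key):
    i = bisect.bisect_right(index_keys, key) - 1  # largest index key <= key
    return i >= 0 and key in blocks[i]            # sequential scan of one block

print(lookup(24), lookup(25))
# Worst-case accesses for the binary search over the n index blocks:
n = len(blocks)
accesses = math.floor(math.log2(n)) + 1
print(accesses)
```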

Hash File Organization


Indices are based on the values being distributed uniformly across a range
of buckets. The buckets to which a value is assigned are determined by a function
called a hash function. There are primarily three methods of indexing:
• Clustered Indexing: When two or more related records are stored in the same
file block, this type of storing is known as clustered indexing. Clustered
indexing reduces the cost of searching, since multiple records related to the
same thing are stored in one place; it also makes frequent joins of two or
more tables (records) cheaper.
• Primary Indexing: This is a type of Clustered Indexing wherein the
data is sorted according to the search key and the primary key of the database table
is used to create the index. It is a default format of indexing where it induces
sequential file organization. As primary keys are unique and are stored in a sorted
manner, the performance of the searching operation is quite efficient.
• Non-clustered or Secondary Indexing: A non-clustered index just tells us
where the data lies, i.e. it gives us a list of virtual pointers or references
to the location where the data is actually stored. Data is not physically
stored in the order of the index; instead, data is present in the leaf nodes.
For example, consider the contents page of a book: each entry gives us the
page number or location of the information stored.
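The bucket idea behind hash-based indexing can be sketched in a few lines; the hash function, bucket count, and records below are all illustrative assumptions.

```python
# Sketch of hash-based bucketing: a hash function assigns each key to a
# bucket, so a lookup touches only one bucket instead of the whole file.
# The bucket count, hash choice, and records are illustrative.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_for(key):
    return hash(key) % NUM_BUCKETS

def insert(key, record):
    buckets[bucket_for(key)].append((key, record))

def find(key):
    # scan only the one bucket the hash function points at
    return [rec for k, rec in buckets[bucket_for(key)] if k == key]

for sid, name in [(53666, "Jones"), (53588, "Smith"), (53831, "Gupta")]:
    insert(sid, name)
print(find(53588), find(99999))
```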


File Organization in DBMS


A database consists of a huge amount of data. The data is grouped within a
table in RDBMS, and each table has related records. A user can see that the data is
stored in the form of tables, but in actuality, this huge amount of data is stored in
physical memory in the form of files.
What is a File?
A file is named a collection of related information that is recorded on
secondary storage such as magnetic disks, magnetic tapes, and optical disks.
What is File Organization?
File Organization refers to the logical relationships among the various records
that constitute the file, particularly with respect to the means of
identification and access to any specific record. In simple terms, storing the
files in a certain order is called file organization.
File Structure refers to the format of the label and data blocks and of any
logical control record.
The Objective of File Organization
• It helps in the faster selection of records i.e. it makes the process faster.
• Different Operations like inserting, deleting, and updating different
records are faster and easier.
• It prevents us from inserting duplicate records via various operations.
• It helps in storing the records or the data very efficiently at a minimal cost

Types of File Organizations


Various methods have been introduced to Organize files. These particular
methods have advantages and disadvantages on the basis of access or selection.
Thus it is all upon the programmer to decide the best-suited file Organization
method according to his requirements.
Some types of File Organizations are:
 Sequential File Organization
 Heap File Organization
 Hash File Organization
 B+ Tree File Organization
 Clustered File Organization
 ISAM (Indexed Sequential Access Method)

We will be discussing each of the file Organizations in further sets of this


article along with the differences and advantages/ disadvantages of each file
Organization method.


Sequential File Organization


The easiest method for file Organization is the Sequential method. In this
method, the file is stored one after another in a sequential manner. There are two
ways to implement this method:

1. Pile File Method


This method is quite simple, in which we store the records in a sequence i.e.
one after the other in the order in which they are inserted into the tables.

Pile File Method


Insertion of a new record: Let R1, R3, R5, and R4 be four records in the
sequence. Here, a record is nothing but a row in a table. Suppose a new record
R2 has to be inserted in the sequence; it is simply placed at the end of the
file.

2. Sorted File Method


In this method, as the name suggests, whenever a new record has to be inserted,
it is inserted in a sorted (ascending or descending) position. The sorting of
records may be based on a primary key or any other key.

Sorted File Method


Insertion of a new record: Assume a pre-existing sorted sequence of four
records R1, R3, R7, and R8. Suppose a new record R2 has to be inserted; it is
inserted at the end of the file and the sequence is then re-sorted.

new Record Insertion
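The two insertion strategies can be sketched side by side; the integer record keys below mirror the R1, R3, ... examples in the text.

```python
import bisect

# Sketch of the two sequential-file insertion strategies, using integer
# record keys that mirror the R1, R3, ... examples in the text.
pile = [1, 3, 5, 4]              # pile file: records kept in arrival order
pile.append(2)                   # new record R2 simply goes at the end
print(pile)

sorted_file = [1, 3, 7, 8]       # sorted file: records kept in key order
bisect.insort(sorted_file, 2)    # new record R2 is placed in sorted position
print(sorted_file)
```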

Advantages of Sequential File Organization


 Fast and efficient method for huge amounts of data.
 Simple design.
 Files can be easily stored in magnetic tapes i.e. cheaper storage
mechanism.
Disadvantages of Sequential File Organization
 Time is wasted because we cannot jump directly to a particular record
that is required; we have to move through the file sequentially, which takes
time.
 The sorted file method is inefficient, as it takes extra time and space
to sort the records.
Heap File Organization
Heap File Organization works with data blocks. In this method, records are
inserted at the end of the file, into the data blocks. No sorting or ordering
is required. If a data block is full, the new record is stored in some other
block. Here, the other data block need not be the very next data block; it can
be any block in memory. It is the responsibility of the DBMS to store and
manage the new records.

Heap File Organization


Insertion of a new record: Suppose we have five records in the heap: R1, R5,
R6, R4, and R3, and a new record R2 has to be inserted. Since the last data
block, i.e. data block 3, is full, R2 will be inserted into any data block
selected by the DBMS, say data block 1.

New Record Insertion


If we want to search, delete, or update data in a heap file organization, we
must traverse the file from the beginning until we reach the requested record.
Thus, if the database is very large, searching, deleting, or updating a record
will take a lot of time.
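The behaviour above can be modeled with Python lists standing in for data blocks (an illustrative sketch; the block capacity of 2 is an assumption made for the example):

```python
BLOCK_CAPACITY = 2  # assumed capacity per data block, for illustration

# Data blocks 1-3 from the example; block 3 is full.
blocks = [["R1"], ["R5", "R6"], ["R4", "R3"]]

def insert_heap(blocks, record):
    """Store the record in any block with free space (here: the first),
    allocating a new block only when every block is full."""
    for block in blocks:
        if len(block) < BLOCK_CAPACITY:
            block.append(record)
            return blocks
    blocks.append([record])
    return blocks

def heap_search(blocks, record):
    """Linear scan from the first block -- this is why heap lookups are slow."""
    for i, block in enumerate(blocks):
        if record in block:
            return i  # index of the block holding the record
    return -1

insert_heap(blocks, "R2")         # block 3 is full, so R2 goes into block 1
print(heap_search(blocks, "R2"))  # 0
```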
Advantages of Heap File Organization
 Fetching and retrieving records is faster than with sequential
organization, but only in the case of small databases.
 When a huge amount of data needs to be loaded into the database at one
time, this method of file organization is best suited.
Disadvantages of Heap File Organization
 The problem of unused memory blocks.
 Inefficient for larger databases.
Primary Indexes
A primary index is a unique identifier for each row in a table. It is usually a
column or a combination of columns that cannot have duplicate or null values. A
primary index ensures data integrity and enables fast retrieval of records by
their key values. For example, you can use a primary index on a customer_id
column to quickly find the details of a specific customer.

Secondary indexes
A secondary index is an additional index that is not part of the primary
key. It can be created on any column or expression that is frequently used in
queries or joins. A secondary index can have duplicate or null values, and it can be
either unique or non-unique. A secondary index can improve the performance of
queries that filter, sort, or group by the indexed column or expression. For
example, you can use a secondary index on a last_name column to speed up
queries that search for customers by their last name.
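The contrast between the two index kinds can be sketched with in-memory dictionaries (a conceptual model, not a real index structure; the customer rows are invented for illustration):

```python
# A small table, with each row identified by its position (row id).
rows = [
    {"customer_id": 101, "last_name": "Smith"},
    {"customer_id": 102, "last_name": "Jones"},
    {"customer_id": 103, "last_name": "Smith"},
]

# Primary index: unique key -> a single row id (no duplicates allowed).
primary = {row["customer_id"]: i for i, row in enumerate(rows)}

# Secondary index: possibly duplicated key -> a list of row ids.
secondary = {}
for i, row in enumerate(rows):
    secondary.setdefault(row["last_name"], []).append(i)

print(rows[primary[102]]["last_name"])  # Jones
print(secondary["Smith"])               # [0, 2] -- duplicates are fine
```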

B+ tree Index
o The B+ tree is a balanced multiway search tree (not a binary tree). It
follows a multilevel index format.
o In the B+ tree, leaf nodes hold the actual data pointers. The B+ tree
ensures that all leaf nodes remain at the same height.
o In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a
B+ tree can support sequential access as well as random access.

Structure of B+ Tree
o In the B+ tree, every leaf node is at an equal distance from the root node.
o The B+ tree is of order n, where n is fixed for every B+ tree.
o It contains internal nodes and leaf nodes.

Internal node
o An internal node of the B+ tree can contain at least n/2 tree pointers,
except the root node.
o At most, an internal node of the tree contains n pointers.
Leaf node
o The leaf node of the B+ tree can contain at least n/2 record pointers
and n/2 key values.
o At most, a leaf node contains n record pointers and n key values.
o Every leaf node of the B+ tree contains one block pointer P to point to
the next leaf node.
Searching a record in B+ Tree

Suppose we have to search for 55 in the B+ tree structure below. First, we
will fetch the intermediary node, which will direct us to the leaf node that
may contain the record for 55.

In the intermediary node, we will find the branch between the 50 and 75 nodes.
At the end, we will be redirected to the third leaf node, where the DBMS will
perform a sequential search to find 55.
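That search walk can be sketched for a hand-built two-level B+ tree (illustrative only; the tree shape is an assumption chosen so that, as in the example, 55 lands in the third leaf via the 50/75 branch):

```python
# Root node with separator keys, and one sorted leaf per branch.
root_keys = [25, 50, 75]
leaves = [[10, 20], [25, 40], [50, 55, 60, 70], [75, 80, 95]]

def bplus_search(root_keys, leaves, key):
    """Pick the branch whose separator range covers the key,
    then scan the chosen leaf sequentially."""
    child = 0
    while child < len(root_keys) and key >= root_keys[child]:
        child += 1
    return key in leaves[child]

print(bplus_search(root_keys, leaves, 55))  # True  (found in the 3rd leaf)
print(bplus_search(root_keys, leaves, 42))  # False
```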

B+ Tree Insertion
Suppose we want to insert a record 60 into the structure below. It will go
into the 3rd leaf node, after 55. That leaf node is already full, so we cannot
insert 60 there.
In this case, we have to split the leaf node so that 60 can be inserted into
the tree without affecting the fill factor, balance, and order.
The 3rd leaf node would then hold the values (50, 55, 60, 65, 70), and the key
in the parent node that branches to it is 50. We split the leaf node in the
middle so that the tree's balance is not altered, grouping (50, 55) and
(60, 65, 70) into two leaf nodes.
If these two are to be leaf nodes, the intermediate node cannot branch only
from 50. It should have 60 added to it, and then we can have a pointer to the
new leaf node.

This is how we can insert an entry when there is overflow. In a normal
scenario, it is very easy to find the node where the key fits and then place
it in that leaf node.

B+ Tree Deletion
Suppose we want to delete 60 from the above example. In this case, we have to
remove 60 from the intermediate node as well as from the 4th leaf node. If we
simply removed it from the intermediate node, the tree would no longer satisfy
the rules of the B+ tree, so we need to modify it to keep the tree balanced.

After deleting 60 from the above B+ tree and re-arranging the nodes, the tree
will appear as follows:

Hashing in DBMS
Hashing is a DBMS technique for searching for needed data on the disk without
utilising an index structure. The hashing method is used to index items and
retrieve them in a database, since searching for a specific item using a
shorter hashed key rather than the original value is faster.

In hashing, the address of a data block is derived from the primary key value.
The hash function can be a simple mathematical function, such as exponential,
mod, cos, sin, and so on. Assume we are using the mod (5) hash function to
find a data block's address. In this scenario, the primary keys are hashed
with the mod (5) function, yielding 3, 3, 1, 4, and 2, respectively, and the
records are saved at those data block locations.
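The mod (5) scheme can be reproduced in Python (the primary keys 103, 108, 11, 9, and 7 are assumptions chosen so that mod 5 yields the addresses 3, 3, 1, 4, and 2 mentioned above):

```python
NUM_BUCKETS = 5

def bucket_address(primary_key):
    """Static hash function: the data block address is key mod 5."""
    return primary_key % NUM_BUCKETS

# Hypothetical primary keys; two of them collide in bucket 3.
keys = [103, 108, 11, 9, 7]
buckets = {b: [] for b in range(NUM_BUCKETS)}
for k in keys:
    buckets[bucket_address(k)].append(k)

print([bucket_address(k) for k in keys])  # [3, 3, 1, 4, 2]
print(buckets[3])                         # [103, 108]
```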

Hash Organization
 Bucket – A bucket is a type of storage container. Data is stored in
bucket format in a hash file. Typically, a bucket stores one entire disc block, which
can then store one or more records.
 Hash Function – A hash function, abbreviated as h, is a mapping
function that maps all of the search-keys K to the addresses at which the
actual records are stored; in other words, it is a function from search keys
to bucket addresses.
Types of Hashing
Hashing is of the following types:

Static Hashing
Whenever a search-key value is given in static hashing, the hash algorithm
always returns the same address. If the mod-5 hash function is employed, for
example, only 5 distinct addresses will be generated. For a given key, the
output address is always the same, and the total number of buckets available
remains constant at all times.
Dynamic Hashing
The disadvantage of static hashing is that it does not expand or contract
dynamically as the database grows or shrinks. Dynamic hashing is a technique
that allows data buckets to be created and removed on the fly. Extendible
hashing is another name for dynamic hashing.

Multi-dimensional indexes
A multi-dimensional index maps multi-dimensional data, in the form of multiple
numeric attributes, to one dimension while mostly preserving locality, so that
values that are similar in all dimensions remain close to each other in the
one-dimensional mapping. Queries that filter by multiple value ranges at once
can be accelerated better with such an index than with an ordinary
single-column index.
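One standard way to realize such a mapping is bit interleaving along a Z-order (Morton) curve; the sketch below is illustrative only, for two non-negative integer attributes of up to 8 bits each:

```python
def morton_2d(x, y, bits=8):
    """Interleave the bits of x and y into one Z-order value, so points
    that are close in both dimensions get nearby 1-D index values."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # even bit positions <- x
        z |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions  <- y
    return z

print(morton_2d(0, 0))  # 0
print(morton_2d(3, 3))  # 15 -- neighbours in 2-D stay close in 1-D
```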

Bitmap Indexing in DBMS


Bitmap Indexing is a data indexing technique used in database management
systems (DBMS) to improve the performance of read-only queries that involve
large datasets. It involves creating a bitmap index, which is a data structure that
represents the presence or absence of data values in a table or column.
Bitmap Index Structure
A bitmap is a combination of two words: bit and map. A bit is the smallest
unit of data in a computer, and a map is a way of organizing things.
Bit: A bit is a basic unit of information in computing that can have only one
of two values, 0 or 1. The two values of a binary digit can also be
interpreted as the logical values true/false or yes/no.
Bitmap Indexing is a special type of database indexing that uses bitmaps.
This technique is used for huge databases when the column is of low cardinality
and these columns are most frequently used in the query.
Bitmap indexing is a data structure used in database management systems
(DBMS) to efficiently represent and query large datasets with many attributes
(columns). Bitmap indexes use a compact binary representation to store the
occurrence of each value or combination of values in each attribute, allowing for
fast, set-based operations.
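A bitmap index over a low-cardinality column can be sketched as one bit vector per distinct value; a multi-attribute query then reduces to bitwise set operations (the column data below are invented for illustration):

```python
# Two low-cardinality columns over six rows.
gender = ["M", "F", "F", "M", "F", "M"]
region = ["N", "N", "S", "S", "N", "S"]

def build_bitmap(column):
    """One bit per row for each distinct value (1 = row has that value)."""
    return {v: [1 if x == v else 0 for x in column] for v in set(column)}

g_idx = build_bitmap(gender)
r_idx = build_bitmap(region)

# Query: gender = 'F' AND region = 'N'  ->  bitwise AND of two bitmaps.
answer = [a & b for a, b in zip(g_idx["F"], r_idx["N"])]
print(answer)  # [0, 1, 0, 0, 1, 0] -> rows 1 and 4 qualify
```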
Features of Bitmap Indexing in DBMS
• Space efficiency: Bitmap indexes are highly space-efficient because they
use a compact binary representation to store the occurrence of each value
or combination of values in each attribute. This makes them especially
useful for large datasets with many attributes.
• Fast query processing: Bitmap indexes can be used to quickly answer
complex queries involving multiple attributes using set-based operations
such as AND, OR, and NOT. This allows for fast query processing and
reduces the need for full table scans.
• Low maintenance overhead: Bitmap indexes require relatively low
maintenance overhead because they can be updated incrementally as data
changes. This makes them especially useful for applications where the
data is frequently updated.
• Flexibility: Bitmap indexes can be used for both numerical and
categorical data types, and can also be used to index text data using
techniques such as term frequency-inverse document frequency (TF-IDF).
• Reduced I/O overhead: Bitmap indexes can be used to avoid expensive
I/O operations by using a compressed representation of the data. This
reduces the amount of data that needs to be read from the disk,
improving query performance.
• Ideal Choice: Bitmap indexing is a powerful technique for efficiently
querying large datasets with many attributes. Its compact representation
and set-based operations make it an ideal choice for data warehousing
and other applications where fast query processing is critical.
