CNG351 Lecture 11 Part 2

Uploaded by

berayseray382

Disk Storage, Basic File Structures and Hashing

CNG351 - Data Management and File Structures

Lecture - 11
Instructor: Dr. Yeliz Yesilada
Outline
• Disk Storage Devices
• Files of Records
• Operations on Files
• Unordered Files
• Ordered Files
• Hashed Files
– Dynamic and Extendible Hashing Techniques
• RAID Technology

CNG 351 - lecture 11 2/32

File Organisation and Access Methods
• A file organisation refers to the organisation of the
data of a file into records, blocks and access
structures; this includes the way records and blocks
are placed on the storage medium and interlinked.
• An access method, on the other hand, provides a
group of operations that can be applied to a file.
• Several access methods can be applied to a file
organisation.


Types of File Organisation
• Heap (unordered) files:
– Records are placed in no particular order
• Sequential (ordered) files:
– Records are ordered by the value of a specified field.
• Hash files:
– Records are placed on a disk according to a hash
function.


(Heap) Unordered Files
• Also called a heap or a pile file.
• Insertion: New records are inserted at the end of the file, which is very
efficient.
• Searching: A linear search through the file records is necessary to
find a record. This requires reading and searching half the file
blocks on average, and is hence quite expensive.
• Deletion:
– To delete a record, the required block has to be retrieved,
the record is marked as deleted, and the block is written
back to the disk.
– Alternatively, an extra byte or bit, called a deletion marker, can be
stored with each record.
– Both deletion techniques require periodic reorganisation of the file
to reclaim the unused space of deleted records.
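
The three heap-file operations above can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the names `HeapFile` and `Record`, the two-records-per-block capacity, and the `reorganise` method are hypothetical, and a list of lists stands in for disk blocks.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    key: str
    deleted: bool = False  # deletion marker stored with the record


@dataclass
class HeapFile:
    block_size: int = 2  # records per block (assumed for illustration)
    blocks: list = field(default_factory=list)

    def insert(self, key):
        # New records go at the end of the file; open a new block if needed.
        if not self.blocks or len(self.blocks[-1]) == self.block_size:
            self.blocks.append([])
        self.blocks[-1].append(Record(key))

    def search(self, key):
        # Linear search. Returns (block index or None, number of blocks read).
        for i, block in enumerate(self.blocks):
            for rec in block:
                if rec.key == key and not rec.deleted:
                    return i, i + 1
        return None, len(self.blocks)

    def delete(self, key):
        # Find the record and set its deletion marker; space is not reclaimed.
        for block in self.blocks:
            for rec in block:
                if rec.key == key and not rec.deleted:
                    rec.deleted = True
                    return True
        return False

    def reorganise(self):
        # Periodic reorganisation: rewrite the file without deleted records.
        live = [r.key for block in self.blocks for r in block if not r.deleted]
        self.blocks = []
        for key in live:
            self.insert(key)


hf = HeapFile(block_size=2)
for staff_no in ["SG37", "SA9", "SL21", "SG5", "SG14"]:
    hf.insert(staff_no)
```

Deleting a record leaves the block count unchanged until `reorganise` is called, which is why the slide notes that periodic reorganisation is needed.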


Heap Files
• Reading the records in order of a particular field requires sorting
the file records.
• Either spanned or unspanned organisation can be used for an
unordered file, with either fixed-length or variable-length
records.


Ordered Files
• Also called a sequential file.
• File records are kept sorted by the values of an ordering field. If the
ordering field is also the key field, then the field is called the
ordering key.
• Insertion: Records must be inserted in the correct order, so insertion
is expensive.
– It is common to keep a separate unordered overflow (or
transaction) file for new records to improve insertion efficiency;
this is periodically merged with the main ordered file.
• Deletion: Also an expensive operation.
• Searching: A binary search can be used to search for a record on
its ordering field value.
– For a file of b blocks, this requires reading and searching about
log2(b) blocks on average, an improvement over linear search.
– Reading the records in order of the ordering field is quite
efficient.

Binary Search
• Retrieve the mid-block of the file. Check whether the required
record is in this block. If it is then no need to retrieve another
block.
• If the value of the key field in the first record of the block is
greater than the required value, the required value, if it exists,
occurs in an earlier block. Therefore, we repeat the above steps
using the lower half of the file as the new search area.
• If the value of the key field in the last record of the block is less
than the required value, the required value, if it exists, occurs in a
later block, and so we repeat the above steps using the upper half of
the file as the new search area.


Binary Search Example

SELECT *
FROM Staff
WHERE staffNo = 'SG21';

Block:    1     2     3     4     5     6
staffNo:  SA1   SG5   SG14  SG21  SL37  SL41
Probe:                (1)   (3)   (2)
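
The search in this example can be traced with a short sketch. It assumes one record per block, as in the figure, and compares staff numbers by letter prefix and then by numeric part (so that SG5 sorts before SG14); `staff_key` and `binary_search_blocks` are illustrative names, not part of the lecture.

```python
def staff_key(staff_no):
    # 'SG21' -> ('SG', 21): assumed ordering, letters first, then number.
    prefix = staff_no.rstrip("0123456789")
    return prefix, int(staff_no[len(prefix):])


def binary_search_blocks(blocks, target):
    """Binary search over sorted, non-empty blocks.

    Returns (index of the block holding target, or None if absent,
             list of block indexes read, in order).
    """
    lo, hi = 0, len(blocks) - 1
    probes = []
    t = staff_key(target)
    while lo <= hi:
        mid = (lo + hi) // 2          # retrieve the mid-block
        probes.append(mid)
        block = blocks[mid]
        if t < staff_key(block[0]):   # smaller than the first record: go lower
            hi = mid - 1
        elif t > staff_key(block[-1]):  # larger than the last record: go higher
            lo = mid + 1
        else:                         # target can only be in this block
            return (mid if target in block else None), probes
    return None, probes


blocks = [["SA1"], ["SG5"], ["SG14"], ["SG21"], ["SL37"], ["SL41"]]
found, probes = binary_search_blocks(blocks, "SG21")
print(found + 1, [p + 1 for p in probes])  # 1-based block numbers
```

With 0-based indexes the probes are 2, 4, 3, that is blocks 3, 5 and 4, matching the order (1), (2), (3) marked in the figure.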


Advantages of Ordered Records
1. Reading the records in order of the ordering key values
becomes extremely efficient;
2. Finding the next record from the current one in order of the
ordering key usually requires no additional block access;
3. Using a search condition based on the value of ordering key
field results in faster access when the binary search technique
is used.
However,
Ordering doesn’t provide any advantages for random or ordered
access of the records based on the values of the other non-
ordering fields of the file.


Ordered Files
Example


Average Access Times
• The following table shows the average access time to
access a specific record for a given type of file
• Average access time for a file of b Blocks under
Basic file organizations:
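
As a rough sketch, using only the figures quoted in the earlier slides (about b/2 blocks for a linear search, about log2(b) blocks for a binary search on the ordering field), the average costs can be compared for a few file sizes. The function names are illustrative.

```python
import math


def avg_blocks_linear(b):
    # Linear search (heap file, or ordered file scanned sequentially).
    return b / 2


def avg_blocks_binary(b):
    # Binary search on the ordering field of an ordered file.
    return math.log2(b)


for b in (64, 1024, 1_000_000):
    print(f"b={b}: linear ~{avg_blocks_linear(b):.0f}, "
          f"binary ~{avg_blocks_binary(b):.1f}")
```

For a file of 1024 blocks this gives about 512 block reads for a linear search against about 10 for a binary search.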


Hash File
• Records do not need to be written sequentially to the file;
• A hash function calculates the address of the block in which the
record is to be stored based on one or more fields in the record.
– Division remainder hashing function: Uses the MOD function, which
takes the field value, and uses the remainder as the address.
• The field on which the hash function is based is called the hash key.
• Records in a hash file will appear to be randomly distributed across the
available file space. For this reason, hash files are sometimes called
random, or direct, files.
• The problem with most hashing functions is that they do not guarantee
a unique address, because the number of possible values a hash field can
take is typically much larger than the number of addresses available for
records.
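
A division-remainder hash function can be sketched as follows. The sketch assumes that the hash key is a staff number such as SG14 and that only its numeric part is hashed; the helper names and the choice of 3 buckets are illustrative.

```python
def staff_number(staff_no):
    # Numeric part of the key, e.g. 'SG14' -> 14 (assumed key format).
    return int("".join(ch for ch in staff_no if ch.isdigit()))


def h(staff_no, m):
    # Division-remainder hashing: the remainder is the bucket address.
    return staff_number(staff_no) % m


M = 3  # number of buckets (assumed)
addresses = {s: h(s, M) for s in ["SA9", "SL21", "SG37", "SG5", "SG14", "SL41"]}
print(addresses)
```

Three of the six keys (SG5, SG14, SL41) map to address 2, which illustrates why a hash function cannot guarantee a unique address.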


Hash Files
• The file blocks are divided into M equal-sized buckets, numbered
bucket 0, bucket 1, ..., bucket M-1.
– Typically, a bucket corresponds to one (or a fixed number of) disk
blocks.
• One of the file fields is designated to be the hash key of the file.
• The record with hash key value K is stored in bucket i, where i=h(K),
and h is the hashing function.
• Within a bucket, records are placed in the order of arrival.
• Collisions occur when a new record hashes to a bucket that is already
full. There are several methods that can be used to manage collisions:
– Open addressing;
– Unchained overflow;
– Chained overflow;
– Multiple hashing.


1. Open Addressing
• Open addressing: Proceeding from the occupied position specified
by the hash address, the program checks the subsequent positions
in order until an unused (empty) position is found.
• For example:
– Hash function: MOD 3 of the numeric part of the staff number field.
– Therefore, SG5 and SG14 both hash to bucket 2.
– When SL41 is inserted, it also hashes to bucket 2.
– It cannot be added to bucket 2, which is full, so the search wraps
around to the top of the file and continues until a free slot is
found, here in bucket 1.
Bucket  Before               After
0       Staff SA9 record     Staff SA9 record
        Staff SL21 record    Staff SL21 record
1       Staff SG37 record    Staff SG37 record
        (empty)              Staff SL41 record
2       Staff SG5 record     Staff SG5 record
        Staff SG14 record    Staff SG14 record

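
The insertion of SL41 can be reproduced with a minimal sketch. It assumes two record slots per bucket (as the figure suggests) and probes whole buckets in order, wrapping around at the end of the file; the function names are hypothetical, and deletions are not handled.

```python
def h(staff_no, m):
    # MOD m of the numeric part of the staff number (assumed key format).
    return int("".join(c for c in staff_no if c.isdigit())) % m


def insert_open_addressing(buckets, staff_no, capacity=2):
    # Start at the home bucket and probe following buckets until one has room.
    m = len(buckets)
    home = h(staff_no, m)
    for step in range(m):
        i = (home + step) % m
        if len(buckets[i]) < capacity:
            buckets[i].append(staff_no)
            return i
    raise RuntimeError("file is full")


def search_open_addressing(buckets, staff_no, capacity=2):
    # Follow the same probe sequence; a bucket with a free slot ends the search.
    m = len(buckets)
    home = h(staff_no, m)
    for step in range(m):
        i = (home + step) % m
        if staff_no in buckets[i]:
            return i
        if len(buckets[i]) < capacity:
            return None
    return None


buckets = [["SA9", "SL21"], ["SG37"], ["SG5", "SG14"]]
placed = insert_open_addressing(buckets, "SL41")
print(placed)  # bucket 2 and bucket 0 are full, so SL41 lands in bucket 1
```

The cost of this scheme shows up in searching: finding SL41 now takes three bucket reads (2, 0, 1) instead of one.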
2. Unchained Overflow
• Instead of searching for a free slot, an overflow area is maintained for
collisions that cannot be placed at the hash address.

Main area                      Overflow area
Bucket  Records                Bucket  Records
0       Staff SA9 record       3       Staff SL41 record
        Staff SL21 record
1       Staff SG37 record      4       (empty)
        (empty)
2       Staff SG5 record
        Staff SG14 record
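
A minimal sketch of unchained overflow, assuming two slots per main bucket and modelling the overflow area as a single list that is scanned linearly; the names `insert_unchained` and `search_unchained` are hypothetical.

```python
M = 3          # main buckets 0..2
CAPACITY = 2   # records per main bucket (assumed from the figure)


def h(staff_no):
    return int("".join(c for c in staff_no if c.isdigit())) % M


def insert_unchained(main, overflow, staff_no):
    i = h(staff_no)
    if len(main[i]) < CAPACITY:
        main[i].append(staff_no)
        return ("main", i)
    # Home bucket is full: the record goes to the shared overflow area.
    overflow.append(staff_no)
    return ("overflow", len(overflow) - 1)


def search_unchained(main, overflow, staff_no):
    i = h(staff_no)
    if staff_no in main[i]:
        return ("main", i)
    # Nothing links the home bucket to the overflow area, so scan all of it.
    if staff_no in overflow:
        return ("overflow", overflow.index(staff_no))
    return None


main = [["SA9", "SL21"], ["SG37"], ["SG5", "SG14"]]
overflow = []
where = insert_unchained(main, overflow, "SL41")
```

Because the main buckets carry no pointer, every unsuccessful lookup in a main bucket has to fall back to scanning the overflow area.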


3. Chained Overflow
• An overflow area is maintained for collisions that
cannot be placed at the hash address. However,
each bucket has an additional field that indicates
whether a collision occurred, and if so, points to the
overflow page used.
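Main area                                      Overflow area
Bucket  Records              Overflow pointer  Bucket  Records
0       Staff SA9 record     0                 3       Staff SL41 record
        Staff SL21 record
1       Staff SG37 record    0                 4       (empty)
        (empty)
2       Staff SG5 record     3
        Staff SG14 record
(An overflow pointer of 0 indicates that no collision has occurred for that bucket.)
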
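
Chained overflow can be sketched as below. The assumptions are labelled in the code: buckets 0 to 2 form the main area, buckets 3 and 4 the overflow area, each bucket holds two records, and a pointer value of 0 means "no overflow bucket", as the figure suggests. The dictionary layout and function names are illustrative only.

```python
CAPACITY = 2  # records per bucket (assumed from the figure)
M = 3         # main buckets 0..2; overflow buckets are numbered from M


def h(staff_no):
    return int("".join(c for c in staff_no if c.isdigit())) % M


def new_file(overflow_buckets=2):
    n = M + overflow_buckets
    # "next" holds each bucket's overflow pointer; 0 means none.
    return {"records": [[] for _ in range(n)], "next": [0] * n}


def insert(f, staff_no):
    i = h(staff_no)
    while True:
        if len(f["records"][i]) < CAPACITY:
            f["records"][i].append(staff_no)
            return i
        if f["next"][i] == 0:
            # Link the first unused overflow bucket to the end of the chain.
            linked = set(f["next"])
            for j in range(M, len(f["records"])):
                if j not in linked and not f["records"][j]:
                    f["next"][i] = j
                    break
            else:
                raise RuntimeError("overflow area is full")
        i = f["next"][i]


def search(f, staff_no):
    i = h(staff_no)
    while True:
        if staff_no in f["records"][i]:
            return i
        if f["next"][i] == 0:
            return None  # end of chain
        i = f["next"][i]


f = new_file()
for s in ["SA9", "SL21", "SG37", "SG5", "SG14", "SL41"]:
    insert(f, s)
print(f["next"])  # pointer of bucket 2 now refers to overflow bucket 3
```

Compared with the unchained variant, a search only follows the chain of its home bucket and never scans unrelated overflow buckets.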
Chained Overflow Example


4. Multiple Hashing
• The program applies a second hash function if the first results in a
collision. If another collision results, the program uses open addressing
or applies a third hash function and then uses open addressing if
necessary.
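
A sketch of this strategy with two hash functions follows. The second function `h2` is purely hypothetical (the slides do not specify one), buckets are assumed to hold two records, and the fallback probes buckets in order starting after the second address.

```python
M = 3         # number of buckets (assumed)
CAPACITY = 2  # records per bucket (assumed)


def number(staff_no):
    return int("".join(c for c in staff_no if c.isdigit()))


def h1(staff_no):
    return number(staff_no) % M


def h2(staff_no):
    # Hypothetical second hash function, chosen only for illustration.
    return (number(staff_no) // M) % M


def insert_multiple_hashing(buckets, staff_no):
    # Try the first hash address, then the second.
    for i in (h1(staff_no), h2(staff_no)):
        if len(buckets[i]) < CAPACITY:
            buckets[i].append(staff_no)
            return i
    # Both collided: fall back to open addressing from the second address.
    start = h2(staff_no)
    for step in range(1, M):
        i = (start + step) % M
        if len(buckets[i]) < CAPACITY:
            buckets[i].append(staff_no)
            return i
    raise RuntimeError("file is full")


buckets = [["SA9", "SL21"], ["SG37"], ["SG5", "SG14"]]
placed = insert_multiple_hashing(buckets, "SL41")
```

Here SL41 collides at `h1` (bucket 2) and is placed by `h2` in bucket 1 without any probing.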


Hashed Files
• To reduce overflow records, a hash file is typically kept 70-80%
full.
• The hash function h should distribute the records uniformly
among the buckets
– Otherwise, search time will be increased because many
overflow records will exist.
• Main disadvantages of static hashing (the hash address space is
fixed when the file is created):
– Fixed number of buckets M is a problem if the number of
records in the file grows or shrinks.
– Ordered access on the hash key is quite inefficient (requires
sorting the records).
– It is difficult to expand or shrink the file dynamically.


Dynamic And Extendible Hashed Files
• Dynamic and Extendible Hashing Techniques
– Hashing techniques are adapted to allow the dynamic
growth and shrinking of the number of file records.
– These techniques include the following: dynamic
hashing, extendible hashing, and linear hashing.
• Both dynamic and extendible hashing use the binary
representation of the hash value h(K) in order to
access a directory.
– In dynamic hashing the directory is a binary tree.
– In extendible hashing the directory is an array of size
2^d, where d is called the global depth.


Dynamic And Extendible Hashing
• The directories can be stored on disk, and they expand or
shrink dynamically.
– Directory entries point to the disk blocks that contain the
stored records.
• An insertion in a disk block that is full causes the block to split
into two blocks and the records are redistributed among the two
blocks.
– The directory is updated appropriately.
• Dynamic and extendible hashing do not require an overflow
area.
• Linear hashing does require an overflow area but does not use
a directory.
– Blocks are split in linear order as the file expands.
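
The directory doubling and block splitting of extendible hashing can be sketched as follows. This is a simplified in-memory illustration, not a definitive implementation: keys are distinct integers hashed by the identity function, the directory is indexed by the low-order d bits of the hash value (textbook presentations often use the leading bits instead), buckets hold two keys, and deletion and directory shrinking are omitted. The class names and the per-bucket local depth field are assumptions of this sketch.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket uses
        self.keys = []


class ExtendibleHashFile:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]    # always 2**global_depth entries

    def _index(self, key):
        # Identity hash (assumed); the low-order global_depth bits pick the entry.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        # Assumes distinct keys; identical hash values would split forever.
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; entries i and i + 2**d share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        bit = 1 << (bucket.local_depth - 1)
        # Entries whose newly used bit is 1 now point to the sibling bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = sibling
        # Redistribute the records of the full bucket between the two buckets.
        old, bucket.keys = bucket.keys, []
        for k in old:
            self.directory[self._index(k)].keys.append(k)


f = ExtendibleHashFile(capacity=2)
for key in range(5):
    f.insert(key)
print(f.global_depth, len(f.directory))
```

After inserting keys 0 to 4 the global depth is 2 and the directory has four entries, two of which still point to the same bucket because that bucket has not yet been split.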


Extendible Hashing


Parallelizing Disk Access using RAID
Technology
• Secondary storage technology must take steps to
keep up in performance and reliability with
processor technology.
• A major advance in secondary storage technology is
represented by the development of RAID, which
originally stood for Redundant Arrays of
Inexpensive Disks; the I is now usually read as Independent.
• The main goal of RAID is to even out the widely
different rates of performance improvement of disks
against those in memory and microprocessors.


Trends…

• The main goal of RAID is to even out the widely different rates
of performance improvement of disks against those in memory
and microprocessors.

RAID Technology
• A natural solution is a large array of small
independent disks acting as a single higher-
performance logical disk.
• A concept called data striping is used, which utilizes
parallelism to improve disk performance.
• Data striping distributes data transparently over
multiple disks to make them appear as a single large,
fast disk.


Reliability with RAID
• Keeping a single copy of data in an array of disks
causes a significant loss of reliability, because the more
disks there are, the more likely it is that one of them fails.
• An obvious solution is to employ redundancy of data
so that disk failures can be tolerated.
• One technique for introducing redundancy is called
mirroring or shadowing.
• Another technique is to store extra information that is
not normally needed but that can be used to reconstruct
the lost information.
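
One common form of such extra information is a parity block, computed as the bytewise XOR of the data blocks. The sketch below assumes three equal-sized data blocks on three disks plus one parity block; the function name and the sample contents are illustrative.

```python
from functools import reduce


def parity_block(blocks):
    # Bytewise XOR across equal-length blocks.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"DISK", b"FILE", b"HASH"]   # one block per data disk (sample data)
parity = parity_block(data)          # stored on a separate disk

# Suppose the disk holding data[1] fails: XOR the survivors with the parity.
rebuilt = parity_block([data[0], data[2], parity])
print(rebuilt)  # b'FILE'
```

The same XOR recovers any single lost block, since XOR-ing a value twice cancels it out; it cannot recover two lost blocks at once.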


Performance with RAID
• Disk arrays employ the technique of data
striping to achieve higher transfer rates.
• Bit-level data striping: consists of splitting a byte
of data and writing bit j to the jth disk.
– With 8-bit bytes, eight physical disks may be
considered as one logical disk.
• Block-level data striping: the granularity of
splitting is coarser than a bit; blocks of a file can be
striped across disks.
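
Both granularities can be sketched in a few lines. The bit-level functions follow the rule stated above (bit j goes to disk j); the block-level mapping assumes a simple round-robin layout, which is one common choice rather than something the slides specify. All function names are illustrative.

```python
def stripe_bits(byte):
    # Bit-level striping: element j is the bit written to disk j (8 disks).
    return [(byte >> j) & 1 for j in range(8)]


def unstripe_bits(bits):
    # Reassemble the byte by reading one bit from each disk.
    return sum(bit << j for j, bit in enumerate(bits))


def locate_block(logical_block, n_disks):
    # Block-level striping, round-robin (assumed): returns
    # (disk number, block offset on that disk).
    return logical_block % n_disks, logical_block // n_disks


bits = stripe_bits(0b10110010)
print(bits)
print([locate_block(b, 4) for b in range(6)])
```

With four disks, logical blocks 0 to 3 land on four different disks and can be transferred in parallel, which is where the higher transfer rate comes from.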


RAID Technology & Levels
• Different RAID organizations were defined based on different
combinations of two factors: the granularity of data interleaving
(striping) and the pattern used to compute redundant information.
– RAID level 0 has no redundant data and hence has the best write
performance, at the risk of data loss.
– RAID level 1 uses mirrored disks.
– RAID level 2 uses memory-style redundancy by using Hamming
codes, which contain parity bits for distinct overlapping subsets of
components. Level 2 includes both error detection and correction.
– RAID level 3 uses a single parity disk, relying on the disk controller
to figure out which disk has failed.
– RAID levels 4 and 5 use block-level data striping, with level 5
distributing data and parity information across all disks.
– RAID level 6 applies the so-called P + Q redundancy scheme using
Reed-Solomon codes to protect against up to two disk failures by
using just two redundant disks.


Use of RAID Technology (contd.)
• Different RAID organizations are used in different situations.
– RAID level 1 (mirrored disks) is the easiest for rebuilding a disk
from other disks.
• It is used for critical applications like logs.
– RAID level 2 uses memory-style redundancy by using Hamming
codes, which contain parity bits for distinct overlapping subsets of
components.
• Level 2 includes both error detection and correction.
– RAID level 3 (a single parity disk, relying on the disk controller to
figure out which disk has failed) and level 5 (block-level data
striping) are preferred for large-volume storage, with level 3 giving
higher transfer rates.
• Most popular uses of the RAID technology currently are:
– Level 0 (with striping), Level 1 (with mirroring) and Level 5 with an
extra drive for parity.
• Design Decisions for RAID include:
– Level of RAID, number of disks, choice of parity schemes, and
grouping of disks for block-level striping.
Summary
• Operations on Files
• Unordered Files
• Ordered Files
• Hashed Files
– Dynamic and Extendible Hashing Techniques
• RAID Technology

