FS Mod3

file structure module 3 ppt

Uploaded by

chayashreeg
Copyright
© © All Rights Reserved
Module 3

Cosequential Processing and the Sorting of Large Files

Cosequential Operations
• Coordinated processing of two or more sequential lists to produce a single list
• Kinds of operations
• merging, or union
• matching, or intersection
• combination of above
Matching names in two lists
• The so-called “intersection operation”
• Output the names common to two lists
• Things that must be dealt with to make the match procedure work reasonably
• initializing: arranging things so that the procedure gets going properly
• getting and accessing the next list item
• synchronizing between the two lists
• handling EOF conditions
• recognizing errors, e.g. duplicate names or names out of sequence
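The match procedure can be sketched in a few lines. This is a minimal illustration, assuming both lists are already sorted in ascending order and contain no duplicates; the function name `match` is hypothetical, and error recognition is left out.

```python
def match(list1, list2):
    """Return the names common to two sorted lists (intersection)."""
    common = []
    i = j = 0                      # initialization: start at the head of each list
    while i < len(list1) and j < len(list2):   # stop at the first end-of-list
        name1, name2 = list1[i], list2[j]
        if name1 < name2:
            i += 1                 # list1 is behind: advance it
        elif name1 > name2:
            j += 1                 # list2 is behind: advance it
        else:
            common.append(name1)   # names match: output and advance both
            i += 1
            j += 1
    return common
```

Only the list that is "behind" advances, which is what keeps the two lists synchronized.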
04/10/2024 dept. of ISE 2
Merging two lists
• Based on the matching operation
• Differences
• must read each of the lists completely
• must change the behavior of MoreNames
• keep this flag set to true as long as there are records in either list
• HighValue
• a special value (we use “\xFF”)
• comes after all legal input values in the files, to ensure that both input files are read to completion
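A minimal sketch of the merge, assuming sorted lists of ASCII names so that "\xFF" really does sort after every legal value. The helper `next_item` and the loop condition standing in for the MoreNames flag are illustrative choices, not the original code.

```python
HIGH_VALUE = "\xff"   # sorts after every legal (ASCII) name

def next_item(items, pos):
    """Return the item at pos, or HIGH_VALUE once the list is exhausted."""
    return items[pos] if pos < len(items) else HIGH_VALUE

def merge(list1, list2):
    """Merge two sorted lists into one sorted list without duplicates."""
    merged = []
    i = j = 0
    name1, name2 = next_item(list1, i), next_item(list2, j)
    # "MoreNames": true as long as there are records in either list
    while name1 != HIGH_VALUE or name2 != HIGH_VALUE:
        if name1 < name2:
            merged.append(name1)
            i += 1
        elif name1 > name2:
            merged.append(name2)
            j += 1
        else:                      # same name in both lists: output it once
            merged.append(name1)
            i += 1
            j += 1
        name1, name2 = next_item(list1, i), next_item(list2, j)
    return merged
```

Because an exhausted list reports HIGH_VALUE, the other list always compares lower and is drained without any special end-of-file branch.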

Multiway Merging
• A multiway merge, or K-way merge, is a specific type of sequence merge algorithm.
• It takes k sorted lists and merges them into a single sorted list.
• Algorithm
• begin loop
• determine which list has the key with the lowest value
• output that key
• move ahead one key in that list
• if the same key also appears in other input lists, move ahead in each of those lists
• loop again
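The loop above can be sketched as follows. This is an illustration under the assumption that every input list is sorted and held in memory; `kway_merge` is a hypothetical name, and the lowest key is found by a simple scan, which is reasonable for small k.

```python
def kway_merge(lists):
    """Merge k sorted lists into one sorted list, dropping duplicate keys."""
    pos = [0] * len(lists)         # current position in each list
    merged = []
    while True:
        # keys currently at the head of every list that is not exhausted
        heads = [lst[p] for lst, p in zip(lists, pos) if p < len(lst)]
        if not heads:
            break                  # every list has been read to completion
        lowest = min(heads)        # which list has the lowest key?
        merged.append(lowest)      # output that key
        for k, lst in enumerate(lists):
            # move ahead in every list whose head equals the key just output
            if pos[k] < len(lst) and lst[pos[k]] == lowest:
                pos[k] += 1
    return merged
```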
A Second Look at Sorting in Memory
• Read the whole file from disk into memory,
• perform the sort,
• write the whole file back to disk
• Using the heap technique (heapsort)
• processing and I/O can occur in parallel
• keep all the keys in a heap
• build the heap while reading a block
• rebuild the heap while writing a block



• Heap
• a kind of binary tree: a complete binary tree
• each node has a single key, and that key is greater than or equal to the key at its parent node, so the smallest key is at the root
• storage for the tree can be allocated sequentially, so there is no need for pointers or other dynamic overhead for maintaining the heap
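A minimal sketch of a heap stored sequentially in a Python list, with no pointers. It assumes a min-heap (smallest key at the root) and 0-based positions, so the children of position i sit at 2i+1 and 2i+2; the function names are hypothetical.

```python
def heap_insert(heap, key):
    """Add key at the end of the array and sift it up toward the root."""
    heap.append(key)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break                  # heap order holds: stop
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent

def heap_remove_min(heap):
    """Remove and return the smallest key, then restore heap order."""
    smallest = heap[0]
    last = heap.pop()
    if heap:
        heap[0] = last             # move the last key to the root, sift it down
        i, n = 0, len(heap)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            child = i
            if left < n and heap[left] < heap[child]:
                child = left
            if right < n and heap[right] < heap[child]:
                child = right
            if child == i:
                break
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return smallest
```

Each insert can be done as a key arrives from an input block, and each removal yields the next key for an output block, which is what allows sorting work to overlap with I/O.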

Chapter 9

Multilevel Indexing and B-Trees


Content
9.1 Introduction
9.2 Statement of the Problem
9.3 Indexing with Binary Search Trees
• AVL Trees
• Paged Binary Trees
• Problems with Paged Trees



Introduction: The Invention of the B-tree
File organization for faster access:
• Fields
• Records / Blocks
• Data compression
• Indexing
• The problem: how to access and efficiently maintain an index that is too large to hold in memory
• R. Bayer and E. McCreight invented the B-tree to address this problem



Statement of the Problem
• When indexes grow too large to hold in memory, they must be stored on secondary storage.
• Problems with keeping an index on secondary storage:
1. Binary searching requires too many seeks, so searching the index must be faster than binary searching.
2. Insertion and deletion must be as fast as search.



Indexing with Binary Search Trees
• Advantages
• Data need not be physically sorted
• Good performance on a balanced tree
• Insert cost = search cost
• Disadvantages
• In an out-of-balance binary tree, more seeks are required
• Solution: AVL trees and paged binary trees



Keys: AX, CL, DE, FB, FT, HN, JD, KF, NR, PA, RF, SD, TK, WS, YJ

Binary search tree representation (at most 4 seeks per record):

Level 1: KF
Level 2: FB SD
Level 3: CL HN PA WS
Level 4: AX DE FT JD NR RF TK YJ



Internal representation of the binary tree: nodes are linked by RRN (fixed-length records) or by pointer, so there is no need to keep the keys arranged in sorted order in the file.

ROOT = 9

RRN  key  left  right
 0   FB   10    8
 1   JD   -     -
 2   RF   -     -
 3   SD   6     13
 4   AX   -     -
 5   YJ   -     -
 6   PA   11    2
 7   FT   -     -
 8   HN   7     1
 9   KF   0     3
10   CL   4     12
11   NR   -     -
12   DE   -     -
13   WS   14    5
14   TK   -     -
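A minimal sketch of searching this structure by following RRN links. The table is copied from the slide, with -1 used here as an assumed marker for "no child"; the function name and the count of nodes visited (a stand-in for seeks) are illustrative.

```python
NIL = -1    # assumed marker for an empty left/right link
ROOT = 9    # RRN of the root record (KF)

# Index position = RRN; each entry is (key, left RRN, right RRN).
NODES = [
    ("FB", 10, 8), ("JD", NIL, NIL), ("RF", NIL, NIL), ("SD", 6, 13),
    ("AX", NIL, NIL), ("YJ", NIL, NIL), ("PA", 11, 2), ("FT", NIL, NIL),
    ("HN", 7, 1), ("KF", 0, 3), ("CL", 4, 12), ("NR", NIL, NIL),
    ("DE", NIL, NIL), ("WS", 14, 5), ("TK", NIL, NIL),
]

def search(key):
    """Return (RRN of key or NIL, number of nodes visited)."""
    rrn, visited = ROOT, 0
    while rrn != NIL:
        node_key, left, right = NODES[rrn]
        visited += 1               # each node visited may cost one seek on disk
        if key == node_key:
            return rrn, visited
        rrn = left if key < node_key else right
    return NIL, visited
```

The records sit in the file in arbitrary order; only the left and right links impose the search order.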
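With the insertion of the following keys into the previous tree: LV NP MB TM UF ND TS NK

[Figure: the previous binary search tree after these insertions. The new keys attach below the existing leaves (for example, LV, MB, ND and NK form a chain below NR), which leaves the tree out of balance.]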
AVL Tree
• A height-balanced k tree, or HB(k) tree
• The allowable difference in height between any two subtrees sharing a common root is k
• An AVL tree is a self-balancing binary search tree (BST) in which the heights of the left and right subtrees of any node differ by at most one
• AVL tree: HB(1) tree
• Named after G. M. Adel'son-Vel'skii and E. M. Landis
• Maintenance overhead is needed to keep the tree balanced
• Performance
• Given N keys, worst-case search in an AVL tree => 1.44 log2(N+2) levels
• cf. completely balanced tree: worst-case search => log2(N+1) levels
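To get a feel for the two bounds, they can be evaluated for a sample index size. This is only arithmetic on the formulas above; the function names and the choice of N = 1,000,000 are assumptions made for illustration.

```python
import math

def balanced_worst_case(n):
    """Worst-case levels searched in a completely balanced tree of n keys."""
    return math.log2(n + 1)

def avl_worst_case(n):
    """Worst-case levels searched in an AVL tree of n keys."""
    return 1.44 * math.log2(n + 2)

n = 1_000_000
print(f"completely balanced: {balanced_worst_case(n):.1f} levels")
print(f"AVL:                 {avl_worst_case(n):.1f} levels")
```

For a million keys this comes to roughly 20 levels against roughly 29, so the AVL tree stays within a constant factor of the ideal but still needs far too many seeks if every level costs a disk access.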

[Figure: tree built from the keys B, C, G, E, F, D, A]

• Binary searching requires too many seeks
• Keeping an index in sorted order is expensive
PAGED BINARY TREE

• AVL trees address the problem of keeping an index in sorted order cheaply, but not the problem that binary searching requires too many seeks.
• A paged binary tree attempts to address the seek problem by locating multiple binary nodes on the same disk page.
• A page is the unit of disk I/O for handling the seek and transfer of disk data.
• Paged binary tree: divide a binary tree into pages and then store each page in a block of contiguous locations on disk.
• If every page holds 7 keys, any one of 511 nodes (keys) can be reached in only three seeks.
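The 511-key figure can be checked with a short calculation. The sketch assumes every page is completely full and that a page holding k keys has k + 1 child pages; `keys_reachable` is a hypothetical helper name.

```python
def keys_reachable(keys_per_page, seeks):
    """Number of keys reachable within `seeks` page reads in a full paged tree."""
    fanout = keys_per_page + 1                 # child pages per page
    pages = sum(fanout ** level for level in range(seeks))
    return pages * keys_per_page

print(keys_reachable(7, 3))    # 7 keys per page, three seeks
print(keys_reachable(1, 9))    # one key per "page": an ordinary binary tree
print(keys_reachable(511, 3))  # a larger page size, still three seeks
```

With 7 keys per page there are 1 + 8 + 64 = 73 pages within three seeks, giving 511 keys; an unpaged binary tree needs nine seeks to cover the same 511 keys.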

The Problem with Paged Trees
• A significant amount of the space in each page may still be wasted.
• How do we build a paged tree? (top-down approach)
• Building top-down is valid only when we have the entire set of keys in hand before the tree is built
• Problems arise when the tree is out of balance
• How to select a good separator
• How to group keys
• How to guarantee the maximum loading

Multilevel Indexing
• Simple index record
• limited in the number of keys it can hold
• Multirecord index
• consists of a sequence of simple index records
• binary search over the records is too expensive
• Multilevel index
• reduces the number of records to be searched
• speeds up the search
• How can we insert new keys into the multilevel index?
• The index records at some level might be full
• Several levels of the index might have to be rebuilt
• Overflow chains may help, but they are still ugly
• A multilevel index structure is not well suited to dynamic data processing applications
• The B-tree provides the solution

SEARCH
• Searching procedure
• iterative
• works in two stages, operating alternately on entire pages (class BTree) and then within pages (class BTreeNode)
• Step 1: Load a page into memory
• Step 2: Search through the page for the key, following the tree downward until the search reaches the leaf level
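The two alternating stages can be sketched as a loop. This is an illustration only: the `PAGES` dictionary stands in for the B-tree file, the page layout (sorted keys plus child page numbers) is an assumed classic B-tree layout, and the function is not the BTree/BTreeNode code referred to above.

```python
import bisect

# Hypothetical "file": page number -> page. Leaf pages have no children.
PAGES = {
    0: {"keys": ["H", "P"], "children": [1, 2, 3]},
    1: {"keys": ["A", "D"], "children": []},
    2: {"keys": ["K", "M"], "children": []},
    3: {"keys": ["S", "W", "Z"], "children": []},
}

def search(key, root=0):
    """Return ((page number, slot) or None, list of pages read)."""
    reads = []
    page_no = root
    while True:
        # Stage 1: load a page into memory (simulated by a dictionary lookup)
        reads.append(page_no)
        page = PAGES[page_no]
        # Stage 2: search within the page
        slot = bisect.bisect_left(page["keys"], key)
        if slot < len(page["keys"]) and page["keys"][slot] == key:
            return (page_no, slot), reads
        if not page["children"]:
            return None, reads     # reached the leaf level without a match
        page_no = page["children"][slot]   # descend to the next level
```

The length of `reads` is the number of page accesses, which is bounded by the height of the tree.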
INSERTION
• Observations on insertion, splitting, and promotion
• insertion proceeds all the way down to the leaf level
• after finding the insertion location at the leaf level, the work proceeds upward from the bottom
• An iterative procedure with three phases
1. Search to the leaf level, using the FindLeaf method
2. Insertion, overflow detection, and splitting on the upward path
3. Creation of a new root node, if the current root was split

• (Step 1) Locate the node on the bottommost level in which to insert the record. The location is determined by a key search.
• (Step 2) If a vacant record slot is available, insert the record so that key sequencing is maintained. Then update the pointer associated with the record (the pointer is null for level 0 records). Then stop.
• (Step 3) If no vacant record slot exists, identify the median record. All records and pointers to the left of the median record are stored in one node (the original), and those to the right are stored in another node (the new node).
• (Step 4) If the topmost node was split, create a new topmost node that contains the median record identified in Step 3, together with pointers to the original and new nodes. Update the root to point to the new topmost node. Then stop.
• (Step 5) If the topmost node was not split, prepare to insert the median record identified in Step 3, and a pointer to the new node created in Step 3, into the parent node. Then go to Step 2.
• Note: Step 4 makes the B-tree increase in height by one level.
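The five steps can be sketched in memory as follows. This is a simplified illustration, not the BTree class itself: nodes live in Python objects rather than file records, keys are assumed distinct, `MAX_KEYS` is an arbitrary choice, and the key is inserted first with overflow detected afterwards, which gives the same median as Step 3.

```python
import bisect

MAX_KEYS = 3   # assumed capacity of a node

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list for a leaf

class BTree:
    def __init__(self):
        self.root = Node()

    def insert(self, key):
        # Step 1: search down to the leaf level, remembering the path
        path, node = [], self.root
        while node.children:
            path.append(node)
            node = node.children[bisect.bisect_left(node.keys, key)]
        new_child = None
        while True:
            # Step 2: insert the key (and pointer) so key sequence is kept
            slot = bisect.bisect_left(node.keys, key)
            node.keys.insert(slot, key)
            if new_child is not None:
                node.children.insert(slot + 1, new_child)
            if len(node.keys) <= MAX_KEYS:
                return                       # there was room: stop
            # Step 3: overflow, so split around the median key
            mid = len(node.keys) // 2
            median = node.keys[mid]
            right = Node(node.keys[mid + 1:], node.children[mid + 1:])
            node.keys = node.keys[:mid]
            node.children = node.children[:mid + 1]
            if not path:
                # Step 4: the root was split, so the tree grows one level
                self.root = Node([median], [node, right])
                return
            # Step 5: promote the median and new node pointer to the parent
            key, new_child, node = median, right, path.pop()

def in_order(node):
    """Return all keys of the subtree in ascending order."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out.extend(in_order(child))
        if i < len(node.keys):
            out.append(node.keys[i])
    return out

def leaf_depths(node, depth=1):
    """Return the set of depths at which leaves occur."""
    if not node.children:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))
```

Because growth happens only by splitting the root, every leaf stays on the same level no matter what order the keys arrive in.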

• Method Create
• writes the empty root node into the file BTreeFile so that its first record is reserved for that root node
• Method Open
• opens BTreeFile and loads the root node into memory from the first record in the file
• Method Close
• simply stores the root node into BTreeFile and closes the file



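B-Tree Properties
• The properties of a B-tree of order m
• Every page has a maximum of m descendants.
• Every page except for the root and the leaves has at least ⌈m/2⌉ descendants.
• The root has at least two descendants (unless it is a leaf).
• All the leaves appear on the same level.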



Deletion

Properties of B* Tree
• Every page has a maximum of m descendants.
• Every page except for the root has at least [(2m - 1)/3] descendants.
• The root has at least two descendants (unless it is a leaf).
• All the leaves appear on the same level.

A B* tree is a special type of B-tree in which each node is at least two-thirds full.
Virtual B-tree
• Create a page buffer to hold some number of B-tree pages, always keeping the root page in memory.
• A B-tree that uses a memory buffer in this way is sometimes referred to as a virtual B-tree.
• The process of accessing the disk to bring in a page that is not already in the buffer is called a page fault.
• There are two causes of page faults:
1. We have never used the page.
2. The page was once in the buffer but has since been replaced with a new page.
• LRU (least recently used) replacement
• The LRU method keeps track of the requests for pages.
• The page to be replaced is the one that has gone the longest time without a request for use.
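A minimal sketch of an LRU page buffer. The class name `PageBuffer` and the `read_page` callback (standing in for a disk read) are assumptions for illustration. The root page is not pinned explicitly here; since every search starts at the root, it is requested constantly and tends to stay in the buffer.

```python
from collections import OrderedDict

class PageBuffer:
    """Holds up to `capacity` pages and replaces the least recently used one."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page     # called only on a page fault
        self.pages = OrderedDict()     # ordered from least to most recently used
        self.faults = 0

    def get(self, page_no):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)    # record this request
            return self.pages[page_no]
        # Page fault: the page was never used, or it has been replaced
        self.faults += 1
        page = self.read_page(page_no)
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)     # evict the least recently used
        self.pages[page_no] = page
        return page
```

For example, with a capacity of 2, requesting pages 0, 1, 0, 2, 1 in that order causes four page faults: page 1 is evicted when page 2 arrives, so the final request for page 1 faults again.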