Part 2 File Organization L1&2
File Organization
Organizing Files for Performance
2020
Prof. Mohamed Hashem
Performance
• Space: efficient use of the space
– Data compression: encoding file information in such a way that it takes less space
– Reclaiming space
• Speed: “how to find things quickly”
– How to increase the speed of finding the required record
Contents
Data compression
Reclaiming space in files
Finding things quickly: an introduction to internal sorting and binary searching
Keysorting
• We will be looking at four different issues:
– Data Compression: how to make files smaller
– Reclaiming space in files that have undergone deletions and updates
– Sorting files in order to support binary searching ==> internal sorting
– A better sorting method: keysorting
Data Compression
• Reasons for data compression
– Less storage
– Faster transmission, which decreases access time
– Faster sequential processing
• Techniques:
1- Using a compact notation
2- Using a run-length indicator
3- Using variable-length coding
– Huffman code
1- Using a different notation
• Fixed-length fields are good candidates
• Decrease the number of bits by finding a more compact notation
• Examples:
1- The original state field takes 16 bits, but since there are only 50 states it can be encoded in 6 bits
2- The same applies to the Egyptian governorates: with 30 values, 5 bits are enough
3- For the 9 AS FCIS departments, 4 bits are enough
• Disadvantages:
– Unreadable by humans
– Cost in encoding time
– Decoding modules increase the complexity of the software
=> Used only for particular applications
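The bit counts in the examples above follow from the number of distinct values a field can take. A minimal sketch of that arithmetic, plus one way to pack such codes back to back; the helper names `bits_needed`, `pack_codes`, and `unpack_codes` are made up for illustration and do not come from the lecture:

```python
def bits_needed(n_values):
    """Smallest number of bits that can distinguish n_values different values."""
    return max(1, (n_values - 1).bit_length())


def pack_codes(codes, width):
    """Pack small integer codes, `width` bits each, into one Python int."""
    packed = 0
    for code in codes:
        if not 0 <= code < (1 << width):
            raise ValueError("code does not fit in the chosen width")
        packed = (packed << width) | code
    return packed


def unpack_codes(packed, width, count):
    """Recover `count` codes of `width` bits each from the packed int."""
    mask = (1 << width) - 1
    codes = []
    for _ in range(count):
        codes.append(packed & mask)   # lowest `width` bits hold the last code
        packed >>= width
    return codes[::-1]


# 50 states, 30 governorates, 9 departments
print(bits_needed(50), bits_needed(30), bits_needed(9))  # 6 5 4
```

The packed form is exactly what the disadvantages describe: it is not human-readable, and every reader of the file needs the matching decoding routine.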
Data Compression
2- Run-length encoding algorithm
– Read through the pixels, copying pixel values to the file in sequence, except where the same pixel value occurs more than once in succession
– When the same value occurs more than once in succession, substitute the following three bytes:
    the special run-length code indicator (e.g. ff)
    the pixel value that is repeated
    the number of times that value is repeated
• Example:
    Original: 22 23 24 24 24 24 24 24 24 25 26 26 26 26 26 26 25 24
    Encoded:  22 23 ff 24 07 25 ff 26 06 25 24
• Disadvantages
– No particular amount of space savings is guaranteed
– Under some circumstances, the compressed image is larger than the original image
Why? Can you prevent this?
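The algorithm on this slide can be sketched as follows. This is an illustrative version, not the lecture's code: the `min_run` parameter, the cap of 255 repetitions per run (so the count fits in one byte), and the rule that a literal pixel equal to the indicator is always written as a run are assumptions added here.

```python
def rle_encode(pixels, marker=0xFF, min_run=2):
    """Run-length encode a list of byte values using a 3-byte run code."""
    out = []
    i = 0
    while i < len(pixels):
        value = pixels[i]
        run = 1
        # Count how many times `value` repeats (at most 255 per run).
        while i + run < len(pixels) and pixels[i + run] == value and run < 255:
            run += 1
        if run >= min_run or value == marker:
            out += [marker, value, run]      # indicator, value, repeat count
        else:
            out += [value] * run             # copy short runs unchanged
        i += run
    return out


def rle_decode(data, marker=0xFF):
    """Invert rle_encode."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == marker:
            out += [data[i + 1]] * data[i + 2]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return out


pixels = [0x22, 0x23] + [0x24] * 7 + [0x25] + [0x26] * 6 + [0x25, 0x24]
encoded = rle_encode(pixels)
print(" ".join(f"{b:02x}" for b in encoded))
# 22 23 ff 24 07 25 ff 26 06 25 24
```

One possible answer to the question on the slide: with `min_run=2`, a run of only two equal pixels is replaced by three bytes, so an image full of short runs grows. In this sketch, setting `min_run` to 3 or more keeps short runs from expanding, although a pixel that happens to equal the indicator still costs three bytes.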
Data Compression
3- Variable-length codes
• Morse code: the oldest and most common variable-length coding scheme
• Some values occur more frequently than others
– those values should take the least amount of space
• Huffman coding
– based on the probability of occurrence
• determine the probability of each value occurring
• build a binary tree with a search path for each value
• more frequently occurring values are given shorter search paths in the tree

Character   A     I     M     S     Y     /b
Frequency   2     1     3     1     1     2
Code        00    1010  11    1011  100   01
# bits      2*2   1*4   3*2   1*4   1*3   2*2

Total: 4 + 4 + 6 + 4 + 3 + 4 = 25 bits
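To see the table at work, here is a small sketch that encodes and decodes with exactly these codes. The message `I_AM_SAMMY` is a hypothetical example chosen only because its character counts match the frequency row, and `_` stands for the blank (/b); neither comes from the slides.

```python
# Codes copied from the table; "_" stands for the blank (/b).
CODES = {"A": "00", "I": "1010", "M": "11", "S": "1011", "Y": "100", "_": "01"}


def encode(message):
    """Concatenate the variable-length code of every character."""
    return "".join(CODES[ch] for ch in message)


def decode(bits):
    """Read bits left to right; no code is a prefix of another,
    so the first match is always the right one."""
    inverse = {code: ch for ch, code in CODES.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("leftover bits do not form a complete code")
    return "".join(out)


message = "I_AM_SAMMY"   # hypothetical message with the table's frequencies
bits = encode(message)
print(len(bits), "bits with the variable-length codes")   # 25
print(3 * len(message), "bits with a fixed 3-bit code")   # 30
```

Six distinct characters would need 3 bits each in a fixed-length notation, so the variable-length codes save 5 bits on this short message.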
Data Compression
Huffman coding
• Order the values by frequency (ascending), then by character (alphabetically).
• Add the two smallest values, and put the sum at the beginning of the sequence of numbers.
• Repeat until a single value, the root of the tree, remains.
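[Figure: Huffman tree. Branches are labeled 0 and 1; the internal nodes shown carry the weights 4, 6, and 2, and the leaves shown are Y=1, I=1, and S=1.]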
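The construction can be sketched in a few lines. This is one reading of the ordering rule, not necessarily the lecturer's exact procedure: it is assumed here that ties between equal weights are broken by position in the sequence, that the first of the two merged nodes gets branch 0, and that the blank is written as `_` so it sorts after the letters. The function name `build_codes` is made up for illustration.

```python
def build_codes(freq):
    """Build Huffman codes from a non-empty {symbol: frequency} dict."""
    # Each entry is (weight, tree); a tree is a symbol or a (left, right) pair.
    # Initial order: ascending frequency, then alphabetical.
    seq = sorted((weight, symbol) for symbol, weight in freq.items())
    while len(seq) > 1:
        seq.sort(key=lambda node: node[0])      # stable: ties keep their order
        (w0, t0), (w1, t1) = seq[0], seq[1]     # the two smallest weights
        # Merge them and put the sum at the beginning of the sequence.
        seq = [(w0 + w1, (t0, t1))] + seq[2:]

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")         # left branch is labeled 0
            walk(tree[1], prefix + "1")         # right branch is labeled 1
        else:
            codes[tree] = prefix or "0"         # single-symbol edge case

    walk(seq[0][1], "")
    return codes


freq = {"A": 2, "I": 1, "M": 3, "S": 1, "Y": 1, "_": 2}
for symbol, code in sorted(build_codes(freq).items()):
    print(symbol, code)
```

Under these tie-breaking assumptions the sketch yields the codes A=00, I=1010, M=11, S=1011, Y=100, and 01 for the blank. A different tie-breaking rule can give different codes, but the total number of bits for the message stays the same.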
Reclaiming Space
Record Deletion:
• Records are either fixed length or variable length.
• Deletion is the same for both fixed-length and variable-length records.
• Addition differs between fixed-length and variable-length records.
• RRN (Relative Record Number) --> the order of the record in the file.
• The byte offset is the address of the first byte of the record.
RRN 1 2 3 4 5 6 7
Example: byte offset of the record with RRN 5, for a record length of 100 bytes
= (RRN - 1) * record length
= 4 * 100
= 400 bytes
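The offset calculation translates directly into a seek. A minimal sketch, assuming RRNs start at 1 as in the example and using an in-memory `io.BytesIO` object in place of a real file; the function names are illustrative only:

```python
import io

RECORD_LENGTH = 100  # bytes per record, as in the example


def byte_offset(rrn, record_length=RECORD_LENGTH):
    """Offset of the first byte of record `rrn`, with RRNs starting at 1."""
    if rrn < 1:
        raise ValueError("RRN starts at 1")
    return (rrn - 1) * record_length


def read_record(f, rrn, record_length=RECORD_LENGTH):
    """Jump straight to the record instead of reading the ones before it."""
    f.seek(byte_offset(rrn, record_length))
    return f.read(record_length)


# Seven dummy records: record n is filled with the digit n.
data = b"".join(bytes([48 + n]) * RECORD_LENGTH for n in range(1, 8))
f = io.BytesIO(data)

print(byte_offset(5))            # 400
print(read_record(f, 5)[:10])    # b'5555555555'
```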
Reclaiming Space
Fixed-Length Record Deletion & Addition:
1- Deletion:
AVAIL List
Given the following file, we need to delete REC # 3, 5, 6 in order.
1- Before deleting --> AVAIL List = -1 (H.L)
1 2 3 4 5 6 7
2- After deleting Rec 3 --> AVAIL List = 3 (H.L)
1 2 *-1 4 5 6 7
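A sketch of how the AVAIL list could behave for the deletions named above. It assumes the list works as a stack (the most recently deleted slot is reused first) and that a deleted slot stores `*` together with the RRN of the next free slot; the class `FixedFile` is an in-memory stand-in invented for illustration, not the lecture's implementation.

```python
class FixedFile:
    """In-memory stand-in for a fixed-length record file with an AVAIL list."""

    def __init__(self, records):
        self.slots = list(records)   # slot i holds the record with RRN i + 1
        self.avail_head = -1         # -1 means the AVAIL list is empty

    def delete(self, rrn):
        # Mark the slot as deleted and store the old head as its link.
        self.slots[rrn - 1] = ("*", self.avail_head)
        self.avail_head = rrn

    def add(self, record):
        if self.avail_head == -1:            # no free slot: append at the end
            self.slots.append(record)
            return len(self.slots)
        rrn = self.avail_head                # reuse the most recently freed slot
        _, next_free = self.slots[rrn - 1]
        self.avail_head = next_free
        self.slots[rrn - 1] = record
        return rrn


f = FixedFile(["r1", "r2", "r3", "r4", "r5", "r6", "r7"])
for rrn in (3, 5, 6):
    f.delete(rrn)
    print("deleted", rrn, "-> AVAIL head =", f.avail_head)

print(f.slots)
print("new record stored at RRN", f.add("new"))
```

After the three deletions the head is 6, slot 6 links to 5, slot 5 links to 3, and slot 3 links to -1, so the next added record lands in slot 6. The sketch does not guard against deleting the same record twice.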
File Fragmentation:
The dynamic nature of a file (deleting and adding records) leads to logical fragmentation:
* Internal fragmentation
* External fragmentation
Combating Fragmentation
• Passive
– Choosing a reasonable record size (internal & external)
• Static
– Storage compaction: “squeeze out the unused spaces” (internal & external)
– Collecting holes: “collecting adjacent deleted records to have a larger size” (internal & external), like defragmentation
• Dynamic
– Right fitting strategy: “reclaiming space” (external)
– Procedures to decompose size: “convert internal fragmentation to external fragmentation, which therefore appears in AVAIL” (internal)
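Collecting holes can be illustrated with a short sketch. It assumes the free space is tracked as a list of `(offset, size)` pairs, which is a representation chosen here for illustration and not something the slides specify:

```python
def coalesce(holes):
    """Merge adjacent free holes given as (offset, size) pairs."""
    merged = []
    for offset, size in sorted(holes):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This hole starts exactly where the previous one ends: join them.
            last_offset, last_size = merged[-1]
            merged[-1] = (last_offset, last_size + size)
        else:
            merged.append((offset, size))
    return merged


print(coalesce([(300, 100), (0, 40), (200, 100), (40, 60)]))
# [(0, 100), (200, 200)]
```

Four small holes become two larger ones, so a record of up to 200 bytes that fit nowhere before can now be placed.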
Finding Things Quickly
Sort in memory
[Figure: sorting in memory; labels: disk, memory]
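Internal sorting is what makes binary searching possible: the whole file is read from disk into memory, sorted there, and then searched by halving the range. A minimal sketch, assuming each record is a `(key, data)` pair already loaded into a list; the sample names and function names are made up for illustration:

```python
def sort_in_memory(records):
    """Internal sort: all records are in memory and are ordered by key."""
    return sorted(records, key=lambda record: record[0])


def binary_search(sorted_records, key):
    """Return the record with the given key, or None if it is absent."""
    low, high = 0, len(sorted_records) - 1
    while low <= high:
        mid = (low + high) // 2
        mid_key = sorted_records[mid][0]
        if mid_key == key:
            return sorted_records[mid]
        if mid_key < key:
            low = mid + 1        # the key can only be in the upper half
        else:
            high = mid - 1       # the key can only be in the lower half
    return None


records = [("mason", "rec A"), ("ames", "rec B"), ("turner", "rec C"),
           ("baker", "rec D"), ("jones", "rec E")]
ordered = sort_in_memory(records)
print(binary_search(ordered, "jones"))   # ('jones', 'rec E')
print(binary_search(ordered, "smith"))   # None
```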
File Structures SNU-OOPSLA Lab.
Key Sorting
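The usual idea behind keysorting is to keep only the keys and their RRNs in memory, sort that small array, and then copy the records out in key order. A rough sketch under stated assumptions: records are fixed length, the key is the first `key_length` bytes of each record, RRNs start at 1, and `io.BytesIO` objects stand in for the input and output files. The function name `keysort` and the sample data are illustrative only.

```python
import io


def keysort(infile, outfile, record_length, key_length):
    """Write the records of infile to outfile in ascending key order."""
    # Pass 1: collect (key, RRN) pairs; the full records are not kept.
    keynodes = []
    rrn = 1
    while True:
        record = infile.read(record_length)
        if len(record) < record_length:
            break
        keynodes.append((record[:key_length], rrn))
        rrn += 1

    keynodes.sort()   # only the small key array is sorted in memory

    # Pass 2: seek to each record by RRN and copy it out in key order.
    for _, rrn in keynodes:
        infile.seek((rrn - 1) * record_length)
        outfile.write(infile.read(record_length))


src = io.BytesIO(b"DELTaaaaALFAbbbbCHARcccc")   # three 8-byte records
dst = io.BytesIO()
keysort(src, dst, record_length=8, key_length=4)
print(dst.getvalue())   # b'ALFAbbbbCHARccccDELTaaaa'
```

The trade-off is that the second pass seeks once per record, so the saving in memory is paid for with extra disk movement.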