Part 2 File Organization L1&2

PART II

File Organization
Organizing Files for Performance
2020
Prof. Mohamed Hashem
Performance
Efficient use of space, and speed ("how to find things quickly")

Performance
– Space
   • Data compression: encoding file information in such a way that it takes less space
   • Reclaiming space
– Speed
   • How to increase the speed of finding the required record
Contents
Data compression
Reclaiming space in files
Finding things quickly: an introduction to internal sorting and binary searching
Keysorting
• We will be looking at four different issues:
– Data Compression: how to make files smaller
– Reclaiming space in files that have undergone deletions and
updates
– Sorting Files in order to support binary searching ==> Internal
Sorting
– A better Sorting Method: KeySorting

Data Compression
• Reasons for data compression
– less storage
– faster transmission, which decreases access time
– faster sequential processing
• Techniques:
1- Using a compact notation
2- Using a run-length indicator
3- Using variable-length coding
- Huffman code

1 - Using a different notation
• Fixed-length fields are good candidates
• Decrease the # of bits by finding a more compact notation
Examples:
1- A state field stored as two characters takes 16 bits, but there are only 50 states, so a 6-bit code is enough (2^6 = 64)
2- Likewise, Egypt's governorates (30 values) fit in 5 bits (2^5 = 32)
3- The 9 departments of AS FCIS fit in 4 bits (2^4 = 16)
• Disadvantages:
– unreadable by humans
– cost in encoding time
– decoding modules => increase the complexity of s/w
=> used only for particular applications
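
The bit counts in the examples above can be checked with a few lines of Python. This is a minimal sketch; the function name bits_needed is an assumption made for illustration and is not part of the slides.

```python
def bits_needed(distinct_values):
    """Smallest number of bits that gives every value its own code."""
    if distinct_values < 1:
        raise ValueError("need at least one value")
    # (n - 1).bit_length() is the integer form of ceil(log2(n)).
    return max(1, (distinct_values - 1).bit_length())


states = bits_needed(50)        # 50 states
governorates = bits_needed(30)  # 30 governorates
departments = bits_needed(9)    # 9 departments
```

With these inputs the three results are 6, 5 and 4 bits, matching the examples.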

Data Compression
2- Run-length encoding algorithm
– read through the pixels, copying pixel values to the file in sequence, except where the same pixel value occurs more than once in succession
– when the same value occurs more than once in succession, substitute the following three bytes:
   1. the special run-length code indicator (e.g. ff)
   2. the pixel value that is repeated
   3. the number of times that value is repeated
• ex) 22 23 24 24 24 24 24 24 24 25 26 26 26 26 26 26 25 24
   => 22 23 ff 24 07 25 ff 26 06 25 24

Disadvantages
– does not guarantee any particular amount of space savings
– under some circumstances, the compressed image is larger than the original image
Why? Can you prevent this?
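
The run-length scheme above can be sketched in Python as follows. The sketch makes several assumptions that are not in the slides: the pixel values are single bytes written in hex, the value ff never occurs as a real pixel, runs are capped at 255, and the min_run parameter is a hypothetical knob for experimenting with the "larger than the original" question.

```python
RUN_MARKER = 0xFF  # run-length code indicator; assumed never to be a pixel value


def rle_encode(pixels, min_run=2):
    """Replace each run of at least min_run equal values by marker, value, count."""
    out = []
    i = 0
    while i < len(pixels):
        value = pixels[i]
        run = 1
        # Measure the run that starts at i (one count byte limits it to 255).
        while i + run < len(pixels) and pixels[i + run] == value and run < 255:
            run += 1
        if run >= min_run:
            out.extend([RUN_MARKER, value, run])
        else:
            out.extend([value] * run)
        i += run
    return out


def rle_decode(encoded):
    """Inverse of rle_encode."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == RUN_MARKER:
            out.extend([encoded[i + 1]] * encoded[i + 2])
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return out


pixels = [0x22, 0x23] + [0x24] * 7 + [0x25] + [0x26] * 6 + [0x25, 0x24]
encoded = rle_encode(pixels)
```

For the slide's input, encoded is 22 23 ff 24 07 25 ff 26 06 25 24 in hex. A run of only two equal values becomes three bytes, which is one way the output can grow; calling rle_encode with min_run=4 leaves such short runs uncompressed.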

Data Compression
3-variable-length codes
• Morse code: the oldest and most common variable-length coding scheme
• Some values occur more frequently than others
– those values should take the least amount of space
• Huffman coding
– based on the probability of occurrence
• determine the probability of each value occurring
• build a binary tree with a search path for each value
• more frequently occurring values are given shorter search paths in the tree

File Structures SNU-OOPSLA Lab.


Data Compression
Huffman coding
Problem
You are given a file:
File: I AM SAMMY

Find the Huffman tree.

• Size of file = 10 characters * 8 bits = 80 bits
Order the characters by frequency (ascending), then by character (alphabetic). /b denotes the blank.

Character    A     I      M     S      Y     /b
Frequency    2     1      3     1      1     2
Code         00    1010   11    1011   100   01
# bits       2*2   1*4    3*2   1*4    1*3   2*2

• Size after coding = 4 + 4 + 6 + 4 + 3 + 4 = 25 bits
Data Compression
Huffman coding
1. Order the characters by frequency (ascending), then by character (alphabetic):

   1:I   1:S   1:Y   2:A   2:/b   3:M

2. Add the two smallest values and put the new node back into the sequence, ahead of any node with the same value:

   1:Y   2:(I,S)   2:A   2:/b   3:M

3. Continue in the same way until a single tree remains:

   2:A   2:/b   3:(Y,(I,S))   3:M
   3:(Y,(I,S))   3:M   4:(A,/b)
   4:(A,/b)   6:((Y,(I,S)),M)
   10

4. For each node, assign 1 to the right edge and 0 to the left edge.
Huffman coding
Final Tree:

                 (10)
             0 /      \ 1
           (4)          (6)
         0 /  \ 1     0 /  \ 1
       A=2   /b=2   (3)    M=3
                  0 /  \ 1
                Y=1    (2)
                     0 /  \ 1
                   I=1    S=1

• Given File (2) --> 1 1 1 0 1 1 0 0 1 0 1 0 1 1 0 0
• The original file was --> M S A I M A (11 1011 00 1010 11 00)
• For the string SYMA the code is 1011 100 11 00
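
The construction can be replayed in Python. This is a sketch under stated assumptions, not a general-purpose implementation: the function names are hypothetical, the input is assumed to be non-empty, the blank is ordered as the letter "b" to mirror the /b label, and a merged node is inserted ahead of nodes of equal weight. Other tie-breaking rules give different codes of the same total length.

```python
from collections import Counter


def huffman_codes(text):
    """Build codes by repeatedly merging the two smallest nodes of a sorted list."""
    freq = Counter(text)

    def label(ch):
        # Assumption: the blank is written /b on the slides and sorts as "b".
        return "b" if ch == " " else ch

    # A node is (weight, tree); a tree is a symbol or a (left, right) pair.
    nodes = sorted(((w, ch) for ch, w in freq.items()),
                   key=lambda n: (n[0], label(n[1])))
    while len(nodes) > 1:
        (w1, left), (w2, right) = nodes[0], nodes[1]
        merged = (w1 + w2, (left, right))
        rest = nodes[2:]
        pos = 0
        # Skip lighter nodes; the merged node goes ahead of equal weights.
        while pos < len(rest) and rest[pos][0] < merged[0]:
            pos += 1
        nodes = rest[:pos] + [merged] + rest[pos:]

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")  # left edge = 0
            walk(tree[1], prefix + "1")  # right edge = 1
        else:
            codes[tree] = prefix or "0"

    walk(nodes[0][1], "")
    return codes


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    inverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free codes: the first match is the symbol
            out.append(inverse[current])
            current = ""
    return "".join(out)


codes = huffman_codes("I AM SAMMY")
```

With the file I AM SAMMY this yields the codes in the table (A=00, blank=01, Y=100, I=1010, S=1011, M=11), and decode("1110110010101100", codes) gives MSAIMA.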
Reclaiming Space
Basic File Operations:
• Open
• Close
• Read
• Write
• Append
There is NO delete operation.

Record Deletion Operation

1. Put a special character at the beginning of the deleted record.
2. Write a small piece of software that, when reading, treats any record with that symbol at its beginning as deleted.
3. Put the deleted record in the AVAIL list to keep track of deleted records.

AVAIL List: Available List
Reclaiming Space
Record Deletion:

Record deletion applies to:
– fixed-length records
– variable-length records

• Deletion is the same for both fixed-length and variable-length records.
• Addition differs between fixed-length and variable-length records.
• RRN (Relative Record Number) --> the order of the record in the file.
• Byte offset: the address of the first byte of the record.

RRN 1 2 3 4 5 6 7
Byte offset of the record with RRN 5, for a record length of 100 bytes:
   = (RRN - 1) * record length
   = 4 * 100
   = 400 bytes
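
The offset rule can be tried out on an in-memory file. This is a minimal sketch: the helper names are assumptions, RRNs start at 1 as in the calculation above, and io.BytesIO stands in for a real file of seven 100-byte records.

```python
import io


def byte_offset(rrn, record_length):
    """Address of the first byte of a record; RRNs start at 1."""
    return (rrn - 1) * record_length


def read_record(f, rrn, record_length):
    """Seek directly to the record and read it in a single access."""
    f.seek(byte_offset(rrn, record_length))
    return f.read(record_length)


# Seven 100-byte records; record n is filled with the digit n.
data = io.BytesIO(b"".join(str(n).encode() * 100 for n in range(1, 8)))
fifth = read_record(data, 5, 100)
```

Here byte_offset(5, 100) is 400, and fifth holds one hundred copies of the byte "5".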
Reclaiming Space
Fixed-Length Record Deletion & Addition:
1- Deletion:
Given the following file, we need to delete REC # 3, 5, 6 in order.
A deleted slot holds * followed by the RRN of the next free slot; -1 ends the AVAIL list.

1- Before deleting
   AVAIL List (H.L) = -1
   AVAIL chain: -1
   File: 1   2   3   4   5   6   7

2- After deleting Rec 3
   AVAIL List (H.L) = 3
   AVAIL chain: 3 -> -1
   File: 1   2   *-1   4   5   6   7

3- After deleting Rec 5
   AVAIL List (H.L) = 5
   AVAIL chain: 5 -> 3 -> -1
   File: 1   2   *-1   4   *3   6   7

4- After deleting Rec 6
   AVAIL List (H.L) = 6
   AVAIL chain: 6 -> 5 -> 3 -> -1
   File: 1   2   *-1   4   *3   *5   7
Reclaiming Space
Fixed-Length Record Deletion & Addition:
2- Addition:
We need to add REC # 8, 9, 10, 11, 12 in order.

1- Initially
   AVAIL List (H.L) = 6
   AVAIL chain: 6 -> 5 -> 3 -> -1
   File: 1   2   *-1   4   *3   *5   7

2- After adding Rec 8
   AVAIL List (H.L) = 5
   AVAIL chain: 5 -> 3 -> -1
   File: 1   2   *-1   4   *3   8   7

3- After adding Rec 9
   AVAIL List (H.L) = 3
   AVAIL chain: 3 -> -1
   File: 1   2   *-1   4   9   8   7

4- After adding Rec 10
   AVAIL List (H.L) = -1
   AVAIL chain: -1
   File: 1   2   10   4   9   8   7

5- After adding Rec 11 & 12 (the AVAIL list is empty, so both are appended at the end of the file)
   AVAIL List (H.L) = -1
   AVAIL chain: -1
   File: 1   2   10   4   9   8   7   11   12
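
The fixed-length example can be replayed with a small in-memory model. This is a sketch under assumptions: the class name FixedFile is hypothetical, a Python list stands in for the file, and a deleted slot is stored as the pair ("*", next free RRN).

```python
class FixedFile:
    """Fixed-length records with the AVAIL list kept as a stack of free slots."""

    def __init__(self, records):
        self.slots = list(records)  # slot i holds the record with RRN i + 1
        self.avail_head = -1        # -1 means the AVAIL list is empty

    def delete(self, rrn):
        # Mark the slot and link it to the previous head of the list.
        self.slots[rrn - 1] = ("*", self.avail_head)
        self.avail_head = rrn

    def add(self, record):
        if self.avail_head == -1:
            # No free slot: append at the end of the file.
            self.slots.append(record)
            return len(self.slots)
        rrn = self.avail_head
        _, next_free = self.slots[rrn - 1]
        self.avail_head = next_free  # pop the head of the list
        self.slots[rrn - 1] = record
        return rrn


f = FixedFile([1, 2, 3, 4, 5, 6, 7])
for rrn in (3, 5, 6):
    f.delete(rrn)
after_delete = list(f.slots)
placed = [f.add(rec) for rec in (8, 9, 10, 11, 12)]
```

After the three deletions the head is 6 and the slots read 1, 2, *-1, 4, *3, *5, 7. The five additions land at RRNs 6, 5, 3, 8 and 9, which leaves the file as 1 2 10 4 9 8 7 11 12.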
Reclaiming Space
Variable-Length Record Deletion & Addition:
1- Deletion:
Given the following file, we need to delete REC # 2, 1, 4 in order.
Each AVAIL list entry is (offset, size, pointer). Offsets are counted from 1, so Rec 2 starts at byte 16.

1- Before deleting
   AVAIL List (H.L) = -1
   Record:  0     1     2     3     4
   Size:    10B   5B    10B   8B    30B

2- After deleting Rec 2
   AVAIL List (H.L) = 16
   Record:  0     1     *-1   3     4
   Size:    10B   5B    10B   8B    30B
   AVAIL entries: (16, 10, -1)

3- After deleting Rec 1
   AVAIL List (H.L) = 11
   Record:  0     *16   *-1   3     4
   Size:    10B   5B    10B   8B    30B
   AVAIL entries: (11, 5, 16)  (16, 10, -1)

4- After deleting Rec 4
   AVAIL List (H.L) = 34
   Record:  0     *16   *-1   3     *11
   Size:    10B   5B    10B   8B    30B
   AVAIL entries: (34, 30, 11)  (11, 5, 16)  (16, 10, -1)
Reclaiming Space
Variable-Length Record Deletion & Addition:
2- Addition:
When adding a record, search through the AVAIL list for a slot of the right size and insert the record according to the selected fitting strategy.
Placement Strategies:
• First-fit
– select the first available record slot that can accommodate the new record
– suitable when lost space is due to internal fragmentation
• Best-fit
– select the smallest available record slot that can accommodate the new record
– avail list in ascending order
– suitable when lost space is due to internal fragmentation
• Worst-fit
– select the largest available record slot
– avail list in descending order
– suitable when lost space is due to external fragmentation
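
The three strategies can be compared side by side in Python. This is a sketch under assumptions: the AVAIL list is modeled as a plain list of (offset, size) pairs in list order, the function names are hypothetical, and the list is scanned rather than kept sorted as the slides suggest for best-fit and worst-fit.

```python
def first_fit(avail, size):
    """Index of the first slot that can hold the record, or None."""
    for i, (_, slot_size) in enumerate(avail):
        if slot_size >= size:
            return i
    return None


def best_fit(avail, size):
    """Index of the smallest slot that can hold the record, or None."""
    fits = [i for i, (_, slot_size) in enumerate(avail) if slot_size >= size]
    return min(fits, key=lambda i: avail[i][1]) if fits else None


def worst_fit(avail, size):
    """Index of the largest slot, provided it can hold the record, or None."""
    if not avail:
        return None
    i = max(range(len(avail)), key=lambda j: avail[j][1])
    return i if avail[i][1] >= size else None


# Sample AVAIL list as (offset, size) pairs; the values are illustrative.
avail = [(34, 30), (11, 5), (16, 10)]
choices = {fit.__name__: avail[fit(avail, 4)] for fit in (first_fit, best_fit, worst_fit)}
```

For a 4-byte record, first-fit and worst-fit both pick the 30-byte slot at offset 34, while best-fit picks the 5-byte slot at offset 11. None of the three finds room for a 40-byte record, so it would be appended to the file.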
Reclaiming Space
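Variable-Length Record Deletion & Addition:
2- Addition: we want to add the following records in order, with First Fit:
R5 of size 40 bytes, R6 of 10 bytes, and R7 of 7 bytes.
Each AVAIL list entry is (offset, size, pointer).

1- Initially
   AVAIL List (H.L) = 34
   Record:  0     *16   *-1   3     *11
   Size:    10B   5B    10B   8B    30B
   AVAIL entries: (34, 30, 11)  (11, 5, 16)  (16, 10, -1)

2- After adding Rec 5 (no free slot can hold 40 bytes, so R5 is appended at the end of the file)
   AVAIL List (H.L) = 34
   Record:  0     *16   *-1   3     *11   R5
   Size:    10B   5B    10B   8B    30B   40B
   AVAIL entries: (34, 30, 11)  (11, 5, 16)  (16, 10, -1)

3- After adding Rec 6 (first fit: the 30-byte slot at offset 34)
   AVAIL List (H.L) = 11
   Record:  0     *16   *-1   3     R6 (10B)   R5 (40B)
   AVAIL entries: (11, 5, 16)  (16, 10, -1)

4- After adding Rec 7 (first fit: the 5-byte slot at offset 11 is too small, so the 10-byte slot at offset 16 is used)
   AVAIL List (H.L) = 11
   Record:  0     *-1   R7 (7B)   3     R6 (10B)   R5 (40B)
   AVAIL entries: (11, 5, -1)

5- Final file
   AVAIL List (H.L) = 11
   Record:  0     *-1   R7 (7B)   3     R6 (10B)   R5 (40B)
   AVAIL entries: (11, 5, -1)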
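
The linked AVAIL list in this walk-through can be modeled in Python. This is a sketch under assumptions: the class and method names are hypothetical, a dictionary maps each free offset to (size, pointer) in place of links stored inside the file, and a reused slot is handed out whole, so leftover bytes become internal fragmentation.

```python
class AvailList:
    """Linked list of free slots: head offset plus offset -> (size, next offset)."""

    def __init__(self):
        self.head = -1
        self.slots = {}

    def push(self, offset, size):
        # A newly deleted record becomes the new head of the list.
        self.slots[offset] = (size, self.head)
        self.head = offset

    def take_first_fit(self, size):
        """Unlink and return the first slot that is large enough, or None."""
        prev, cur = None, self.head
        while cur != -1:
            slot_size, nxt = self.slots[cur]
            if slot_size >= size:
                if prev is None:
                    self.head = nxt
                else:
                    self.slots[prev] = (self.slots[prev][0], nxt)
                del self.slots[cur]
                return cur
            prev, cur = cur, nxt
        return None  # caller appends the record at the end of the file

    def chain(self):
        out, cur = [], self.head
        while cur != -1:
            size, nxt = self.slots[cur]
            out.append((cur, size, nxt))
            cur = nxt
        return out


avail = AvailList()
for offset, size in ((16, 10), (11, 5), (34, 30)):  # delete Rec 2, 1, 4
    avail.push(offset, size)
after_delete = avail.chain()
placed = [avail.take_first_fit(size) for size in (40, 10, 7)]  # add R5, R6, R7
```

After the deletions the chain is (34, 30, 11), (11, 5, 16), (16, 10, -1). First fit then returns None for R5, offset 34 for R6 and offset 16 for R7, leaving the single entry (11, 5, -1).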
Reclaiming Space
Exercise:
Apply the best-fit and worst-fit strategies to the problem above.

File Fragmentation:
The dynamic nature of a file (deleting and adding records) leads to logical fragmentation:
* Internal fragmentation
* External fragmentation

                 Internal                              External
Where            within a record                       between records
AVAIL list       does not appear in the AVAIL list     appears in the AVAIL list
Occurs when      adding a small record into a          deleting any record
                 larger deleted record slot

Physical fragmentation:
– a sector does not hold an integer number of records
– a cluster does not hold an integer number of files
Reclaiming Space
How to Combat File Fragmentation:

Combating fragmentation: Passive, Static, Dynamic

• Passive
– choosing a reasonable record size (internal & external)
• Static
– storage compaction, "squeeze out the unused spaces" (internal & external)
– collecting holes, "collecting adjacent deleted records to have a larger size" (internal & external), like defragmentation
• Dynamic
– right fitting strategy, "reclaiming space" (external)
– procedures to decompose size, "convert internal fragmentation to external, so that it appears in the AVAIL list" (internal)
Finding Things Quickly

• The cost of seeking is very high.
• This cost has to be taken into consideration when determining a strategy for searching a file for a particular piece of information.
• The same question also arises with respect to sorting, which often is the first step to searching efficiently.
• Rather than simply trying to sort and search, we concentrate on doing so in a way that minimizes the number of seeks.
• Binary search vs. sequential search
– binary search
   • O(log n)
   • list is sorted by key
– sequential search
   • O(n)
• Limitations of binary search & internal sort
– binary search requires more than one or two accesses (c.f. a single access by RRN)
– keeping a file sorted is very expensive
– an internal sort works only on small files
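
The gap between O(log n) and O(n) can be made concrete by counting probes. This is a sketch on an in-memory list of sorted keys; on a real file every probe would be a seek plus a read. The function names and the sample keys are assumptions made for illustration.

```python
def binary_search(keys, target):
    """Return (index, probes); index is -1 when the key is absent."""
    low, high, probes = 0, len(keys) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1  # one record access per loop iteration
        if keys[mid] == target:
            return mid, probes
        if keys[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, probes


def sequential_search(keys, target):
    """Return (index, probes) after scanning from the start."""
    for i, key in enumerate(keys):
        if key == target:
            return i, i + 1
    return -1, len(keys)


keys = list(range(0, 2000, 2))  # 1000 sorted keys
bin_index, bin_probes = binary_search(keys, 1998)
seq_index, seq_probes = sequential_search(keys, 1998)
```

Looking up the last of the 1000 keys takes 1000 probes sequentially and at most 10 with binary search, which is still more than the single access needed when the RRN is known.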
Internal Sort

unsorted file (disk) --> read the entire file --> unsorted file (memory) --> sort in memory --> sorted file (memory)
Key Sorting

Conceptual view before sorting:

   In RAM                  On secondary storage
   KEY        RRN          Records
   HARRISON   1            Harrison|Susan|387 Eastern....
   KELLOG     2            Kellog|Bill|17 Maple....
   HARRIS     3            Harris|Margaret|4343 West....
   .                       .
   .                       .
   BELL       k            Bell|Robert|8912 Hill....

Conceptual view after sorting keys in RAM:

   Index file              Original file
   KEY        RRN          Records
   BELL       k            Harrison|Susan|387 Eastern....
   HARRIS     3            Kellog|Bill|17 Maple....
   HARRISON   1            Harris|Margaret|4343 West....
   .                       .
   .                       .
   KELLOG     2            Bell|Robert|8912 Hill....
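
The two views above can be sketched in Python. The assumptions: a Python list stands in for the file on secondary storage, the function names are hypothetical, the sample holds only the four records shown (so RRN k becomes 4), and the addresses are left truncated as on the slide.

```python
def keysort(records, key_of):
    """Sort only (key, RRN) pairs in memory; the records themselves stay put."""
    keynodes = [(key_of(rec), rrn) for rrn, rec in enumerate(records, start=1)]
    keynodes.sort()
    return keynodes


def write_sorted(records, keynodes):
    """Second pass: fetch each record by RRN in key order."""
    return [records[rrn - 1] for _, rrn in keynodes]


records = [
    "Harrison|Susan|387 Eastern",
    "Kellog|Bill|17 Maple",
    "Harris|Margaret|4343 West",
    "Bell|Robert|8912 Hill",
]
index = keysort(records, lambda rec: rec.split("|")[0].upper())
sorted_file = write_sorted(records, index)
```

Here index is BELL 4, HARRIS 3, HARRISON 1, KELLOG 2. Building sorted_file needs one access per record in key order, which on disk means one seek per record.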
Finding Things Quickly
• Why do we rewrite the file to secondary storage again?
• If we do not rewrite it, and write out only the sorted index instead, this is INDEXING.
