
File Management and Organization: Adil Yousif, PhD

This document provides an overview of a course on file management and organization. The course will cover various file structures and techniques, including sequential files, hashing, trees, and external sorting. The objectives are to introduce file structure design and advanced data structures for efficient file operations. Students will develop programming skills in C++ and learn how to explain, implement, and analyze different file structures and indexing methods. The document discusses the role of file structures in data storage and manipulation between main memory and secondary storage.


File Management and Organization

Adil Yousif, PhD

Lecture 1
Course Outline
The course will cover the following topics:
File concepts,
Basic file operations,
Physical file organization techniques,
Sequential file structures,
Hashing and direct organization structures,
Indexed structures,
List file structures (inverted, multi-key, etc.),
Tree structures (B trees, B+ trees, etc.),
External sorting techniques,
Searching techniques.

Objectives of The Course

To provide a solid introduction to the topic of file structure design.
To discuss a number of advanced data structure concepts that are necessary for achieving high efficiency in file operations.
To develop important programming skills in an object-oriented language such as C++ or Java.
Pre-Requisites

Programming with C++.


Learning Outcomes
After completing this course, the student should
demonstrate the knowledge and ability to:
Explain the importance of file structures in data storage and manipulation.
Show how various kinds of secondary storage devices store data.
Know the low-level aspects of file manipulation.
Know the types of file structures and indexing techniques.
Implement some of the learned techniques and concepts using C++ to solve various file management problems.

Textbooks
Course Evaluations
Written or programming assignments (20%)
A Mid-Term (20%)
A Final Exam (60%)

Course Web page:

Motivation

- Most computers are used for data processing (over $100 billion/year).
- A big growth area in the "information age".
- This course covers data processing from a computer science perspective:
  - Storage of data
  - Organization of data
  - Access to data
  - Processing of data

CENG 351 9
Data Structures vs. File Structures

Both involve:
- Representation of data
- Operations for accessing data

Difference:
- Data structures deal with data in main memory
- File structures deal with data in secondary storage

Where do File Structures fit in Computer Science?

Application

DBMS

File system

Operating System

Hardware

Computer Architecture

Main Memory (RAM): data is manipulated here
- Fast, expensive, volatile, small

(data transfer)

Secondary Storage: data is stored here
- Disks, tape
- Slow, cheap, stable, large

Advantages

- Main memory (MM) is fast.
- Secondary storage is big (because it is cheap).
- Secondary storage is stable (non-volatile), i.e. data is not lost during power failures.

Disadvantages

- Main memory is small. Many databases are too large to fit in main memory.
- Main memory is volatile, i.e. data is lost during power failures.
- Secondary storage is slow (on the order of 10,000 or more times slower than MM).

11/19/20
Memory Hierarchy

As one goes down the hierarchy, the following occur:
a. Decreasing cost per bit
b. Increasing capacity
c. Increasing access time
d. Decreasing frequency of access of the memory by the processor

The dilemma facing the designer is clear. The designer would like to use memory technologies that provide for large-capacity memory, both because the capacity is needed and because the cost per bit is low. However, to meet performance requirements, the designer needs to use expensive, relatively lower-capacity memories with short access times.

The way out of this dilemma is not to rely on a single memory component or technology, but to employ a memory hierarchy.
How fast is main memory?
Typical time for getting info from:
Main memory: ~120 nanosec = 120 x 10^-9 sec
Magnetic disks: ~30 millisec = 30 x 10^-3 sec

An analogy keeping the same time proportion as above:
Looking at the index of a book: 20 sec
versus
Going to the library: 58 days
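
The proportion behind this analogy can be checked with a little arithmetic. The sketch below is illustrative only: it takes the slide's figures in seconds (120 x 10^-9 for main memory, 30 x 10^-3 for a magnetic disk), and the function names `slowdown_ratio` and `scaled_days` are hypothetical, not part of the course material.

```cpp
#include <cassert>

// Ratio of disk access time to main-memory access time.
// Both arguments are in seconds and must be positive.
double slowdown_ratio(double memory_sec, double disk_sec) {
    assert(memory_sec > 0.0 && disk_sec > 0.0);
    return disk_sec / memory_sec;
}

// Scale a human-sized task (given in seconds) by the same ratio
// and express the result in days.
double scaled_days(double task_sec, double ratio) {
    const double seconds_per_day = 24.0 * 60.0 * 60.0;
    return task_sec * ratio / seconds_per_day;
}
```

With these figures, `slowdown_ratio(120e-9, 30e-3)` is about 250,000, and `scaled_days(20.0, 250000.0)` is about 57.9 days, which agrees with the "58 days" quoted for the trip to the library.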

Normal Arrangement

- Secondary storage (SS) provides reliable, long-term storage for large volumes of data.
- At any given time, we are usually interested in only a small portion of the data.
- This data is loaded temporarily into main memory, where it can be rapidly manipulated and processed.
- As our interests shift, data is transferred automatically between MM and SS, so the data we are focused on is always in MM.

Why Study File Structure Design?
How can secondary storage access time be improved?
By improving the file structure.

Since the details of the representation of the data and the implementation of the operations determine the efficiency of the file structure for particular applications, improving these details can help improve secondary storage access time.
Goal of the file structures

Minimize the number of trips to the disk needed to get the desired information.
Group related information so that we are likely to get everything we need with only one trip to the disk.
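
As a rough illustration of why grouping matters, the sketch below counts how many block reads ("trips") are needed to fetch a run of records stored next to each other. It assumes the disk is read in fixed-size blocks; the function name `block_reads_needed` and the sizes used in the note after it are hypothetical values chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Number of fixed-size blocks that must be read to fetch `count`
// contiguous records of `record_size` bytes starting at byte `offset`.
std::size_t block_reads_needed(std::size_t offset, std::size_t count,
                               std::size_t record_size,
                               std::size_t block_size) {
    assert(record_size > 0 && block_size > 0);
    if (count == 0) return 0;                      // nothing to fetch
    std::size_t first_block = offset / block_size; // block holding the first byte
    std::size_t last_byte = offset + count * record_size - 1;
    std::size_t last_block = last_byte / block_size;
    return last_block - first_block + 1;
}
```

For example, assuming 4096-byte blocks and 128-byte records, 32 related records stored together from offset 0 fit in a single block, so one trip is enough. If the same 32 records were each placed in a different block, 32 trips would be needed.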

Fixed versus Dynamic Files

It is relatively easy to come up with file structure designs that meet the general goals when the files never change.
When files grow or shrink as information is added and deleted, it is much more difficult.

Physical Files and Logical Files
Physical file: a collection of bytes stored on a disk or tape.
Logical file: a "channel" (like a telephone line) that connects the program to a physical file.
The program (application) sends bytes to, or receives bytes from, a file through the logical file.
The program knows nothing about where the bytes go to or come from.
The operating system is responsible for associating a logical file in a program with a physical file on disk or tape.
Writing to or reading from a file in a program is done through the operating system.
Files
The physical file has a name, for instance myfile.txt
The logical file has a logical name (a variable) inside the program.
- In C:
FILE *outfile;
- In C++:
fstream outfile;
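
A minimal sketch of this association in C++, assuming the standard `std::fstream` class from `<fstream>`: the variable `outfile` is the logical file, and the name passed to `open` identifies the physical file. The helper name `write_greeting` and the text it writes are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Associates a logical file (the fstream variable) with a physical
// file (the name on disk), writes one line, and closes the file.
// Returns true if every step succeeded.
bool write_greeting(const std::string& physical_name) {
    assert(!physical_name.empty());
    std::fstream outfile;                          // logical file
    outfile.open(physical_name,                    // physical file name
                 std::ios::out | std::ios::trunc);
    if (!outfile.is_open()) return false;          // association failed
    outfile << "hello, file\n";                    // bytes go through the logical file
    outfile.close();
    return !outfile.fail();
}
```

Calling `write_greeting("myfile.txt")` would create or overwrite `myfile.txt` in the current directory. The program never deals with where the bytes land on disk; that part is left to the operating system.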

Definition

A File Structure is a combination of representations for data in files and of operations for accessing the data.
A File Structure allows applications to read, write and modify data. It might also support finding the data that matches some search criteria or reading through the data in some particular order.

Basic File Processing Operations

Opening
Closing
Reading
Writing
Seeking
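
The five operations can be seen together in one short C++ sketch built on the standard `std::fstream` class. This is one possible illustration rather than the course's own code; the function name `write_then_read_at` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Writes `text` to the named file, then seeks to byte `pos` and reads up
// to `len` bytes back. Returns the bytes read (empty on failure).
std::string write_then_read_at(const std::string& name,
                               const std::string& text,
                               std::streamoff pos, std::size_t len) {
    assert(pos >= 0);
    std::fstream file;

    // Opening: binary mode keeps byte offsets exact; trunc creates or empties the file.
    file.open(name, std::ios::in | std::ios::out |
                    std::ios::trunc | std::ios::binary);
    if (!file.is_open()) return "";

    // Writing
    file.write(text.data(), static_cast<std::streamsize>(text.size()));

    // Seeking: move the read position to `pos` bytes from the start.
    file.seekg(pos, std::ios::beg);

    // Reading: gcount() reports how many bytes were actually read.
    std::string result(len, '\0');
    file.read(&result[0], static_cast<std::streamsize>(len));
    result.resize(static_cast<std::size_t>(file.gcount()));

    // Closing
    file.close();
    return result;
}
```

For instance, after writing the ten bytes `0123456789`, seeking to position 4 and reading 3 bytes returns `456`. Seeking is what lets a program jump straight to the bytes it needs instead of reading the file from the beginning.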

File Systems

Data is not scattered hither and thither on disk. Instead, it is organized into files.
Files are organized into records.
Records are organized into fields.

Example
A student file may be a collection of student records, one record for each student.
Each student record may have several fields, such as:
- Name
- Address
- Student number
- Gender
- Age
- GPA

Typically, each record in a file has the same fields.
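
The file, record, and field levels in this example can be sketched in C++ as follows. The layout is an assumption made for illustration: one record per text line, with fields separated by `|`. The struct `Student` and the functions `to_line` and `from_line` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// One record: the fields listed in the example above.
struct Student {
    std::string name;
    std::string address;
    long student_number;
    char gender;
    int age;
    double gpa;
};

// Serialize a record as one text line, fields separated by '|'.
// Assumes no text field contains '|' or a newline.
std::string to_line(const Student& s) {
    assert(s.name.find('|') == std::string::npos);
    assert(s.address.find('|') == std::string::npos);
    std::ostringstream out;
    out << s.name << '|' << s.address << '|' << s.student_number << '|'
        << s.gender << '|' << s.age << '|' << s.gpa;
    return out.str();
}

// Parse a line produced by to_line back into a record.
Student from_line(const std::string& line) {
    std::istringstream in(line);
    Student s;
    std::string field;
    std::getline(in, s.name, '|');
    std::getline(in, s.address, '|');
    std::getline(in, field, '|'); s.student_number = std::stol(field);
    std::getline(in, field, '|'); s.gender = field.empty() ? '?' : field[0];
    std::getline(in, field, '|'); s.age = std::stoi(field);
    std::getline(in, field, '|'); s.gpa = std::stod(field);
    return s;
}
```

A student file would then hold one such line per student. Because every record has the same fields in the same order, a program can read the file line by line and rebuild each record with `from_line`.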
Properties of Files

1) Persistence: Data written into a file persists after the program stops, so the data can be used later.
2) Shareability: Data stored in files can be shared by many programs and users simultaneously.
3) Size: Data files can be very large. Typically, they cannot fit into main memory.

Questions
