ALGIERS UNIVERSITY 1 - DEPARTMENT OF COMPUTER SCIENCE

Algorithmic and Data Structure 2 - ADS2
L1 Computer Science

Dr N. BOUMELA
2023/2024

L1 Computer Science – ADS2 Chapter 4 : FILES

Chapter 4 - FILES
1. Introduction
So far, our programs have relied on variables and arrays to store and process data. However, these
storage methods use volatile memory (RAM), which is temporary. As a result, any data stored in variables
or arrays is lost when the program ends or the computer is turned off.
In many real-world scenarios, we need to preserve data even after a program finishes running. For
example, storing student records—including names, surnames, birth dates, and grades—in arrays is not
practical if we must re-enter this information every time the program starts. Volatile storage also makes it
impossible to query or update data across multiple runs, as all information is wiped when the program exits.
This is where files become essential. Files allow us to store data permanently on external storage devices
such as hard drives, SSDs, or even CDs. By using files, we can:
• Retain data between program executions.
• Avoid repetitive manual data entry.
• Manage and process large volumes of data efficiently.

A file is simply a collection of data stored on secondary storage. Files can contain various types of
information, including text, numbers, images, and more. In programming, files are used not only for data
storage but also for saving programs themselves and other digital content.
Variables and arrays hold data temporarily in RAM, whereas files ensure long-term storage on secondary
devices. This makes file handling essential for applications that require data persistence or need to
manage large volumes of data.
By mastering file operations, we can develop programs that read from and write to files. A solid
understanding of file input/output (I/O) is therefore fundamental to building reliable, real-world
software systems.

2. Definition of FILE
In algorithmic and computer science terms, a file is a structured collection of data stored on a
persistent medium, such as a hard disk, solid-state drive, or optical disk. Unlike variables or arrays,
which reside in volatile memory (RAM) and are erased when a program ends, a file ensures that
data is preserved even after program termination or system shutdown.

A file is characterized by several attributes, including:

• Name: A unique identifier within the file system that distinguishes the file from others. The filename
may include an extension indicating the file format (e.g., .txt, .bin) and is subject to length and character
restrictions depending on the operating system and file system.
• Size: The amount of data contained within the file, typically measured in bytes. The file size reflects
only the content and not the metadata such as the filename or creation date.
• Content: The actual data stored, which may be in the form of text, binary, or structured information,
depending on the file's intended use.
Files serve as the fundamental units for storage and retrieval in computer systems, allowing data to
be read, written, modified, and managed through various file operations. They provide a
mechanism for persistent data storage, enabling programs to exchange, share, and maintain
information across different executions and computing environments.

In summary, a file is a logically contiguous sequence of bytes, organized according to a specific format, and
identified by a unique name within a file system. Files are essential for data persistence,
management, and portability in algorithmic programming and software development.


3. File Types
In algorithmic programming, files used for data storage are generally categorized into two main types:
• Text Files (Untyped Files)
• Binary Files (Typed Files)
3.1. Text Files (Untyped)
A text file stores data as a sequence of characters, typically encoded in ASCII or Unicode. In ASCII, each
character occupies one byte (8 bits); Unicode encodings such as UTF-8 may use several bytes per character.
The data is organized line by line, and every line ends with a special end-of-line (EOL) character, such as
\n (newline). Even numerical data is stored as its character representation (for example, the digit 5 is
stored as the ASCII code 53, not as the binary value 5).

• Contains only readable text data.
• Each element is a character (1 byte in ASCII).
• Data is stored and displayed line by line.
• Can be created and edited with standard text editors (e.g., Notepad, VS Code).
• Common extensions: .txt, .csv, .py, etc.
For example, the word ‘Write’ is stored in a text file as the following ASCII codes:

Text     W     r     i     t     e
ASCII    87    114   105   116   101

So, the file contains the bytes: 87, 114, 105, 116, 101.
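
The byte values in this example can be checked with a short Python sketch. Python is used here only for illustration and is not the course's pseudocode; the variable names are arbitrary.

```python
# Encode the word as ASCII and list the numeric value of each byte.
word = "Write"
codes = list(word.encode("ascii"))
print(codes)  # [87, 114, 105, 116, 101]
```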

3.2. Binary Files (Typed files)


A binary file stores data as a sequence of bytes, which may represent any type of information—numbers,
images, audio, video, or program code. The data is not limited to human-readable characters and is often
structured according to the needs of the application. Binary files are more efficient for storing complex or
large data because they avoid the overhead of character encoding.

• Contains data in raw byte format (not limited to text).
• Can store any type of data: numbers, images, audio, video, executables, etc.
• Not readable or editable with a standard text editor; requires specialized software.
• Common extensions: .bin, .exe, .jpg, .mp3, etc.

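Consider, for example, the real number x = 940.568349124E-47. Stored in a binary file of reals, the element occupies 32 bits (4 bytes), whereas its textual form takes one byte per character (17 bytes).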
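
The size difference can be illustrated with a minimal Python sketch. It assumes that the 32-bit real is an IEEE 754 single-precision value written in little-endian order; the course does not specify the layout, so this is only one possible encoding. Note that a 32-bit real stores only an approximation of the value.

```python
import struct

text_form = "940.568349124E-47"        # the value as it would appear in a text file
x = float(text_form)

# Assumed layout: 32-bit IEEE 754 float, little-endian.
binary_form = struct.pack("<f", x)

print(len(text_form.encode("ascii")))  # 17 bytes as text
print(len(binary_form))                # 4 bytes as binary
```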

Summary:

• Text files are for storing and exchanging human-readable data.
• Binary files are for efficient storage and manipulation of complex or large data, often not human-readable.
Choosing between text and binary files depends on the type of data and the requirements for processing
and storage efficiency.

4. Organisation and Access


When working with files, data is stored on external storage devices rather than in main memory. This
means we cannot access file data as instantly or directly as we do with variables or arrays in RAM. The
method used to access data within a file depends on how the data is organized in that file. There are three
primary file access methods:
• Sequential access
• Direct access
• Indexed sequential access
In this course, we will focus on sequential access.

4.1. Sequential access files


Sequential access is the simplest and most common method for accessing data in a file. In this approach,
data is processed in the exact order it is stored: one record after another, from the beginning of the file to
the end. Each read or write operation moves a file pointer forward to the next record, making it suitable for
applications that process data linearly, such as log files, batch processing, or text editors.

Key characteristics of sequential access:

• Data is accessed in a predetermined, linear sequence.
• Each operation (read/write) processes the next available record.
• To access a specific record, all preceding records must be read first.
• Simple to implement and efficient for reading or writing large files in order.
Advantages:
• Easy to program and manage.
• Efficient for processing all records in order.
• Less prone to data corruption, as data is written and read sequentially.
Disadvantages:
• Slow when accessing specific records far from the beginning, since all previous records must be
read first.
• Less flexible for applications requiring frequent updates or direct access to specific records.
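
To make the cost of sequential access concrete, here is a small Python sketch, given for illustration only. It assumes a text file with one record per line; the function name `read_nth_record` and the sample names are invented for the example. Reaching record n requires reading the n - 1 records before it.

```python
import os
import tempfile

def read_nth_record(path, n):
    """Return record number n (1-based) by reading every record before it."""
    with open(path, "r") as f:
        for position, line in enumerate(f, start=1):
            if position == n:
                return line.rstrip("\n")
    return None  # the file holds fewer than n records

# Build a small demonstration file with one record per line (made-up data).
path = os.path.join(tempfile.mkdtemp(), "records.txt")
with open(path, "w") as f:
    for name in ["Amina", "Karim", "Lina", "Yacine"]:
        f.write(name + "\n")

third = read_nth_record(path, 3)
print(third)  # Lina
```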

4.2. Other Access Methods
Direct Access: Allows access to any record directly, without reading previous records. Suitable for
databases or applications needing quick retrieval of specific data.

Indexed Sequential Access: Combines sequential and direct access by using an index to quickly locate
records, ideal for large files needing both fast access and ordered processing.

In summary, while files provide persistent data storage, the way we organize and access their data—
especially through sequential access—directly impacts how efficiently we can process and manage
information in our programs.

For example, all files stored on magnetic tapes or, formerly, cassettes are sequential files.

To work with a sequential file, it must first be opened using an OPEN statement, which asks the system to
locate the file on secondary storage (e.g., a hard drive) and prepare it for processing. Sequential files
operate under strict access rules and structural constraints:

1. Opening Modes and Restrictions

A sequential file can be opened in one of three modes[2][3]:

• Input (Read-only): Allows reading data from the file.
• Output (Write-only): Overwrites existing data or creates a new file.
• Append (Write-only): Adds data to the end of an existing file.
Key constraints:

• A sequential file cannot be opened for both reading and writing simultaneously[2][4]. For example, a
file opened in Input mode must be closed before reopening in Output or Append mode.
• Switching modes requires closing and reopening the file.

2. Sequential Access Mechanism

Sequential files are analogous to magnetic tapes: data is accessed linearly, one element at a time, via a
Read-Write Head (RWH). The RWH points to the current element being processed, and operations proceed
as follows:

• Reading: The RWH moves forward after each read, accessing the next element in sequence.
• Writing: Data is appended to the end of the file, advancing the RWH to the new EOF position.
Operation         Behavior
Read              Retrieves the current element, then moves RWH to the next element.
Write (Output)    Overwrites the file from the start, erasing existing data[2][4].
Write (Append)    Adds new data after the EOF marker, preserving existing content[2][7].
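
The three behaviors in the table can be observed with Python's built-in `open`, whose modes "r", "w" and "a" play the roles of Input, Output and Append. This is only an illustrative sketch; the file name and contents are invented for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:      # Output mode: creates the file (or erases it)
    f.write("first\n")

with open(path, "a") as f:      # Append mode: writing starts at EOF
    f.write("second\n")

with open(path, "r") as f:      # Input mode: elements come back in storage order
    after_append = f.read().splitlines()

with open(path, "w") as f:      # Output mode again: previous content is erased
    f.write("third\n")

with open(path, "r") as f:
    after_rewrite = f.read().splitlines()

print(after_append)   # ['first', 'second']
print(after_rewrite)  # ['third']
```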

3. Structural Limitations

• End Of File (EOF): A marker indicating the end of valid data. Attempting to read past EOF triggers an
error[5][8].
• No mid-file modifications:
  o Inserting or modifying data within the file is not permitted[5][6].
  o To update a record, the entire file must be rewritten[7][6].
• Append-only writes: New data can only be added at the end of the file[2][7].

4. Practical Implications

• Efficiency: Sequential access is optimal for batch processing or log files, where data is processed
linearly[7][9].
• Drawbacks:
  o Slow for random access or frequent updates[7][6].
  o Requires reprocessing the entire file for minor changes[6].

Example Workflow

# Open a file in append mode
file = open("data.txt", "a")   # RWH starts at EOF
file.write("New data\n")       # Appends to the end
file.close()

# Reopen in read mode
file = open("data.txt", "r")
print(file.readline())         # Reads the first line
file.close()

This structure ensures data integrity but limits flexibility, making sequential files best suited for
large-scale, order-dependent tasks like backups or bulk data processing[7][9].

5. File Declaration
Declaring a file involves specifying its name and the type of its elements using the keyword File.

Syntax: <LogicalName>: File of <ElementType>;

• <LogicalName>: the identifier used as the logical name of the file.
• <ElementType>: the type of the elements in the file, which can be simple or structured.
Example:

Fint: File of integer;
F1: File of real;
Fchar: File of character;
Fstud: File of TStudent;   // where TStudent is a Record type

6. File Manipulation Primitives


Whenever we define a new type, it's essential to specify the operations that can be performed with this
type. With the file type, we can:
1- Assign a file.
2- Open a file.
3- Read from or Write to a file.
4- Close a file.

6.1. Assignment
This instruction establishes a link between the logical and physical aspects of the file. It allows specifying
the physical name of the file where the data will be stored, read, or processed. Its execution simply
provides this information to the operating system (OS), which uses it when processing of the file begins.

6.1.1. Syntax

Assign(<LogicalFile_Name>, <PhysicalFile_Name>);
<LogicalFile_Name>: the identifier of the file declared in the declaration section.

<PhysicalFile_Name>: a string representing the physical name of the file. It can optionally contain the
full path on the storage unit.

Examples:
• Assign(Fint, 'IntegerFile');
• Assign(Fchar, 'Character.dat');
• Assign(Fstud, 'C:\Curriculum\files\StudentInfo');
• Assign(F1, 'Number.Dat');
• Path ← 'D:\resultts\Marks.dat';
  Assign(Fres, Path);

Every file has a physical name (filename) that uniquely identifies it. To indicate the type of the file, an
extension preceded by a dot is added to the name (.txt for a text file, .exe for an executable file, .doc for
a Word file, etc.).

6.2. Opening a File


Before a file can be used, it must be opened. A file is opened either for reading or for writing data, hence
we have two opening modes.

6.2.1. Opening in Read Mode


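Its syntax is: Read(<LogicalFile_Name>); or Open(<LogicalFile_Name>, 'R');

Example: Read(Fchar);

Note that Read with a single argument opens the file, whereas Read with two arguments (a file and a
variable) reads an element from it.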

The execution of this action involves the operating system (OS) and triggers a sequence of operations:

- Using the information given by Assign(<LogicalFile_Name>, <PhysicalFile_Name>), the OS initiates a search on
the external memory for the location named <PhysicalFile_Name>.

Two possible outcomes arise:

• The file does not exist:
The system raises an exception and returns an error message.

• The file exists:
The system opens the file and positions the read-write head (RWH) on the
first element.
[Figure: Physical File, with the RWH on its first element]

6.2.2. Opening in Write Mode


Syntax: Rewrite(<LogicalFile_Name>); or Open(<LogicalFile_Name>, 'W');

Example: Rewrite(Fchar);

This syntax instructs the program to open the file associated with the logical name <LogicalFile_Name>
in write mode. If the file already exists, it erases all existing data and positions the read-write head (RWH) at
the beginning of the file for writing new data. If the file does not exist, a new empty file is created.

Similarly, the execution of this action involves the OS, and through the assignment, triggers a search for
the file <PhysicalFile_Name>:

Again, two outcomes are possible:

• The file does not exist:
The system creates an empty file (containing only the EOF marker) with the
PhysicalFileName. The RWH will be positioned on the EOF marker.
[Figure: Physical File containing only the EOF marker, with the RWH on it]

• The file exists:
The system opens the file, ERASES all existing data (making the file empty), and
positions the read-write head (RWH) at the beginning.
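
The outcomes for a file that does not exist can be reproduced in Python, used here purely as an illustration. The sketch assumes that Python's "rb" and "wb" modes stand in for read mode and write mode, and the file name `missing.dat` is invented.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "missing.dat")

# Read mode on a file that does not exist: the system reports an error.
try:
    open(path, "rb")
    read_failed = False
except FileNotFoundError:
    read_failed = True

# Write mode on a file that does not exist: an empty file is created.
with open(path, "wb"):
    pass
size_after_create = os.path.getsize(path)

print(read_failed)        # True
print(size_after_create)  # 0
```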

6.3. Reading and Writing in Files

6.3.1. Reading from a file

6.3.1.1. Syntax
Read(<LogicalNameFile>,<idVar>);

Example
Read(Fint, x); // x : Integer ;

This operation is executed on a file opened in read mode. It reads the current element (pointed
to by the RWH) and places it into the variable <idVar>, which must be of the same type as the elements of
the file. Then, it moves the RWH to the next element.

6.3.2. Writing in a file

6.3.2.1. Syntax
Write(<LogicalNameFile >, <idVar>);

Example:
Write(FChar, C); // C : Char ;

This operation is executed on a file opened in write mode. It writes the content of the variable
<idVar>, which must be of the same type as the elements of the file, into the file <LogicalNameFile>. Writing
always occurs at the end of the file: the EOF marker is moved one position forward, creating an empty space
to accommodate the added element.
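
As an illustration of element-wise reading and writing in a typed file, the following Python sketch stores integers as fixed-size binary elements. The helpers `write_int` and `read_int` are hypothetical names, and the 32-bit little-endian layout is an assumption made for the example, not something the course prescribes.

```python
import os
import struct
import tempfile

INT_FORMAT = "<i"                        # assumed element layout: 32-bit signed integer
INT_SIZE = struct.calcsize(INT_FORMAT)   # 4 bytes per element

def write_int(f, value):
    """Counterpart of Write(F, x): add one element at the end of the file."""
    f.write(struct.pack(INT_FORMAT, value))

def read_int(f):
    """Counterpart of Read(F, x): return the current element and advance."""
    return struct.unpack(INT_FORMAT, f.read(INT_SIZE))[0]

path = os.path.join(tempfile.mkdtemp(), "IntegerFile")

with open(path, "wb") as f:              # open in write mode
    for value in (12, -7, 30):
        write_int(f, value)

with open(path, "rb") as f:              # open in read mode
    first = read_int(f)                  # the position advances after each read
    second = read_int(f)

print(first, second)  # 12 -7
```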

6.3.3. Closing a file
Once we finish processing a file, we need to close it. Its syntax is:

Close(<LogicalName>);

Example

Close(Fchar);

This operation is executed on a file opened in read or write mode. It closes the file
<LogicalName>, which corresponds physically to <PhysicalName>. Once the file is closed, no further
operations on it are possible until it is reopened.

Note: Closing a file does not affect the assignment; the link between <LogicalName> and
<PhysicalName> still exists, and we can reopen the file without reassigning it.

6.4. Modification in files


To modify a data item in a sequential access file F1, we need to use a second file F2:

1. Open F1 in "read mode".
2. Open F2 in "write mode".
3. Provide the value to be modified and its replacement.
4. For each element read from F1, check whether it is equal to the value to be modified.
5. If not, write the element read from F1 to F2.
6. If yes, write the replacement value to F2.
7. Repeat this process until the end of file F1 is reached.
8. Finally, close both files, overwrite file F1 with file F2 (copy F2 into F1), and then delete file F2 from
the disk.
Algorithm ModifInFile ;
Var
    F1 , F2 : File of integer ;
    x , y , z : integer ;
BEGIN
    Assign ( F1 , 'File1' ) ;
    Read ( F1 ) ;                      // Open F1 in read mode
    Assign ( F2 , 'File2' ) ;
    Rewrite ( F2 ) ;                   // Open F2 in write mode
    Write ( 'Enter the value to replace: ' ) ;
    Read ( x ) ;
    Write ( 'Enter the replacement value: ' ) ;
    Read ( y ) ;
    While Not ( EOF ( F1 ) ) Do
        Read ( F1 , z ) ;
        If x <> z Then
            Write ( F2 , z )           // Different element: copy it unchanged
        Else
            Write ( F2 , y ) ;         // Matching element: write the replacement
        EndIf ;
    EndWhile ;
    Close ( F1 ) ;
    Close ( F2 ) ;
    F1 ← F2 ;                          // Replace F1 by F2
END.
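
The same two-file technique can be sketched in Python, for illustration only. The sketch assumes a text file with one integer per line rather than a typed file of integers; the function name `replace_value` and the temporary file suffix are invented. `os.replace` performs the final step in which F2 takes the place of F1.

```python
import os
import tempfile

def replace_value(path, old, new):
    """Rewrite the file at `path`, replacing every element equal to `old` by `new`."""
    tmp_path = path + ".tmp"                      # hypothetical name for the second file F2
    with open(path, "r") as f1, open(tmp_path, "w") as f2:
        for line in f1:                           # sequential read until EOF
            z = int(line)
            f2.write(f"{new if z == old else z}\n")
    os.replace(tmp_path, path)                    # F2 takes the place of F1

path = os.path.join(tempfile.mkdtemp(), "File1")
with open(path, "w") as f:
    f.write("4\n9\n4\n1\n")

replace_value(path, old=4, new=8)

with open(path, "r") as f:
    result = [int(line) for line in f]
print(result)  # [8, 9, 8, 1]
```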

7. End of File Marker (EOF)


When traversing a file opened for reading, we need to know whether we've reached the end or not.
Therefore, in an algorithm, we require an action that provides us with this information. This is precisely
what the function EOF() does.

Syntax EOF(<LogicalName>);

Example EOF(Fchar);

EOF() returns TRUE if the RWH is on an EOF marker and FALSE otherwise.
• If EOF() returns TRUE immediately after opening, then the file is EMPTY.
• The EOF() function is only used with files opened for READING.
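
Python has no EOF() function, but the same test can be expressed by noting that a read at the end of a file returns an empty result. The sketch below is illustrative only; the helper `is_empty` and the file names are invented for the example.

```python
import os
import tempfile

def is_empty(path):
    """Return True if the very first read already hits end of file."""
    with open(path, "rb") as f:
        return f.read(1) == b""      # an empty result means EOF was reached

folder = tempfile.mkdtemp()
empty_path = os.path.join(folder, "empty.dat")
full_path = os.path.join(folder, "full.dat")

open(empty_path, "wb").close()       # create an empty file
with open(full_path, "wb") as f:
    f.write(b"abc")

# Traverse a file element by element until EOF.
elements = []
with open(full_path, "rb") as f:
    while True:
        byte = f.read(1)
        if byte == b"":              # EOF reached: stop reading
            break
        elements.append(byte)

print(is_empty(empty_path), is_empty(full_path))  # True False
print(elements)                                   # [b'a', b'b', b'c']
```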

8. Illustrative Exercise
Let File1 and File2 be two files of strings. Each string represents a word. Write an algorithm that
constructs a file File3 such that File3 contains the words from File1 that do not exist in File2.

8.1. Solution
So, we have two files whose elements are strings. We want to construct (create) a third file that will
contain the words (elements) from File1 that do not exist in File2. This problem is well-known; we have
already encountered it with arrays. It involves traversing the entire File1 (up to its EOF). For each element
read (Read), we perform a search for this element in File2 (Read an element and then compare). If we find
it, we stop the search and move to the next element of File1. If we reach the EOF of File2 without finding it,
we write it (Write) to the File3 file.

Algorithm ExampleFiles ;
Var
    F1, F2, F3: File of string[30];
    X, Y: string[30];
    Found: boolean;
BEGIN
    Assign(F1, 'File1');
    Assign(F2, 'File2');
    Assign(F3, 'File3');
    Read(F1); Rewrite(F3);          // Open F1 for reading and F3 for writing
    While Not EOF(F1) Do
        Read(F1, X);                // Read a word from F1
        Found ← False;              // Assume the word does not exist in F2
        Read(F2);                   // Reopen F2 for reading: the RWH returns to the beginning of F2 at each iteration
        While Not EOF(F2) And Not Found Do
            Read(F2, Y);            // Read a word from F2
            If Y = X Then
                Found ← True;       // Stop the search if the word is found
            EndIf ;
        EndWhile;
        If Not Found Then
            Write(F3, X);           // If not found, write it to F3
        EndIf;
        Close(F2);
    EndWhile ;
    Close(F1); Close(F3);
END.
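
For comparison, here is a Python sketch of the same algorithm, given only as an illustration. It assumes text files with one word per line instead of typed files of strings; the function name `words_only_in_first` and the sample words are invented. As in the pseudocode, File2 is reopened for every word of File1 so that the search restarts from its beginning.

```python
import os
import tempfile

def words_only_in_first(path1, path2, path3):
    """Write to path3 every word of path1 that does not appear in path2."""
    with open(path1, "r") as f1, open(path3, "w") as f3:
        for line in f1:
            x = line.rstrip("\n")
            found = False
            with open(path2, "r") as f2:          # reopen F2: back to its beginning
                for other in f2:
                    if other.rstrip("\n") == x:
                        found = True
                        break                     # stop the search
            if not found:
                f3.write(x + "\n")

folder = tempfile.mkdtemp()
p1, p2, p3 = (os.path.join(folder, name) for name in ("File1", "File2", "File3"))
with open(p1, "w") as f:
    f.write("file\nrecord\nindex\nbuffer\n")
with open(p2, "w") as f:
    f.write("index\nfile\n")

words_only_in_first(p1, p2, p3)

with open(p3, "r") as f:
    result = f.read().splitlines()
print(result)  # ['record', 'buffer']
```

Each word of File1 triggers a full pass over File2 in the worst case, which is the price of having only sequential access.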
