Data File Handling - 1
Text file
A text file is a human-readable file that consists of a sequence of characters stored in
an encoding such as ASCII or Unicode.
Each line in a text file is terminated by an End Of Line (EOL) character, which may vary
across operating systems.
A text file is commonly recognized by its .txt extension.
CSV file
A delimited text file uses a delimiter to separate the data values in each line.
In a comma-separated values (CSV) file, the data values in each line are separated by commas.
Similarly, in a tab-separated file, the data values in each line are separated by tabs.
Binary File
The data in a binary file is stored as machine-readable data objects, that is, as a sequence of
bytes (binary digits, 0s and 1s) without any specific delimiters.
For example, in a text file the number -12345 is stored as a sequence of six characters (six bytes in ASCII),
whereas in a binary file it may be stored as an integer object requiring 16, 32, or 64 bits, depending on
the size of the integer object.
Unlike text files, binary files do not require a comma, space, or end-of-line character to separate values.
Binary files can represent a wide range of data types, including numbers, images, audio, video,
and executable code. They store data in its raw, binary form, without any need for
additional characters to separate or delineate different pieces of information.
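The size difference described above can be checked with a short sketch. It assumes the number in the example is -12345 and uses the standard struct module to produce fixed-width integers; the variable names are illustrative only.

```python
import struct

number = -12345

# Text representation: one byte per character in ASCII ("-", "1", "2", "3", "4", "5").
as_text = str(number).encode("ascii")

# Binary representations: fixed-width signed integers (little-endian, standard sizes).
as_int16 = struct.pack("<h", number)   # 16 bits
as_int32 = struct.pack("<i", number)   # 32 bits
as_int64 = struct.pack("<q", number)   # 64 bits

print(len(as_text), len(as_int16), len(as_int32), len(as_int64))   # 6 2 4 8
```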
Opening and Closing Files
Closing Files:
Once the necessary operations on a file have been carried out, it should be closed
by calling its close() method.
Syntax:
<file object>.close()
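A minimal sketch of opening a file, using it, and closing it is given below. The file name notes.txt and the temporary folder are assumptions made only so that the example can run anywhere.

```python
import os
import tempfile

# Hypothetical file inside a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

f = open(path, "w")        # open the file for writing
f.write("Hello\n")
state_before = f.closed    # False: the file is still open
f.close()                  # close the file and release it
state_after = f.closed     # True: the file object can no longer be used for I/O

print(state_before, state_after)
```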
READING FROM A FILE
read() : Reads the entire contents of the file, from the current position of the file pointer (the
beginning, for a newly opened file) to the end of the file, and returns them as a string.
read(n) : Reads n characters from the file, starting from the current position of the file pointer (the
beginning, for a newly opened file). If fewer than n characters remain, it reads until the end
of the file.
readline() : Reads one line of the file and returns it as a
string. If a number n is specified, it reads at most n characters. However, it
does not read more than one line, even if n exceeds the length of the line.
readlines() : Reads all lines from the file and returns them as a list of strings,
each including its trailing newline character, if present.
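The four reading calls can be compared side by side in a short sketch. The file poem.txt and its three lines are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample file with three lines.
path = os.path.join(tempfile.mkdtemp(), "poem.txt")
f = open(path, "w")
f.write("first line\nsecond line\nthird line\n")
f.close()

f = open(path, "r")
whole = f.read()               # the entire file as one string
f.close()

f = open(path, "r")
first_five = f.read(5)         # "first"
rest_of_line = f.readline()    # " line\n" (stops at the end of the line)
partial = f.readline(3)        # "sec" (at most 3 characters)
remaining = f.readlines()      # ["ond line\n", "third line\n"]
f.close()

print(repr(first_five), repr(rest_of_line), repr(partial), remaining)
```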
WRITING TO FILE
write(string): This method takes a string as its argument and writes it to the text file.
It does not add a line break on its own, so we have to add the '\n' character to the end of
the string to end the line. In Python, '\n' is a single character; in text mode it is written
as the platform's line ending, which takes two bytes (\r\n) on Windows. As the argument has to
be a string, a numeric value must be converted to a string (for example, with str()) before it is stored.
Syntax: fileobject.write(string)
write() actually writes the data to a buffer. When the close() method is
executed, any contents remaining in this buffer are moved to the file on
permanent storage.
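A small sketch of write() follows, including the conversion of numeric values to strings. The file name and the values written are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "marks.txt")   # hypothetical file

f = open(path, "w")
count = f.write("Roll numbers\n")   # write() returns the number of characters written
f.write(str(101) + "\n")            # an integer must be converted to a string first
f.write(str(98.5) + "\n")           # the same applies to a float
f.close()                           # buffered data reaches the file here

f = open(path, "r")
content = f.read()
f.close()
print(count)
print(content)
```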
writelines(): This method writes a sequence of strings, such as a list or
tuple, to a file. It does not add newline characters, so each string must carry its own '\n' where a line break is wanted.
Syntax: fileobject.writelines(sequence)
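The behaviour of writelines() with a list and with a tuple can be sketched as below. The file name and the strings are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "fruits.txt")   # hypothetical file

lines = ["apple\n", "banana\n", "cherry\n"]   # each string carries its own newline

f = open(path, "w")
f.writelines(lines)          # a list of strings
f.writelines(("x", "y"))     # a tuple works too; no newline is added between items
f.close()

f = open(path, "r")
content = f.read()
f.close()
print(repr(content))
```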
Use of with statement
The with statement is used together with open() to open a file. It also lets us
group the file operation statements within a block. Using
with ensures that the file is closed and the resources allocated to the file object are released
automatically once the block ends, even if an exception occurs inside it.
Syntax:
with open(<file name>, <mode>) as <file object>:
    # file operation statements
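A minimal sketch of the with statement is shown below. The file name is hypothetical; the point is that no explicit close() call appears.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")   # hypothetical file

with open(path, "w") as f:
    f.write("line one\n")
    f.write("line two\n")
# The block has ended, so the file has already been closed.
closed_after_block = f.closed

with open(path, "r") as f:
    lines = f.readlines()

print(closed_after_block, lines)
```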
Appending file
Append means 'to add to'. If we need to add more data to a file
that already has some data in it, we open the file in append mode ('a'), and the new data
is written after the existing contents.
Syntax:
<file object> = open(<file name>, 'a')
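The difference between creating a file and appending to it can be seen in a short sketch. The file name and its contents are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "days.txt")   # hypothetical file

with open(path, "w") as f:      # 'w' creates the file (or empties an existing one)
    f.write("Monday\n")

with open(path, "a") as f:      # 'a' keeps the existing data and adds at the end
    f.write("Tuesday\n")

with open(path, "r") as f:
    content = f.read()
print(content)
```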
Absolute and Relative Path
The two most important attributes of a file are its name and its path.
The path identifies the file's location on the computer.
A file path can be specified in two ways:
An absolute path, which always starts at the root folder.
A relative path, which is relative to the current working directory of the program.
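The two kinds of path can be explored with the standard os module. In this sketch the relative path data/marks.txt is hypothetical and does not need to exist, because the functions used only manipulate the path text.

```python
import os

cwd = os.getcwd()                                # current working directory
relative = os.path.join("data", "marks.txt")     # hypothetical relative path
absolute = os.path.abspath(relative)             # same location, expressed from the root

print("Working directory:", cwd)
print("Relative path    :", relative, os.path.isabs(relative))
print("Absolute path    :", absolute, os.path.isabs(absolute))
```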
The flush() Function
In general, the data written to a file is temporarily stored in a file buffer and
transferred from the buffer to the file on disk when the buffer fills up or when the close() function is invoked.
The flush() function can be used to force the contents of Python's buffer to be written to
the file without waiting for the file to be closed.
This makes the buffered content available in the file on disk
for immediate use.
Syntax: <file_object>.flush()
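One way to observe the effect of flush() is to read the file through a second file object while the first is still open. This is only a sketch; the file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "journal.txt")   # hypothetical file

f = open(path, "w")
f.write("first entry\n")
f.flush()                    # push the buffered text out to the file now

# A second file object reads what has reached the file so far.
reader = open(path, "r")
seen_while_open = reader.read()
reader.close()

f.write("second entry\n")
f.close()                    # closing writes out whatever is still buffered

print(repr(seen_while_open))
```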
Random Access Using seek() and tell()
Accessing and Manipulating Location of File pointer – Random Access
Python provides two functions that let you manipulate the position of the file pointer, so that you
can read or write from the desired location in the file. The functions are:
tell() – returns the current position of the file pointer.
seek() – changes the position of the file pointer to a desired location.
The seek() function : The seek() function is used to change the position of the file pointer by
placing it at a specific position in the opened file. seek() can be used in two ways.
Absolute Positioning : The argument gives the actual position, counted in bytes from the beginning of the file, at which
the file pointer is to be placed.
Syntax: <file-handle>.seek(file_location)
Eg: f.seek(20) moves the file pointer to byte position 20 (positions are counted from 0), no matter where
the pointer currently is.
Relative Positioning : Here seek() takes two arguments: offset (the number of bytes by which the file pointer is moved)
and from-what (the reference point from which the file pointer is displaced forward or
backward). from-what takes one of three values: 0 for the beginning, 1 for the current position, and 2 for
the end of the file.
Syntax: <file-handle>.seek(off-set, from-what)
Here off-set is the position to set the file pointer to, measured from the reference point, and from-what is the reference point.
The tell() function:
The tell() function returns the current position of the file pointer in the file.
Syntax: <file_handle>.tell()
Note: The beginning (0) is the default reference point for seek(). The reference
points 1 (current) and 2 (end) can be used with a nonzero offset only in files opened in binary mode.
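The following sketch exercises tell() and all three reference points of seek(). The file is opened in binary mode so that reference points 1 and 2 are allowed; the file name and its 20 bytes of content are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "letters.dat")   # hypothetical file

with open(path, "wb") as f:
    f.write(b"0123456789ABCDEFGHIJ")   # 20 bytes, positions 0 to 19

with open(path, "rb") as f:
    start = f.tell()           # 0: a newly opened file starts at the beginning
    f.seek(5)                  # absolute positioning: go to byte position 5
    a = f.read(3)              # b"567"
    after_read = f.tell()      # 8: reading moved the pointer forward
    f.seek(2, 1)               # 2 bytes forward from the current position -> 10
    b = f.read(2)              # b"AB"
    f.seek(-4, 2)              # 4 bytes back from the end of the file -> 16
    c = f.read()               # b"GHIJ"

print(start, a, after_read, b, c)
```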
Standard File Streams
We use file objects to work with data files; similarly, input/output on the standard I/O
devices is performed using standard I/O stream objects.
In order to work with the standard I/O streams, we need to import the sys module.
The standard streams available in Python are:
Standard input stream (sys.stdin).
Standard output stream (sys.stdout).
Standard error stream (sys.stderr).
The methods available for I/O operations on them include:
read() - for reading from the keyboard; read(1) reads one character at a time.
write() - for writing data to the console, i.e. the monitor.
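A brief sketch of the output streams follows. Reading from sys.stdin is shown only as a comment, since it would wait for keyboard input; the messages are illustrative.

```python
import sys

# write() on a stream behaves like write() on a file object:
# it needs a string and returns the number of characters written.
count = sys.stdout.write("Result: 42\n")

# Error messages are conventionally sent to the standard error stream.
sys.stderr.write("Warning: sample message on the error stream\n")

# Reading from the keyboard would look like this (left as a comment
# because it blocks until the user types something):
# line = sys.stdin.readline()
```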
Working with Binary Files
If you need to write and read structured objects such as dictionaries, tuples, lists or
nested lists to and from files while maintaining their structure, the better
choice is to use binary files.
For this purpose, objects are often serialized and then stored in binary files.
The pickle module is used for serializing and de-serializing Python object
structures.
Serialization is the process of transforming data or an object in memory into a stream
of bytes. This stream of bytes can then be stored in a binary file on disk or in a
database.
The serialization process is also called pickling.
When reading the contents of the file, the reverse process takes place: the byte stream is
converted back into an object hierarchy. This is known as de-serialization or unpickling.
The pickle module can be used to store most kinds of Python objects in a binary file, as it allows us
to store Python objects along with their structure.
The following steps are to be taken for performing reading and writing operations on
a binary file.
1. Import the pickle module using the import pickle statement.
2. Open the binary file in the required access mode.
3. Process the binary file by writing/reading objects using the pickle module's methods.
4. Once done, close the file.
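The four steps above can be sketched as follows, using pickle.dump() to write an object and pickle.load() to read it back. The student record and the file name are hypothetical.

```python
import os
import pickle                    # step 1: import the pickle module
import tempfile

path = os.path.join(tempfile.mkdtemp(), "student.dat")   # hypothetical file

student = {"roll": 1, "name": "Asha", "marks": [78, 91, 85]}

f = open(path, "wb")             # step 2: open in binary write mode
pickle.dump(student, f)          # step 3: pickle the object into the file
f.close()                        # step 4: close the file

f = open(path, "rb")             # the same steps again, this time for reading
restored = pickle.load(f)
f.close()

print(restored)
```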
For reading data from a file, we use the load() function of the pickle module, which
unpickles the data coming from the file.
Syntax: <object> = pickle.load(<file handle>)
The pickle.load() function raises EOFError (a runtime exception) when
you reach the end of the file while reading from it.
You can handle this by placing the call in try and except blocks. Opening the file
in a with statement additionally ensures that the file is closed.
Syntax:
<file handle> = open(<file name>, <read mode>)
try:
    <object> = pickle.load(<file handle>)
    # other statements
except EOFError:
    <file handle>.close()
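A common use of this pattern is reading every record from a file whose length is not known in advance. The sketch below assumes a hypothetical file holding three pickled dictionaries.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "students.dat")   # hypothetical file

records = [
    {"roll": 1, "name": "Asha"},
    {"roll": 2, "name": "Ravi"},
    {"roll": 3, "name": "Meena"},
]

f = open(path, "wb")
for record in records:
    pickle.dump(record, f)       # one pickled object per record
f.close()

loaded = []
f = open(path, "rb")
try:
    while True:                  # keep loading until the end of file is reached
        loaded.append(pickle.load(f))
except EOFError:
    f.close()                    # no more records: close the file

print(loaded)
```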
Syntax:
with open(<filename>, <mode>) as <file handle>:
    # use pickle.load here in this with block
    # perform the file manipulation tasks in this with block
The with statement closes the file automatically, so no explicit close() call is needed,
even if an exception such as EOFError occurs inside the block.
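A short sketch of pickle used inside with blocks is given below. The file name and the list stored in it are made up; note that close() is never called explicitly.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "scores.dat")   # hypothetical file

with open(path, "wb") as f:
    pickle.dump([10, 20, 30], f)

with open(path, "rb") as f:
    scores = pickle.load(f)
    total = sum(scores)          # other work on the data can go in the same block

print(scores, total, f.closed)
```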
Searching in a file
Searching in a binary file is done in the following way.
1. Open the file in binary read mode.
2. Read the file contents record by record.
3. In every record read, look for the desired search key.
4. If found, process the record as desired.
5. If not found, read the next record and look for the desired search key.
6. If the search key is not found in any of the records, report that no such value was found in the file.
If a matching record is also to be updated, reading and locating it can be done easily, but to write the
updated data back into the file you need to know the exact file location of the record.
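The search steps listed above can be sketched as a small function. The record layout (dictionaries with a "roll" key), the function name search_by_roll, and the file name are assumptions made for illustration.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "students.dat")   # hypothetical file

with open(path, "wb") as f:
    for record in ({"roll": 1, "name": "Asha"}, {"roll": 2, "name": "Ravi"}):
        pickle.dump(record, f)

def search_by_roll(file_path, roll):
    """Return the first record whose 'roll' matches, or None if there is none."""
    with open(file_path, "rb") as f:          # step 1: open in binary read mode
        try:
            while True:
                record = pickle.load(f)       # step 2: read record by record
                if record["roll"] == roll:    # step 3: compare with the search key
                    return record             # step 4: found
        except EOFError:
            return None                       # step 6: key not in any record

found = search_by_roll(path, 2)
missing = search_by_roll(path, 9)
print(found if found else "No such value found in the file")
print(missing if missing else "No such value found in the file")
```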
The 'wb+' mode enables a binary file to be opened for writing as well as reading. This means that,
after writing content to the file, you need not re-open it in read mode to access the
records. When writing to the file is over, you must call seek(0) to bring the file
pointer back to the beginning of the file. After that, reading can be done just as in read
mode.
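Writing and then reading back through the same file object can be sketched as follows. The file name and the two records are hypothetical.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "stock.dat")   # hypothetical file

with open(path, "wb+") as f:
    pickle.dump({"item": "pen", "qty": 10}, f)
    pickle.dump({"item": "ink", "qty": 3}, f)

    f.seek(0)                    # back to the beginning before reading
    first = pickle.load(f)
    second = pickle.load(f)

print(first, second)
```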
INTRODUCTION TO CSV
For working with CSV files in Python, there is a built-in module called csv. It is used to read
and write tabular data in CSV format.
There are two basic operations that can be carried out on a CSV file:
Reading from a CSV file.
Writing to a CSV file.
Reading from a CSV file
Reading from a CSV file is done using the reader object.
We use the open() function to open the CSV file; it returns a file object.
This file object is passed to the csv.reader() function, which creates a special type of object
(the reader object) for accessing the CSV file.
The reader object is an iterable that gives us access to each line of the CSV file as a list of
fields.
You can also call next() directly on it to read the next line of the CSV file.
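A minimal sketch of reading with csv.reader() follows. The sample file is created with plain write() calls only so that there is something to read; its name and contents are made up. Opening the file with newline="" is the approach recommended in the csv module documentation.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "marks.csv")   # hypothetical file

with open(path, "w", newline="") as f:
    f.write("roll,name,marks\n1,Asha,91\n2,Ravi,78\n")

with open(path, "r", newline="") as f:
    reader = csv.reader(f)       # reader object built from the file object
    header = next(reader)        # next() returns the first line as a list of fields
    rows = []
    for row in reader:           # iterating gives the remaining lines
        rows.append(row)

print(header)
print(rows)
```

Every field comes back as a string, so numeric columns such as marks need int() or float() before any arithmetic.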
CSV FILE HANDLING IN PYTHON