File Handling
A microprocessor (CPU) needs a program that tells it what work is to be performed. It also needs
input data to work on, and it usually computes and gives some output data. In short, it needs a
program, input data and output data in order to perform the desired task.
But the CPU works only with programs and data held in RAM, which is volatile. So all the programs,
as well as the input/output data, must be stored on secondary storage devices in the form of files.
These files are loaded into the primary memory (RAM) in order to be run on the microprocessor. To
give a logical structure to file management, the Operating System maintains a folder or directory
structure.
Data files are used by a program/application to accept input, to store output, or sometimes both.
Types of Files
All files are stored as sequences of 0s and 1s in a computer. But depending on the way the bits are
interpreted, there are two broad categories of files -
1. Text files
In text files all the data is stored as characters. In an ASCII encoded text file every
character occupies exactly 1 byte. For example, if the following content/data is to
be stored in a text file -
7552 230.25 ABC
then each character, including each space, occupies 1 byte that stores the ASCII code of that
character. In the above example the content/data occupies 15 bytes.
2. Binary Files
Any file that is not a text file is a binary file. Usually in binary files the data is stored in the
same format in which the data is stored in the computer's memory/RAM.
For example, consider that the three data items -
7552 230.25 ABC
i.e. the int value 7552, the float value 230.25 and the string 'ABC', are to be stored in a binary
format file. Then the binary representations of the numbers and the string are stored, for example
2 bytes for the int, 4 bytes for the float and 3 bytes for the string.
Stored this way, the above data occupies 9 bytes. So binary data is more space efficient than text
data, especially when numbers are involved.
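The two sizes can be checked with a short sketch. It assumes that the int is packed into 2 bytes and the float into 4 bytes (the sizes that add up to the 9-byte total); the variable names are only illustrative.

```python
import struct

# Text representation: in ASCII every character, including each space, takes 1 byte.
text_data = '7552 230.25 ABC'.encode('ascii')

# Binary representation (assumed layout): 2-byte int, 4-byte float, 3-byte string.
binary_data = (
    (7552).to_bytes(2, byteorder='little')
    + struct.pack('<f', 230.25)
    + 'ABC'.encode('ascii')
)

print(len(text_data))    # 15
print(len(binary_data))  # 9
```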
Working with a data file generally involves the following steps -
1. Open a File
2. Read the file
3. Process the file
4. Close the file
Opening a File
The built-in open() function is used to open a file.
Syntax:
<file object> = open( filepath, mode='r', encoding=None)
where
file object - It is returned by the open() function and is used to work with the file on the
external storage media.
filepath - It is a string that represents either the absolute path or the relative path of the file.
mode - represents the mode in which the file is being opened.
encoding - This parameter is used only with text files and specifies the encoding/decoding scheme
that is to be used. If it is not passed, the default value None is used and Python falls back to a
platform-dependent default encoding (the value returned by locale.getpreferredencoding()). Some of
the encoding schemes that can be used are 'ascii', 'utf8', 'utf16' etc. On Windows the default is
usually a code page such as 'cp1252' rather than 'ascii', so it is safer to pass the encoding
explicitly.
Absolute path - The absolute path starts with the root directory (drive letter C: , D: etc. in windows)
and moves down through the directories and subdirectories till it reaches the subdirectory in which
the file is actually situated. Eg. 'C:\directory1\subdirectory1\subdirectory2\filename.txt'
Relative path - The relative path starts with the current working directory (.) or the parent
directory (..) of the current working directory and then moves through the directories and
subdirectories till the filename is reached.
Eg. '.\directory1\filename1.txt' , '..\directory1\subdirectory1\subdirectory2\filename1.txt'
Raw strings
If we do not want Python to treat the backslashes inside a string as the start of escape sequences
(such as \n or \t), we can use raw strings. Raw strings have the letter r or R in front of the
string literal.
Eg. R'c:\dir1\subdir1\file1.txt', r'.\dir1\subdir1\file1.txt'
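The effect of the r prefix can be seen in a small sketch. The path used here is hypothetical and is chosen only because it contains \t and \n, which a normal string literal would turn into a tab and a newline.

```python
# In a normal string literal, \t becomes a tab and \n becomes a newline.
normal = 'c:\temp\new.txt'

# In a raw string the backslashes are kept exactly as typed.
raw = r'c:\temp\new.txt'

# Doubling each backslash in a normal string gives the same result as the raw string.
escaped = 'c:\\temp\\new.txt'

print(len(normal), len(raw))   # 13 15
print(raw == escaped)          # True
```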
No.  Mode       Description
1    r / rt     Reading text files only. Sets file pointer at beginning of the file. This is the
                default mode. Gives error if file does not exist.
2    rb         Reading with binary file. Sets file pointer at beginning of the file. Gives error if
                file does not exist.
3    r+ / rt+   Both reading and writing text file. The file pointer placed at the beginning of the
                file. Gives error if file does not exist.
4    rb+        Both reading and writing with binary file. The file pointer placed at the beginning
                of the file. Gives error if file does not exist.
5    w / wt     Writing only text file. Truncates file, if file exists. If not, creates a new file
                for writing.
6    wb         Writing with binary file. Truncates file, if file exists. If not, creates a new file
                for writing.
7    w+ / wt+   Both writing and reading text file. Truncates, if file exists, creates a new file if
                file does not exist.
8    wb+        Writing and reading with binary file. Truncates, if file exists, otherwise creates a
                new file.
9    a / at     For appending text file. Move file pointer at end of the file. Creates new file for
                writing, if it does not exist.
10   ab         Appending with binary file. Move file pointer at end of the file. Creates new file
                for writing, if it does not exist.
11   a+ / at+   For both appending and reading text file. Move file pointer at end of file. If the
                file does not exist, it creates a new file for reading and writing.
12   ab+        Appending with binary file. Move file pointer at end of file. If the file does not
                exist, it creates a new file for reading and writing.
13   x / xt     Exclusive text files creation for writing only. If file already exists then it gives
                error.
14   xb         Exclusive binary file creation for writing only. If file already exists then it
                gives error.
Closing a file
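The command <file object>.close() is used to close the file associated with the file object. If an
open file is not closed explicitly by the programmer, the Python environment automatically closes it
when the program run is over. It is still good practice to close every file explicitly, since data
waiting in the write buffer is written to the file only when the file is flushed or closed.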
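A minimal sketch of closing a file explicitly is given here. The file name is hypothetical and a temporary directory is used so that the example does not touch existing files. The with statement shown at the end is an alternative that is not covered in these notes; it calls close() automatically.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

f = open(path, 'w')
try:
    f.write('hello')
finally:
    f.close()              # runs even if write() raises an error
closed_explicitly = f.closed

# Alternative: the with statement closes the file at the end of the block.
with open(path) as g:
    content = g.read()
closed_by_with = g.closed

print(closed_explicitly, closed_by_with, content)   # True True hello
```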
[NOTE:
1. While working with text file be aware of the encoding scheme/language used in the text file,
the number of bytes used per character and the particular method you are using for
reading/writing/displaying a character.
2. Assume that for the discussion here onwards we are working with ASCII encoded text files
only in which each character is stored in one byte only.
]
File Pointer
Whenever a file is opened, an internal integer python variable, the file pointer, is associated with
each file object. The file pointer defines the byte location within the file, where the next read or
write will take place.
When a file is opened for reading, the file pointer is at byte position 0 (in python programming
numbering usually starts from 0)
For example consider the ASCII text file, 'sample.txt', containing the text -
hello world
good day
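myfile=open('sample.txt')
s1=myfile.read(8) #reads 8 bytes from the current position
print(s1)
s2=myfile.read(5) #reads the next 5 bytes
print(s2)
s3=myfile.read() #if no parameter specified it reads from current position till end of file
print(s3)
myfile.close()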
o/p:
hello wo
rld
g
ood day
1. When the file is opened by the command - myfile=open('sample.txt') - the file pointer is at byte
0.
Byte   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Char   h   e   l   l   o ' '   w   o   r   l   d  \n   g   o   o   d ' '   d   a   y EOF
File pointer: at byte 0 (the character 'h')
2. When the command - s1=myfile.read(8) - is executed, python reads 8 bytes from the current
file pointer position and after the read command the file pointer is at Byte 8.
Byte   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Char   h   e   l   l   o ' '   w   o   r   l   d  \n   g   o   o   d ' '   d   a   y EOF
File pointer: at byte 8 (the character 'r')
3. When the command - s2=myfile.read(5) - is executed, python reads 5 bytes from the current
file pointer position and after the read command the file pointer is at Byte 13. Note that the
newline/enter key is considered as one character.
Byte   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Char   h   e   l   l   o ' '   w   o   r   l   d  \n   g   o   o   d ' '   d   a   y EOF
File pointer: at byte 13 (the first 'o' of 'good')
4. When the command - s3=myfile.read() - is executed, Python reads all the bytes from the
current file pointer position till the end of file (EOF) is reached. After the read command the
file pointer is at Byte 20.
Byte   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Char   h   e   l   l   o ' '   w   o   r   l   d  \n   g   o   o   d ' '   d   a   y EOF
File pointer: at byte 20 (EOF)
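The four positions traced above can be verified with the tell() method, which is described later in these notes. This is only a sketch: it creates its own copy of 'sample.txt' in a temporary directory and opens it in binary mode, so that the value returned by tell() is a plain byte position.

```python
import os
import tempfile

# Create a 20-byte copy of the sample file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'wb') as f:
    f.write(b'hello world\ngood day')

positions = []
myfile = open(path, 'rb')
positions.append(myfile.tell())   # position right after opening
s1 = myfile.read(8)
positions.append(myfile.tell())   # position after reading 8 bytes
s2 = myfile.read(5)
positions.append(myfile.tell())   # position after reading 5 more bytes
s3 = myfile.read()
positions.append(myfile.tell())   # position after reading up to EOF
myfile.close()

print(positions)   # [0, 8, 13, 20]
```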
The readline() method reads a single line and returns it as a string, from the file associated with the
<file object>. The newline character '\n' is left at the end of the string and is only omitted if the last
line in the text file does not end in a newline.
A blank line is represented by '\n'. When the end of file is reached, the readline() method returns an
empty string ''.
If the size is specified, then the readline() method reads at most size bytes, or till the '\n'
character is reached, whichever comes earlier. If the size limit is reached first, the returned
string does not end with '\n'.
For example, consider the text file 'test.txt' containing the text -
hello
good morning
Do have a nice day
Good Bye!!!
myfile=open('test.txt')
s1=myfile.readline() # \n from input is left in string s1
# s1=s1.strip() # use this to remove the \n at the end
print(s1,len(s1)) #print statement adds its own newline at the end
s2=myfile.readline()
print(s2)
s3=myfile.readline(5) #reads only 5 bytes \n not inserted at end
print(s3)
s4=myfile.readline(50) #from the current position till end of line
print(s4)
s5=myfile.readline()
print(s5)
myfile.close()
o/p:
hello
 6
good morning

Do ha
ve a nice day

Good Bye!!!
1. The statement - s1=myfile.readline() - reads into the variable s1 the string 'hello\n',
including the newline character '\n' representing the enter key. This string is of length 6. When we
say - print(s1, len(s1)) - then apart from the \n at the end of the string s1, the print() function
by default adds its own end character \n once again. So while using the output of the readline()
statement, if the additional blank line is not needed then either remove the \n with s1.strip() or
suppress the newline added by print() with the parameter end=''.
2. The statement - s3=myfile.readline(5) - reads only 5 characters 'Do ha' and does not add the \n
character at the end.
3. The statement - s4=myfile.readline(50) - will read from the current position of the file pointer till
either the 50 character limit is reached or the \n is encountered in the file. Since the \n is
encountered first, it reads the string 've a nice day\n' , which also includes the \n character.
3. <file object>.readlines() OR list(<file object>)
The readlines() method can read an entire text file and split it up into different lines. The lines
of the text file, including the '\n' characters, are returned as a list of strings. The same effect
can be obtained by using the list() function directly on the <file object>.
f=open('sample.txt')
L=f.readlines()
f.close()
print(L)
f=open('test.txt')
L=list(f)
f.close()
print(L)
o/p:
['hello world\n', 'good day']
['hello\n', 'good morning\n', 'Do have a nice day\n', 'Good Bye!!!']
Since the last line of each text file did not have the Enter key pressed, the last element in both
the lists also does not have the \n character at the end.
The list() function also works on the <file object> f, as the file object is an iterable. If we use
the for loop iterator on a valid file object then it gives one line at a time. The returned line
will have \n as the last character (except possibly the last line of the file).
o/p:
hello
good morning
Do have a nice day
Good Bye!!!
Note the use of the end='' parameter to prevent an additional blank line between two lines of text.
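Iterating over a file object with a for loop can be sketched as follows. The sketch first creates its own copy of 'test.txt' in a temporary directory (an assumption made so that it is self-contained) and also collects the lines in a list to show that each one still carries its \n.

```python
import os
import tempfile

# Create a copy of the sample file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('hello\ngood morning\nDo have a nice day\nGood Bye!!!')

lines = []
f = open(path)
for line in f:             # the file object yields one line per iteration
    lines.append(line)
    print(line, end='')    # end='' avoids an additional blank line
f.close()
```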
The while loop can also be used in combination with the readline() to show the same behaviour.
Program 5: Using while loop to read one line at a time from a text file
myfile=open('sample.txt','r')
s1 = 'abc' # initialize s1 to some non-empty value
while s1:
    s1=myfile.readline()
    print(s1.strip())
myfile.close()
o/p:
hello world
good day
The readline() method returns an empty string '' when the end of file is reached. This property is
used in the condition of the while loop to terminate the loop when the file is read completely. Also
note the use of the s1.strip() statement to remove the \n character from the end of the string s1.
Usually if we are performing only a read operation on a small or medium sized text file, then the
read() method without any parameters can read the entire text file into memory and the file can be
closed immediately after executing the read() method, leaving the file free for use by other
applications. But if the file is large and cannot be accommodated in memory then it is preferable
to use the read(x) or the readline() method so that only a small portion of the file is in memory at
any one point of time.
Also while doing any file operations, care must be taken to keep as few operations as possible
between the file open and the file close statements, so that the file is held by our program or
application for the least amount of time. This makes our applications more efficient.
#Method 1
f=open('test.txt')
s=f.read()
f.close()
count=0
for ch in s:
    if ch.lower()=='a':
        count=count+1
print('a occurs',count,'times')
o/p:
a occurs 3 times
#Method 2
f=open('test.txt')
count,s1 = 0, 'abc'
while s1:
    s1=f.read(1)
    if s1.lower()=='a':
        count=count+1
print('a occurs',count,'times')
f.close()
o/p:
a occurs 3 times
In Method 1 we have read the entire text file into a single string s and then processed that string
using the iterator to count how many a's are there in the string. (We could also have omitted the
loop and used count=s.lower().count('a').) Here the file is held open by our program for only a
short duration. But the disadvantage is that if the 'test.txt' file is very large, it may not fit
in the computer's memory.
In Method 2 we have used the command - s1=f.read(1) - to read only one character at a time from
the file object, and the file stays open in our program until the processing is over.
But the advantage is that this program can work with a file of any size.
2. Word-By-Word processing
Program 7: Count how many times the word 'me' appears in a text file
In Method 1, we read the entire text file into the string s. After that, using - L=s.split() - the
string s is split into a list of words L. Then using the for iterator we go through all the words in
list L and, using the strip() function, strip off all the punctuation symbols around each word.
Using the count() method on the resulting list of words gives us the desired answer.
In Method 2, all the lines of the text file are read into the list L using - L=f.readlines(). Then we
iterate over all the lines and split them into words. Then using the inner nested for loop we go
through the list of words, strip off the punctuation symbols and match each word with the string
'me' to increment our counter wc.
In Method 3, we use the iterator over the file object f. The loop variable ln holds one line of the
text file. Inside the loop, we split the line into words. Each word is stripped of its punctuation
marks and then checked against the string 'me'. If it matches, the counter variable wc is
incremented by 1.
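The three methods described above can be sketched as follows. This is an illustrative version, not the original program: the sample text, the file name 'quote.txt' and the use of string.punctuation as the set of characters to strip are all assumptions.

```python
import os
import string
import tempfile

# Hypothetical sample file in which the word 'me' occurs three times.
path = os.path.join(tempfile.mkdtemp(), 'quote.txt')
with open(path, 'w') as f:
    f.write('Tell me, and I forget.\nTeach me and I remember.\nInvolve me, and I learn!')

# Method 1: read everything, split into words, strip punctuation, then count.
f = open(path)
s = f.read()
f.close()
L = [w.strip(string.punctuation) for w in s.split()]
wc1 = L.count('me')

# Method 2: read all the lines into a list, then split each line into words.
f = open(path)
L = f.readlines()
f.close()
wc2 = 0
for ln in L:
    for w in ln.split():
        if w.strip(string.punctuation) == 'me':
            wc2 = wc2 + 1

# Method 3: iterate over the file object, one line at a time.
wc3 = 0
f = open(path)
for ln in f:
    for w in ln.split():
        if w.strip(string.punctuation) == 'me':
            wc3 = wc3 + 1
f.close()

print(wc1, wc2, wc3)   # 3 3 3
```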
3. Line-By-Line processing
In Method 1, we use the readlines() to read all the lines into a List L. Then using the for iterator we
go through all the lines and check if the starting letter is 'g'. If yes then a counter variable, lc is
incremented.
In Method 2, we iterate directly over the file object f to get one line at a time in each iteration. Then
inside the loop we check if the starting letter is 'g'. If yes then the counter variable is incremented.
In Method 3, we use the read() method to get the entire contents of the text file into a single string,
s. Then we use the statement - L=s.split('\n') - to split the string s into a list of lines. The advantage is
that each of the lines does not have the '\n' character at the end of the line. Then we go through the
list using the for iterator and check if the starting character is 'g'. If yes then the counter variable is
incremented.
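A sketch of the three line-by-line methods is given here, counting the lines that start with the letter 'g'. It creates its own copy of 'test.txt' in a temporary directory and uses startswith('g') instead of indexing the first character, an assumption made so that empty lines do not cause an error.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('hello\ngood morning\nDo have a nice day\nGood Bye!!!')

# Method 1: readlines() gives a list of lines.
f = open(path)
L = f.readlines()
f.close()
lc1 = 0
for ln in L:
    if ln.startswith('g'):
        lc1 = lc1 + 1

# Method 2: iterate directly over the file object.
lc2 = 0
f = open(path)
for ln in f:
    if ln.startswith('g'):
        lc2 = lc2 + 1
f.close()

# Method 3: read() followed by split('\n'); the lines carry no '\n' at the end.
f = open(path)
s = f.read()
f.close()
lc3 = 0
for ln in s.split('\n'):
    if ln.startswith('g'):
        lc3 = lc3 + 1

print(lc1, lc2, lc3)   # 1 1 1  ('Good Bye!!!' starts with a capital G)
```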
Writing Files
For writing a file, the file must be opened in one of the modes that support writing to files.
(a) When opening the file in the read and write mode (r+, rb+), the file must exist prior to the
open statement and the file pointer is placed at Byte 0. If you directly perform a write, the
pre-existing data is overwritten from Byte 0 onwards, byte for byte; the rest of the file is kept.
(b) When opening the file in any combination involving write 'w' mode(w,w+,wb,wb+), if the
file already exists, then it will be truncated and any data written will be written to a new
blank file each time you run the program.
(c) When the file is opened in 'a','ab' mode, if the file already exists, then the file pointer is
placed at EOF position, and any new data is appended at the end.
(d) When the file is opened in 'a+', 'ab+' mode, the file pointer is also placed initially at the
EOF position. To read the existing data, first move the file pointer with seek(0); any read
operation then advances it from that position onwards. But any write operation performed on
that file always appends the data at EOF, whatever the current file pointer position.
(e) When the file is opened in 'x', 'xb' mode, then if the file already exists then an error is given.
If the file does not exist then a new file is created and file pointer is at byte 0 position and
only write operations are allowed in 'x' mode.
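The behaviour of the 'r+', 'a+' and 'w' modes can be compared with a short sketch. The file name 'modes.txt' is hypothetical, and the append step uses binary mode ('ab+') so that tell() reports a plain byte position.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')   # hypothetical file name

with open(path, 'w') as f:        # 'w' creates (or truncates) the file
    f.write('ABCDEFGH')

f = open(path, 'r+')              # pointer at byte 0, existing data kept
f.write('xyz')                    # replaces 'ABC' only
f.close()
with open(path) as f:
    after_rplus = f.read()

f = open(path, 'ab+')             # pointer starts at EOF
start_pos = f.tell()
f.write(b'12')                    # appended at the end of the file
f.seek(0)                         # move back to byte 0 to read everything
after_aplus = f.read()
f.close()

open(path, 'w').close()           # reopening in 'w' mode truncates the file
size_after_w = os.path.getsize(path)

print(after_rplus, start_pos, after_aplus, size_after_w)
```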
<file object>.write(stringobject)
The write() method takes a stringobject and writes it to the file that is associated with the <file
object>
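a,b,c='10','20','30'
f1=open('sample1.txt', 'w') #file opened in write mode
f1.write(a)
f1.write(b)
f1.write(c)
f1.close()
# 'a' mode
f2=open('sample2.txt', 'a') #file opened in append mode
f2.write(a)
f2.write(b)
f2.write(c)
f2.close()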
In the above program we have taken the variables a,b,c with the string values '10', '20' and '30'. Even
if our data is in other data types, it should be converted to a string in order for it to be passed to the
write() method.
The file objects f1 and f2 are similar only difference being the file opening modes. Since f1 is opened
in 'w' mode, even if we run the program multiple times, each time the file 'sample1.txt' is first
truncated and then the data '102030' is written into it.
But since f2 is opened in append 'a' mode, if we run the program multiple times, the file
'sample2.txt' is not truncated each time. Instead the data '102030' is appended to the end of the file
each time and if the program is run 2 times then the file will contain the output 2 times, if the
program is run 3 times then the file will contain the output 3 times and so on.
<file object>.writelines(L)
The writelines() method accepts a list of strings L as a parameter and writes it to the file associated
with the <file object>.
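L1 = ['hello world',
      'good morning',
      'have a nice day']
f1=open('sample1.txt', 'w')
f1.writelines(L1)
f1.close()
L2 = ['hello world\n',
      'good morning\n',
      'have a nice day\n']
f2=open('sample2.txt', 'w')
f2.writelines(L2)
f2.close()
print('file written...')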
In the above program we have two lists of strings, L1 and L2. In L1, none of the list elements has a
\n character at its end, whereas in L2, every list element has the \n character at its end. Since
the writelines() method writes the strings as they are, one by one, without adding anything, the
first file 'sample1.txt' will have all the sentences following one after the other.
sample1.txt
hello worldgood morninghave a nice day
Whereas in the second file 'sample2.txt' the lines will be displayed one on each line.
sample2.txt
hello world
good morning
have a nice day
bytes Objects
Bytes objects are immutable sequences of single bytes.
The bytes object is a sequence object where each element i.e. x[0], x[1], x[2] etc. is of exactly one
byte. Internally each element of the bytes object behaves as a number between 0 and 255. If any
non-printable character is to be added to the bytes object then the escape sequence \x can
be used. The \x must be followed by exactly two hexadecimal digits (0 to F).
Eg. y =b'hello\x9Aworld\x3b\x2d'
y =b'\x2c\x3a\xac\x4d'
Printing an individual element of a bytes sequence object gives the integer number between 0-255
corresponding to that particular character in ASCII.
Eg. x=b'ABCD'
print(x[0])
o/p:
65
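A few more operations on bytes objects are sketched here, reusing the literals shown above. The last step shows that a bytes object is immutable, so assigning to an element raises a TypeError.

```python
x = b'ABCD'
y = b'hello\x9Aworld\x3b\x2d'

print(x[0])        # 65, the ASCII code of 'A'
print(list(x))     # [65, 66, 67, 68]
print(x[1:3])      # b'BC', a slice is again a bytes object
print(len(y))      # 13, each \x escape counts as one byte
print(y[-2:])      # b';-', since \x3b is ';' and \x2d is '-'

# bytes objects are immutable: item assignment is not allowed.
try:
    x[0] = 97
    is_immutable = False
except TypeError:
    is_immutable = True
print(is_immutable)   # True
```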
bytearray objects
bytearray objects are the mutable counterpart to bytes objects. There is no dedicated literal syntax
for bytearray objects, instead they are always created by calling the bytearray() constructor. With
a bytearray you can use many of the operations of other mutable sequences such as lists, for example
append, extend, insert, pop, remove, reverse and del. (There is no push() or sort() method.)
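The mutability of a bytearray can be seen in a short sketch; the values used are arbitrary. Each element is given as an integer between 0 and 255.

```python
ba = bytearray(b'ABC')   # created from a bytes literal

ba[0] = 97               # change 'A' to 'a' in place
ba.append(68)            # add 'D' at the end        -> aBCD
ba.insert(1, 33)         # insert '!' at index 1     -> a!BCD
last = ba.pop()          # remove and return the last element (68)
del ba[1]                # delete the '!'            -> aBC
ba.extend(b'xy')         # add several bytes at once -> aBCxy

print(ba, last)          # bytearray(b'aBCxy') 68
print(bytes(ba))         # b'aBCxy', converted back to an immutable bytes object
```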
Encoding for writing binary data
The variables within a Python program can be of different data types. But in order to use the write()
method for writing a binary file, these variables must be converted/encoded into bytes /
bytearray objects, as the write() method only accepts bytes-like objects as input for binary files.
Consider three variables having simple data types which have to be written to a binary file -
a ='xyz' # string / character / text data
b=24 # int data
c=152.35 # float data
1. Encoding a string
When converting a string to its binary representation, different encoding schemes such as
ascii, utf-8, utf-16, utf-32 etc. are available. If we want to find out the default encoding scheme
used by the encode() and decode() methods in our installation of Python, we can use the following
commands -
import sys
print(sys.getdefaultencoding())
o/p:
utf-8
In Python 3 the answer is utf-8, which is the most widely used encoding scheme in
programming as well as on the internet.
In order to convert the string variable - a ='xyz' - to a byte stream, use the encode() method of the
string object. For example -
Program 11 : Encoding string to bytes
# 11 Encoding string to bytes
a='xyz'
b=a.encode()
c=a.encode(encoding='utf-8')
print(b, type(b))
print(c, type(c))
o/p:
b'xyz' <class 'bytes'>
b'xyz' <class 'bytes'>
2. Encoding an int
Whenever int data such as - b = 24 - is to be encoded, it is first converted internally to binary
and grouped into bytes. Then, depending on the endianness specified (the byteorder parameter), the
order in which the bytes are stored is decided.
Consider the decimal number 708562303, written in hexadecimal format as 2A3BCD7F. In the
computer's memory the four bytes of this number can be stored in either of two orders -
(a)
hex data          7F    CD    3B    2A
memory address    1001  1002  1003  1004
Here the Least Significant Byte (7F) is stored at the smallest memory address. This
format of storing numbers is known as 'little' endian. Intel x86 and AMD64 (x86-64)
processors are little-endian.
(b)
hex data          2A    3B    CD    7F
memory address    1001  1002  1003  1004
Here the Most Significant Byte (2A) is stored at the smallest memory address. This format
of storing numbers is known as 'big' endian. Motorola 68000 and PowerPC G5 processors
are big-endian. When sending data over a network the big endian format is used, i.e. the MSB is
sent first and the LSB is sent last.
Some processors such as ARM and Intel Itanium feature switchable endianness (bi-endian).
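The two byte orders can be checked directly in Python with the to_bytes() method of an int. This is a small sketch using the number from the example; sys.byteorder reports the byte order of the machine on which the program runs.

```python
import sys

n = 708562303                               # 0x2A3BCD7F

big = n.to_bytes(4, byteorder='big')        # most significant byte (2A) first
little = n.to_bytes(4, byteorder='little')  # least significant byte (7F) first

print(hex(n))          # 0x2a3bcd7f
print(big.hex())       # 2a3bcd7f
print(little.hex())    # 7fcd3b2a
print(sys.byteorder)   # 'little' on Intel x86 / AMD64 machines
```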
The to_bytes() method of the integer object can be used to encode an integer to a bytes object.
o/p:
b'\x18\x00' <class 'bytes'> 2
b'\x00\x00\x00\x18' <class 'bytes'> 4
The first parameter to the to_bytes() method is the number of bytes into which the given int
variable is to be encoded, and the second parameter specifies the byteorder in which the bytes are
to be saved.
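Calls of the following kind produce the two output lines shown above. This is a sketch, assuming the int value b=24 used earlier in this section; the variable names d and e are illustrative.

```python
b = 24

d = b.to_bytes(2, byteorder='little')   # 2 bytes, least significant byte first
e = b.to_bytes(4, byteorder='big')      # 4 bytes, most significant byte first

print(d, type(d), len(d))   # b'\x18\x00' <class 'bytes'> 2
print(e, type(e), len(e))   # b'\x00\x00\x00\x18' <class 'bytes'> 4

# Decoding with the same byteorder gives back the original number.
back = int.from_bytes(d, byteorder='little')
print(back)                 # 24
```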
3. Encoding a float
Floating point numbers are stored in a binary form known as IEEE754 format which can be either 4
bytes or 8 bytes in length.
We need the struct library to convert a float to a bytes object and the method struct.pack() is used
for the same.
import struct #struct module needed for converting float to bytes
c=152.35
d=struct.pack('f',c) # 'f' uses 4 bytes to represent a float
e=struct.pack('d',c) # 'd' uses 8 bytes to represent a float
print(d, type(d), len(d))
print(e, type(e), len(e))
o/p:
b'\x9aY\x18C' <class 'bytes'> 4
b'33333\x0bc@' <class 'bytes'> 8
To use the pack() method for floats, the first argument should be either 'f' or 'd'. Using 'f' will store
the floating number in 4 bytes. On using 'd' the float number will be stored in 8 bytes. The second
parameter must be the float variable that is to be converted to a bytes object.
Program 14 : Writing/Encoding a Binary File
# 14 writing binary file
import struct #struct module needed for converting float to bytes
a,b,c='xyz', 24, 152.35
f=open('test1.dat','wb')
f.write(a.encode(encoding='utf-8')) #stores 3 bytes
f.write(b.to_bytes(2,byteorder='little')) # stores 2 bytes
f.write(struct.pack('f',c)) # stores in 4 bytes
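f.write(b'PQR') # stores 3 bytes given as a bytes literal (assumed; matches the values read back)
f.write(b'\x00\x01\x02\xab\xef') # stores 5 raw bytes (assumed; matches the values read back)
f.close()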
The read() is the primary method used for reading a binary file. If it is not passed any parameter then
it will read the entire binary file as a bytes object. If we pass an int parameter, x to the read(x)
method then it will read only x bytes and return it as a bytes object.
After getting the bytes we have to use the appropriate decoding mechanism to get back the original
data.
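import struct #struct module needed for converting bytes to float
f=open('test1.dat','rb')
a=f.read(3)
b=f.read(2)
c=f.read(4)
d=f.read(3)
e=f.read(10) # returns fewer than 10 bytes if the end of file is reached first
f.close()
print(a, a.decode(encoding='utf-8'))
print(b, int.from_bytes(b, byteorder='little'))
print(c, struct.unpack('f',c))
print(d, d.decode(encoding='utf-8'))
print(e)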
o/p-
b'xyz' xyz
b'\x18\x00' 24
b'\x9aY\x18C' (152.35000610351562,)
b'PQR' PQR
b'\x00\x01\x02\xab\xef'
When the file is opened in binary mode, the read(x) command will read x bytes as bytes object. In
order to decode the different data types the following methods are used-
1. Decoding a string
The command - a.decode(encoding='utf-8') - will decode a bytes object to a string with the
encoding scheme specified. If no parameter is passed then 'utf-8' is the default scheme used
for decoding.
2. Decoding an int
The command - int.from_bytes(b, byteorder='little') - is used to decode a bytes object to an
int. The byteorder must be the same as was used while encoding the int.
3. Decoding a float
For decoding a float, again the struct module is needed in order to use the struct.unpack()
method. Again here the first parameter should be either 'f' or 'd' depending on whether 'f' or
'd' was specified during the encoding process. The unpack() method returns back a tuple
object with one float value as the answer.
Serialization/Deserialization
In serialization, we use a common standard or protocol that takes a Python object as
input and produces a standard representation of it as a sequence of bytes. This sequence of bytes
can be written to a binary file or sent over a network.
While reading the file, the reverse process using the same standard/protocol allows us to rebuild
the Python objects directly from the binary data. This process is called deserialization.
Limitations
(a) The pickle module uses a compact binary representation, and the bytes stream
object, i.e. the pickled data file that is generated, is not human-readable.
(b) Different versions of Python use different protocols for pickling/unpickling. So
data pickled with a newer protocol may not be readable in an older version of Python.
(c) The data formats used by pickle are Python-specific. So the data files generated in a
Python application can only be read by another application developed in Python.
Software developed in other languages will not be able to read
the pickled data files generated by our Python application.
(d) It is possible to construct malicious pickle data which will execute arbitrary code
during unpickling. Never unpickle data that could have come from an untrusted
source, or that could have been tampered with.
2. json module
The json module implements the popular and widely used data interchange format JSON
(JavaScript Object Notation). The json module can take Python objects and convert them to
string representations. The advantage is that the json data files are human-readable. The
data files generated by this module can be used by applications developed in other
languages / tools. Also a json data file by itself does not have the arbitrary code execution
vulnerability, so it is more secure than pickle. On the other hand, json supports only basic
data types such as dict, list, str, int, float, bool and None.
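A minimal sketch of the json module is given here, using json.dumps() to convert a dictionary to a string and json.loads() to convert it back. The dictionary is the employee record used in the pickle examples of these notes.

```python
import json

d = {'empno': 1, 'name': 'abc', 'salary': 5000.5}

s = json.dumps(d)      # Python object -> JSON string (human-readable)
print(s, type(s))      # {"empno": 1, "name": "abc", "salary": 5000.5} <class 'str'>

d2 = json.loads(s)     # JSON string -> Python object
print(d2 == d)         # True
```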
Serializing/Deserializing an object using pickle
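The dump() method of the pickle module serializes a Python object and writes it to a binary file,
and the load() method reads it back from the file (deserialization). The related dumps() and loads()
methods do the same with a bytes object in memory instead of a file.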
Program 16 : Pickle an object
# 16 Pickle an object
import pickle
d = { 'empno': 1, 'name': 'abc', 'salary': 5000.5 }
f = open('employee.dat','wb')
pickle.dump(d,f)
f.close()
In order to use the pickle library, first we need to use the statement - import pickle. Then we
create any picklable object (here the dictionary d). After that we open the file 'employee.dat' in
binary write mode (wb). If we need the existing data to be preserved then we open the file in 'ab'
mode.
The statement - pickle.dump(d,f) - will serialize the dictionary d, and write the bytes into the
file f.
o/p:
{'empno': 1, 'name': 'abc', 'salary': 5000.5}
To unpickle an object stored in a binary file, first open the file in read binary mode 'rb'. Then using
the statement - d = pickle.load(f) - the entire data is loaded correctly into the variable d.
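Pickling and unpickling can be sketched together as a round trip. The sketch writes 'employee.dat' into a temporary directory (an assumption made so that it is self-contained) and also shows the in-memory variants dumps() and loads().

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'employee.dat')
d = {'empno': 1, 'name': 'abc', 'salary': 5000.5}

# Pickle: serialize the dictionary into a binary file.
f = open(path, 'wb')
pickle.dump(d, f)
f.close()

# Unpickle: read the object back from the binary file.
f = open(path, 'rb')
d2 = pickle.load(f)
f.close()
print(d2)   # {'empno': 1, 'name': 'abc', 'salary': 5000.5}

# dumps()/loads() work with a bytes object instead of a file.
data = pickle.dumps(d)
d3 = pickle.loads(data)
print(type(data), d3 == d)   # <class 'bytes'> True
```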
import pickle
L=[] # the new records (dictionaries) entered by the user are added to L here
f = open("employee.dat","ab+")
if f.tell() > 0:
    f.seek(0)
    L1=pickle.load(f)
    L=L1+L
    f.truncate(0)
pickle.dump(L,f)
f.close()
1. We start with an empty list L.
2. The first part of the program accepts user input and creates a list whose elements are
dictionary objects. Each dictionary object consists of the employee details - employee number (eno)
and employee name (ename).
3. We open the file in 'ab+' mode as we want to preserve existing data, but read as well as write.
When the file is opened in ab+ mode the file pointer is at EOF position. The body of the if
statement copies the existing data and then adds the new data to the end of the existing data.
a) The command - if f.tell() > 0: - checks whether the file size is greater than 0. The code
that follows is executed only if the file contains existing data.
b) The command - f.seek(0) - changes the file pointer to byte 0 i.e. start of the file so that we
may read the existing data.
c) The command - L1=pickle.load(f) - loads the existing data into the list L1.
d) The command - L=L1+L - is used to add the newly added records to the end of existing
records.
e) The command - f.truncate(0) - is used to truncate the file to 0 size as we are going to
populate/dump the combined (old+new) data again into the file.
4. The command - pickle.dump(L,f) - is used to populate/pickle the file with the data contained in
list L. For the first run of the program, the list L contains only the newly added records, but for
subsequent runs of the program the list L contains the old/existing data as well as the newly added
data and the entire file is written again.
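The steps above can be sketched as a small function. The function name append_records and its parameters are hypothetical; the new records are passed in as a list instead of being typed in by the user, and the file is created in a temporary directory.

```python
import os
import pickle
import tempfile

def append_records(path, new_records):
    """Add new_records to the pickled list stored in the file at path."""
    L = list(new_records)
    f = open(path, 'ab+')       # keeps existing data; file pointer starts at EOF
    if f.tell() > 0:            # the file already contains a pickled list
        f.seek(0)               # go to the start to read the existing data
        L1 = pickle.load(f)
        L = L1 + L              # old records first, new records at the end
        f.truncate(0)           # empty the file before writing it again
    pickle.dump(L, f)
    f.close()

path = os.path.join(tempfile.mkdtemp(), 'employee.dat')
append_records(path, [{'eno': 101, 'ename': 'abc'}])   # first run
append_records(path, [{'eno': 102, 'ename': 'def'}])   # second run

with open(path, 'rb') as f:
    stored = pickle.load(f)
print(stored)   # [{'eno': 101, 'ename': 'abc'}, {'eno': 102, 'ename': 'def'}]
```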
import pickle
f = open("employee.dat", "rb")
L = pickle.load(f)
print(L)
f.close()
o/p:
[{'eno': 101, 'ename': 'abc'}, {'eno': 102, 'ename': 'def'}, {'eno': 103, 'ename': 'ghi'}, {'eno': 104,
'ename': 'jkl'}]
Searching for a particular record from a list of dictionary records
We can load the entire binary file into a list object and then iterate over the list object to search for a
matching entry.
Program 20 : Searching a list of dictionary in binary
# 20 Searching a list of dictionary in binary
import pickle
f = open("employee.dat", "rb")
L = pickle.load(f) #L is a list of dictionary elements
f.close()
nm = input('Enter employee name to search :')
found = 0
for x in L: #x is a dictionary element
    if x['ename'] == nm:
        found = 1
        break
if found ==1:
    print('Employee found')
    print('Details:', x)
else:
    print('Employee not found')
o/p:
Enter employee name to search :def
Employee found
Details: {'eno': 102, 'ename': 'def'}
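To update a record, we load the list of dictionary records and copy every record into a second
list, changing the record that matches. Then we truncate the original file to zero size and copy
the second list of dictionary records to the same binary file.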
Program 21 : Update one record from a list of dictionary records
# 21 update one record from a list of dictionary records
import pickle
num = int(input('Enter employee number to update the record :'))
f = open("employee.dat", "rb+")
L = pickle.load(f)
L1=[]
found = 0
for x in L:
    if x['eno'] == num:
        found = 1
        x['eno']=int(input('Enter new employee number:'))
        x['ename'] = input('Enter new employee name:')
    L1.append(x)
if found ==1:
    f.truncate(0)
    f.seek(0)
    pickle.dump(L1,f)
    print('Updated the record')
else:
    print('Employee not found')
f.flush()
f.close()
o/p:
Enter employee number to update the record :102
Enter new employee number:777
Enter new employee name:nnn
Updated the record
When dump() is called, Python usually writes the data to a buffer first and performs the actual write
to the file only later, for example when the buffer becomes full or the file is closed. The flush()
method empties this buffer immediately and hands the data over to the OS. The OS may still hold the
data in its own cache, so if the data must reach the hard disk immediately, call
os.fsync(f.fileno()) after the f.flush() statement to force the OS to write the file to the hard disk.
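The flush-then-fsync sequence can be sketched as follows. This is only a minimal illustration: the file name 'demo.dat' and the temporary directory are assumptions, chosen so that the example does not touch 'employee.dat'.

```python
import os
import pickle
import tempfile

# Hypothetical file in a temporary directory (not the employee.dat used above).
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
records = [{'eno': 101, 'ename': 'abc'}]

f = open(path, "wb")
pickle.dump(records, f)   # the data may still be sitting in Python's buffer
f.flush()                 # hand the buffered bytes over to the OS
os.fsync(f.fileno())      # ask the OS to write them to the disk
size_on_disk = os.path.getsize(path)
f.close()

print('Bytes written:', size_on_disk)
```

For ordinary programs f.close() alone is enough, since closing a file also flushes the buffer.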
To delete a record, we copy every record except the matching one into a second list. Then we truncate
the original file to zero size and write the second list of dictionary records back to the same
binary file.
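# Delete one record from a list of dictionary records
import pickle
f = open("employee.dat", "rb+")
L = pickle.load(f)
num = int(input('Enter employee number to delete :'))
L1 = []
found = 0
for x in L:
    if x['eno'] == num:
        found = 1         # skip the matching record
    else:
        L1.append(x)      # keep all other records
if found == 1:
    f.truncate(0)
    f.seek(0)
    pickle.dump(L1, f)
    print('Deleted the record')
else:
    print('Employee not found')
f.flush()
f.close()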
o/p:
Enter employee number to delete :777
Deleted the record
Working with File Pointers
We have the following methods for working with file pointers-
1. <file object>.tell()
For a binary file this method returns an integer giving the current position of the file pointer
as a number of bytes from the beginning of the file.
For a text file, tell() returns an opaque number (Note: according to the Python 3 documentation,
this is not necessarily the byte position) which is only meaningful when passed back to seek()
on the same text file.
2. <file object>.seek(offset, whence)
This method moves the file pointer to the position given by offset, measured from the reference
point selected by whence.
The second parameter whence can be 0, 1 or 2. A whence value of 0 measures from the beginning of
the file, 1 uses the current file position, and 2 uses the end of the file as the reference
point. whence can be omitted and defaults to 0, using the beginning of the file as the reference
point.
For a binary file the seek() method works with all three whence values as stated above.
For a text file, seek() can in general only be used with whence 0, i.e. only offsets from the
beginning of the file are valid, and the offset should be either 0 or a value returned by tell().
The one other useful move is f.seek(0,2), which goes to the end of the file. Non-zero offsets
cannot be used with whence=1 or whence=2 for text files.
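The behaviour of tell() and of the three whence values on a binary file can be sketched as follows. The file name 'pointer_demo.bin' and its 11-byte content are assumptions made for this illustration.

```python
import os
import tempfile

# Hypothetical 11-byte binary file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "pointer_demo.bin")
with open(path, "wb") as f:
    f.write(b"ABCDEFGHIJK")

with open(path, "rb") as f:
    start = f.tell()            # file pointer starts at byte 0
    f.read(3)                   # reads b"ABC"
    after_read = f.tell()       # pointer is now at byte 3
    f.seek(2, 1)                # whence=1: 2 bytes forward from the current position
    from_current = f.read(1)    # byte at offset 5
    f.seek(-2, 2)               # whence=2: 2 bytes before the end of the file
    from_end = f.read()         # the last two bytes
    f.seek(4)                   # whence omitted: offset 4 from the beginning
    from_start = f.read(1)      # byte at offset 4

print(start, after_read, from_current, from_end, from_start)
```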
Example-
Consider the following ASCII text file 'novel.txt' with the following data-
ABCDEFGHIJK
After 'XYZ', 'PQR' and 'UVW' are written as described in the steps below, the file contains-
XYZDEFPQRJKUVW
How does it work?
1. When the file is opened, the file pointer is at byte position 0. So when 'XYZ' is
written, it overwrites 'ABC'.
2. The command - f.seek(6) - places the file pointer at byte number 6 from the start of
the file. Byte numbering starts at 0, so byte offset 6 is the character 'G' in the
original file, and writing 'PQR' overwrites 'GHI'.
3. The command - f.seek(0,2) - moves the file pointer to the end of the file, so when
the characters 'UVW' are written, they appear at the end.
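The three steps can be sketched as the following program. It is a reconstruction from the steps above, not necessarily the original listing: the open mode 'r+' is an assumption (the file must be opened for writing without being truncated), and the file is created in a temporary directory so that the sketch is self-contained.

```python
import os
import tempfile

# Create 'novel.txt' with the starting content in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "novel.txt")
with open(path, "w") as f:
    f.write("ABCDEFGHIJK")

f = open(path, "r+")   # assumed mode: read and write, existing content kept
f.write("XYZ")         # step 1: pointer is at 0, overwrites 'ABC'
f.seek(6)              # step 2: move to byte offset 6 ('G')
f.write("PQR")         #         overwrites 'GHI'
f.seek(0, 2)           # step 3: move to the end of the file
f.write("UVW")         #         appended after 'K'
f.close()

with open(path) as f:
    content = f.read()
print(content)
```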
csv module
The csv module implements classes to read and write tabular data in CSV format. CSV (Comma
Separated Values) is a text file format in which each line is a row of data and the individual data
elements in a row are separated by commas.
The csv module’s reader and writer objects read and write sequences.
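Reading a CSV file with a reader object can be sketched as follows. The two sample rows (employee number, name, department) are assumptions made for this illustration, and the file is created in a temporary directory first so that the sketch runs on its own.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "employee.csv")

# Assumed sample rows: employee number, name, department.
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([
        [101, "abc", "sales"],
        [102, "def", "accounts"],
    ])

rows = []
with open(path, newline="") as f:
    reader = csv.reader(f)
    for row in reader:      # one row of the file per iteration
        rows.append(row)
        print(row)
```

Note that the numbers come back as strings: the csv module does not convert data types on reading.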
The returned reader object is an iterable and iterating over it gives a list of strings representing each
row of the csv file.
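Appending a row with a writer object can be sketched as follows. The three existing rows are assumed sample data, and the file is created in a temporary directory; only the appended row ['104', 'jkl', 'production'] is taken from these notes.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "employee.csv")

# Assumed existing content of employee.csv.
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([
        [101, "abc", "sales"],
        [102, "def", "accounts"],
        [103, "ghi", "stores"],
    ])

# Open in append mode and write one more row.
with open(path, "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([104, "jkl", "production"])

# Read the file back to check the result.
with open(path, newline="") as f:
    rows = list(csv.reader(f))
print(rows[-1])
```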
After executing the above program the 'employee.csv' file will contain an additional row ['104', 'jkl',
'production'].
Standard Input (stdin)
Standard input is a stream from which a program reads its input data. Usually it is the keyboard. In
Python this stream can be accessed through the sys.stdin object. It is used for all interactive
input, including calls to input().
sys.stdin, sys.stdout and sys.stderr are regular text file objects, like those returned by the open()
function.
The - sys.stdin.readline() - function reads a line of input from the keyboard. When we type 'abc' and
press the enter key, the string 'abc\n' (including the \n) is returned to the program. The input()
function does the same thing; the only difference is that input() does not include the '\n'
character typed at the end in the returned string.
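The difference between the two calls can be sketched as follows. So that the example runs without a keyboard, sys.stdin is temporarily replaced by an in-memory text stream holding two assumed lines of "typed" input. In a real interactive program this replacement is not needed.

```python
import io
import sys

# Stand-in for the keyboard: the user "types" abc and presses enter, twice.
sys.stdin = io.StringIO("abc\nabc\n")

line1 = sys.stdin.readline()   # keeps the '\n' at the end
line2 = input()                # drops the '\n' at the end

sys.stdin = sys.__stdin__      # restore the real standard input

print(repr(line1))
print(repr(line2))
```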
If the input data should come from the text file 'enroll.txt' instead of the user typing the name and
age on the terminal, add the input redirection operator '<' followed by the file name to the command
that runs the program.
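What the '<' operator does can be imitated from Python itself by starting the program with its standard input connected to a file. This is only a sketch: the script name 'enroll.py', its three lines of code and the contents of 'enroll.txt' are all assumptions made for this illustration.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()

# Hypothetical script that reads a name and an age with input().
script = os.path.join(workdir, "enroll.py")
with open(script, "w") as f:
    f.write("name = input()\nage = int(input())\nprint(name, age)\n")

# Assumed contents of enroll.txt: one value per line.
data = os.path.join(workdir, "enroll.txt")
with open(data, "w") as f:
    f.write("abc\n21\n")

# Same effect as the shell command:  python enroll.py < enroll.txt
with open(data) as infile:
    result = subprocess.run([sys.executable, script], stdin=infile,
                            capture_output=True, text=True)
print(result.stdout)
```

The script itself is unchanged: input() simply reads from whatever stdin is connected to.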
By default, output written to stdout and output written to stderr are both displayed on the terminal.
To redirect stdout to a text file such as 'out1.txt', use the '>' operator. To redirect stderr to a
text file such as 'out2.txt', use the '2>' redirection operator.
When > and 2> are used, any existing files 'out1.txt' and 'out2.txt' are truncated, so the outputs
are written to empty files each time the program is run. To append the outputs to the existing files
instead, use the '>>' and '2>>' operators to redirect stdout and stderr.
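The difference between truncating and appending redirection can be imitated from Python by running a program with stdout and stderr connected to files opened in mode "w" or "a". This is only a sketch: the script name 'report.py' and its messages are assumptions, and the files are created in a temporary directory.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()

# Hypothetical script that writes one line to stdout and one to stderr.
script = os.path.join(workdir, "report.py")
with open(script, "w") as f:
    f.write("import sys\nprint('to stdout')\nprint('to stderr', file=sys.stderr)\n")

out1 = os.path.join(workdir, "out1.txt")
out2 = os.path.join(workdir, "out2.txt")

def run(mode):
    # mode "w" imitates > and 2>, mode "a" imitates >> and 2>>
    with open(out1, mode) as o, open(out2, mode) as e:
        subprocess.run([sys.executable, script], stdout=o, stderr=e, check=True)

run("w")    # like: python report.py > out1.txt 2> out2.txt
run("w")    # the files are truncated again, so the old lines are lost
with open(out1) as f:
    after_truncate = f.read().splitlines()

run("a")    # like: python report.py >> out1.txt 2>> out2.txt
with open(out1) as f:
    after_append = f.read().splitlines()
with open(out2) as f:
    errors = f.read().splitlines()

print(after_truncate, after_append, errors)
```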