File Handling

Files are needed to store programs, input data, and output data outside of RAM's volatile memory. There are two main types of files: text files, which store data as characters that each take up a byte, and binary files, which store data in the same format as RAM. Text files take up more storage but are human-readable, while binary files are more compact but require knowledge of the data format. To work with files, a program must open, read from or write to, process, and close the file.

FILE HANDLING

Why are Files needed?

A microprocessor/CPU needs a program to tell it what work is to be performed.
It also needs input data to work on, and it usually computes and produces some output data. In
short, it needs a program, input data and output data in order to perform the desired task.

But the CPU only works with programs and data from the RAM which is volatile. So all the programs
as well as the input/output data must be stored on secondary storage devices in the form of files.
These files are loaded into the primary memory/RAM in order to be run on the microprocessor. And
in order to give a logical structure to the file management, a folder or directory structure is
maintained by the Operating System.

Data Files are used by the program/application to either accept input or to store the output or
sometimes both.

Types of Files

All files are stored as sequences of 0s and 1s in a computer. But depending on the way the bits are
stored there are two broad categories of files -

1. Text files
In Text files all the data are stored as characters (usually ASCII characters) and all the
characters usually have the same byte size. For example if the following content/data is to
be stored in a text file -
7552 230.25 ABC

Then it will be stored as -

Byte  1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
Char  7     5     5     2     space 2     3     0     .     2     5     space A     B     C

where each of the boxes is 1 byte in length and stores the ASCII code of the
character shown. In the above example the content/data occupies 15 bytes, because the two
spaces are stored as characters too.

Advantages of Text files

• Simplicity
• Text files can be read easily, as all the characters are identifiable and usually have
  the same length (1 byte for ASCII text files)
• Even if there is an error of a few bits in the data, the rest of the data can usually be
  recovered

Disadvantages of Text files

• They occupy more storage than is strictly necessary
• Anybody can read the data; the format itself offers no protection
• If data is to be transferred as text files, then care must be taken that there is
  some separator between the different data items, especially between two numbers
• When numeric data is read from a text file, a translation step is needed to convert
  the number from its (ASCII) text form to the binary form understood by the
  computer/program. This step can reduce the speed/performance of
  the application.

2. Binary Files
Any file that is not a text file is a binary file. Usually in binary files the data is stored in the
same format in which the data is stored in the computer's memory/RAM.

For example, consider that the three data -
7552 230.25 ABC
i.e. the int value 7552, the float value 230.25, the string 'ABC', are to be stored in binary
format files. Then, the binary representation of the numbers and strings are stored as -

Byte1    Byte2    | Byte3    Byte4    Byte5    Byte6    | Byte7    Byte8    Byte9
7552              | 230.25                              | A        B        C
00011101 10000000 | 01000011 01100110 01000000 00000000 | 01000001 01000010 01000011

The above data occupies 9 bytes. So the binary data is more space efficient compared to text
data, especially when numbers are involved.
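
The two sizes can be checked with a short sketch. It is only an illustration and not part of the original notes: it assumes the same layout as the table above (a 2-byte integer, a 4-byte float and three raw ASCII bytes, most significant byte first) and uses the standard struct module to build it.

```python
import struct

# Text form: every character, including the two spaces, takes one byte in ASCII.
text_form = '7552 230.25 ABC'.encode('ascii')

# Binary form (assumed layout): '>' = big-endian without padding,
# 'h' = 2-byte int, 'f' = 4-byte float, '3s' = three raw bytes.
binary_form = struct.pack('>hf3s', 7552, 230.25, b'ABC')

print(len(text_form))     # 15
print(len(binary_form))   # 9
print(binary_form.hex())  # 1d8043664000414243
```

The hexadecimal digits printed on the last line correspond, pair by pair, to the nine bytes of the binary representation.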

Advantages of Binary Files

• They occupy less space compared to text files
• Processing of binary files is faster, as minimal processing needs to be done to load the data
  into memory
• Security measures can be applied to binary data

Disadvantages of Binary Files

• The program/programmer creating the binary data as well as the
  program/programmer reading the binary data must be very careful about
  the order of the data as well as the datatype of the data, as there is no scope for missing
  even a single bit.
• For reading the binary data we must know the exact order and the datatype of the
  data to read it correctly.
• Even a few bits of corruption may render the binary data file useless.

Difference between Text and Binary File

Text File                                          | Binary File

Its bits represent characters.                     | Its bits represent custom data.
Less prone to corruption, as a change is reflected | Can easily get corrupted; even a single bit
as soon as it is made and can be undone.           | change can corrupt it.
Stores only plain text in a file.                  | Can store different types of data (audio, text,
                                                   | image) in a single file.
Widely used file format and can be opened          | Developed for an application and can be opened in
in any text editor.                                | that application only.
Mostly .txt and .rtf are used as extensions        | Can have any application-defined extension.
for text files.                                    |

Basic steps in working with Files

1. Open a File
2. Read from or write to the file
3. Process the file
4. Close the file

Opening a File
The built-in open() function is used to open a file.
Syntax:
<file object> = open( filepath, mode='r', encoding=None)
where
file object - It is returned by the open method, which is used to work with the file on the external
storage media.
filepath - It is a string that can represent either the absolute path or the relative path.

mode - represents the mode in which the file is being opened.
encoding - This parameter is used only with text files and names the encoding/decoding scheme
that is to be used. If it is not passed, the default value None is used and Python falls back to
the platform's default encoding, which is reported by locale.getpreferredencoding(False). Some of
the encoding schemes that can be used are 'ascii', 'utf8', 'utf16' etc. The default is platform
dependent: on Windows it is usually an ANSI code page such as 'cp1252' rather than 'ascii', and on
most Linux and macOS systems it is 'utf-8'.
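
Because the fallback encoding differs between systems, it is often safer to pass encoding explicitly. The following is a minimal sketch, assuming a throwaway file in a temporary directory (the file name demo.txt is made up for the illustration):

```python
import locale
import os
import tempfile

# The encoding open() falls back to when encoding=None (platform dependent).
print(locale.getpreferredencoding(False))

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')  # hypothetical file name

# Passing encoding explicitly makes the program behave the same on every system.
f = open(path, mode='w', encoding='utf-8')
f.write('caf\u00e9')          # 'cafe' with an accented e, written as an escape
f.close()

f = open(path, mode='r', encoding='utf-8')
text = f.read()
f.close()

size = os.path.getsize(path)
print(len(text))   # 4 characters
print(size)        # 5 bytes: the accented letter needs two bytes in UTF-8
```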

Absolute path- The absolute path starts with the root directory (drive letter C: , D: etc. in windows)
and moves up to the directory and subdirectory till it reaches the subdirectory in which the file is
actually situated. Eg. 'C:\directory1\subdirectory1\subdirectory2\filename.txt'

Relative path- The relative path starts with the current working directory (.) or the parent
directory(..) relative to the current working directory and then moves on to the directory and the
subdirectory till the filename is reached.
Eg. '.\directory1\filename1.txt' , '..\directory1\subdirectory1\subdirectory2\filename1.txt'

Paths and the escape character backslash \


When writing paths in Windows, a problem arises because the backslash \ is the escape
character in Python strings; to show it inside a string we need to type two backslashes \\.
Eg. 'C:\\dir1\\subdir1\\file1.txt' , '.\\dir1\\subdir1\\file1.txt'

Raw strings
If we want that the backslash appearing inside the strings should not be pre-processed by python
then we can use raw strings. Raw strings have the letter r or R, in front of the string literal.
Eg. R'c:\dir1\subdir1\file1.txt', r'.\dir1\subdir1\file1.txt'

Examples of valid paths in the open method


f1=open('c:\\dir1\\file1.txt')
f1=open(R'c:\dir1\file1.txt')
f1=open(r'.\dir1\file1.txt')

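Examples of invalid paths in the open method (the first two contain single backslashes, so \f is
read as a form feed character; the third opens with ' but closes with ", which is a syntax error)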


f1=open("c:\dir1\file1.txt")
f1=open("c:\\\dir1\file1.txt")
f1=open(r'c:\\dir1\\file1.txt")
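
The effect of the escape character can be seen without opening any file. This small sketch only compares string values; the paths are the illustrative ones used above and need not exist.

```python
plain = 'c:\\dir1\\file1.txt'   # each \\ becomes a single backslash
raw = r'c:\dir1\file1.txt'      # raw string: backslashes are kept as typed

print(plain == raw)             # True
print(len(raw))                 # 17

broken = 'c:\file1.txt'         # \f is silently read as a form feed character
print('\f' in broken)           # True
print(len(broken))              # 11, although 12 characters were typed
```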

File opening modes-


S. No.  Mode        Description

1       r / rt      Reading a text file only. Sets the file pointer at the beginning of the file. This is
                    the default mode. Gives an error if the file does not exist.
2       rb          Reading a binary file. Sets the file pointer at the beginning of the file. Gives an
                    error if the file does not exist.
3       r+ / rt+    Both reading and writing a text file. The file pointer is placed at the beginning of
                    the file. Gives an error if the file does not exist.
4       rb+         Both reading and writing a binary file. The file pointer is placed at the beginning
                    of the file. Gives an error if the file does not exist.
5       w / wt      Writing a text file only. Truncates the file if it exists. If not, creates a new file
                    for writing.
6       wb          Writing a binary file. Truncates the file if it exists. If not, creates a new file
                    for writing.
7       w+ / wt+    Both writing and reading a text file. Truncates the file if it exists, creates a new
                    file if it does not exist.
8       wb+         Both writing and reading a binary file. Truncates the file if it exists, otherwise
                    creates a new file.
9       a / at      Appending to a text file. Moves the file pointer to the end of the file. Creates a
                    new file for writing if it does not exist.
10      ab          Appending to a binary file. Moves the file pointer to the end of the file. Creates a
                    new file for writing if it does not exist.
11      a+ / at+    Both appending to and reading a text file. Moves the file pointer to the end of the
                    file. If the file does not exist, it creates a new file for reading and writing.
12      ab+         Both appending to and reading a binary file. Moves the file pointer to the end of the
                    file. If the file does not exist, it creates a new file for reading and writing.
13      x / xt      Exclusive creation of a text file for writing only. If the file already exists, it
                    gives an error.
14      xb          Exclusive creation of a binary file for writing only. If the file already exists, it
                    gives an error.
Examples of opening files-


f1=open("c:\\dir1\\subdir1\\file1.txt", mode="r+")
f1=open(r'.\dir1\file1.txt', mode='w+')
f1=open(R'..\dir2\file2.txt', mode='wb+')

Closing a file
The command <file object>.close() closes the file associated with the file object. If the
programmer does not close an open file explicitly, the Python environment closes it automatically
when the program ends. It is still good practice to close a file as soon as it is no longer needed,
because written data may stay in a buffer until the file is closed.
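
Python also offers the with statement, which calls close() automatically when the block ends. It is not used elsewhere in these notes, so the sketch below is only an aside; the file name notes.txt and the temporary directory are assumptions made for the illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'notes.txt')  # hypothetical file name

f = open(path, 'w')
f.write('hello')
print(f.closed)   # False: the file is still open
f.close()
print(f.closed)   # True

# The with statement closes the file on leaving the block, even if an error occurs inside it.
with open(path) as g:
    data = g.read()
print(g.closed)   # True
print(data)       # hello
```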

Basic commands for reading a File


1. read()
2. readline()
3. readlines()

1. <file object>.read( [size] )


This command reads size characters in text mode or size bytes in binary mode from the file
associated with the <file object>. If the size parameter is omitted or is negative, then it reads all the
bytes, till the end of file (EOF) is reached. The read() method returns an empty string '' when the end
of file has been reached.

[NOTE:
1. While working with text file be aware of the encoding scheme/language used in the text file,
the number of bytes used per character and the particular method you are using for
reading/writing/displaying a character.
2. Assume that for the discussion here onwards we are working with ASCII encoded text files
only in which each character is stored in one byte only.
]

File Pointer
Whenever a file is opened, an internal integer python variable, the file pointer, is associated with
each file object. The file pointer defines the byte location within the file, where the next read or
write will take place.

When a file is opened for reading, the file pointer is at byte position 0 (in python programming
numbering usually starts from 0)

For example consider the ASCII text file, 'sample.txt', containing the text -
hello world
good day

Program 1: Reading a text file using read command


#1 Reading a text file using read command
myfile=open('sample.txt')
s1=myfile.read(8)
print(s1)
s2=myfile.read(5) #the current read starts reading from where the previous read stopped
print(s2)

s3=myfile.read() #if no parameter specified it reads from current position till end of file

print(s3)
myfile.close()

o/p:
hello wo
rld
g
ood day

1. When the file is opened by the command - myfile=open('sample.txt') - the file pointer is at byte
0.

Byte  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
Char  h   e   l   l   o   ' ' w   o   r   l   d   \n  g   o   o   d   ' ' d   a   y   EOF
      ^ File pointer

2. When the command - s1=myfile.read(8) - is executed, python reads 8 bytes from the current
file pointer position and after the read command the file pointer is at Byte 8.
Byte  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
Char  h   e   l   l   o   ' ' w   o   r   l   d   \n  g   o   o   d   ' ' d   a   y   EOF
                                      ^ File pointer

3. When the command - s2=myfile.read(5) - is executed, python reads 5 bytes from the current
file pointer position and after the read command the file pointer is at Byte 13. Note that the
newline/enter key is considered as one character.
Byte  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
Char  h   e   l   l   o   ' ' w   o   r   l   d   \n  g   o   o   d   ' ' d   a   y   EOF
                                                          ^ File pointer

4. When the command - s3=myfile.read() - is executed, python reads all the bytes from the
current file pointer position till the end of file (EOF) is reached. EOF is a position just past
the last byte, not a character stored in the file.
Byte  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
Char  h   e   l   l   o   ' ' w   o   r   l   d   \n  g   o   o   d   ' ' d   a   y   EOF
                                                                                      ^ File pointer
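
The pointer positions described in the four steps can be observed with the tell() method of the file object, which these notes have not introduced; treat the following as a sketch under that assumption. It first creates its own copy of 'sample.txt' in a temporary directory so that it can run anywhere.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

# newline='' stops Windows from storing '\n' as '\r\n', so the offsets match the diagrams.
f = open(path, 'w', encoding='ascii', newline='')
f.write('hello world\ngood day')
f.close()

myfile = open(path, encoding='ascii', newline='')
positions = [myfile.tell()]        # position right after opening
s1 = myfile.read(8)
positions.append(myfile.tell())    # position after reading 8 characters
s2 = myfile.read(5)
positions.append(myfile.tell())    # position after reading 5 more
s3 = myfile.read()
positions.append(myfile.tell())    # position at end of file
myfile.close()

print(repr(s1), repr(s2), repr(s3))   # 'hello wo' 'rld\ng' 'ood day'
print(positions)                      # [0, 8, 13, 20]
```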

2. <file object>.readline ( [size] )

The readline() method reads a single line and returns it as a string, from the file associated with the
<file object>. The newline character '\n' is left at the end of the string and is only omitted if the last
line in the text file does not end in a newline.

A blank line is represented by '\n'. When the end of file is reached, the readline() method returns an
empty string ''.

If size is specified, readline() reads at most size characters, or up to and including the '\n'
character, whichever comes first. If the size limit is reached first, the returned string does not
end in '\n'.

Consider the following text file, 'test.txt'-

hello
good morning
Do have a nice day
Good Bye!!!

Program 2: Reading lines from a text file using readline command

myfile=open('test.txt')
s1=myfile.readline() # \n from input is left in string s1
# s1=s1.strip() # use this to remove the \n at the end
print(s1,len(s1)) #print statement adds its own newline at the end
s2=myfile.readline()
print(s2)
s3=myfile.readline(5) #reads only 5 bytes \n not inserted at end
print(s3)
s4=myfile.readline(50) #from the current position till end of line
print(s4)
s5=myfile.readline()
print(s5)
myfile.close()

o/p:
hello
6
good morning

Do ha
ve a nice day

Good Bye!!!

1. The statement - s1=myfile.readline() - reads into the variable s1, the string

h e l l o \n
including the newline character '\n' representing the enter key. This string is of length 6. When we
say - print(s1, len(s1)) - then apart from the \n at the end of the string s1, the print() method by
default adds its own end character \n once again. So while using the output of the readline()
statement if the additional blank line is not needed then either

1. the newline character can be removed from the string s1 as -


s1=s1.strip() OR
2. while using the print statement the end character can be made an empty string as -
print(s1, end='')
The first method is more useful if further processing of string s1 needs to be done.

2. The statement - s3=myfile.readline(5) - reads only 5 characters 'Do ha' and does not add the \n
character at the end.

3. The statement - s4=myfile.readline(50) - will read from the current position of the file pointer till
either the 50 character limit is reached or the \n is encountered in the file. Since the \n is
encountered first, it reads the string 've a nice day\n' , which also includes the \n character.

3. <file object>.readlines ( ) OR list(<file object>)

The readlines() method reads an entire text file and splits it into lines. The lines of the
text file, each including its '\n' character, are returned as a list of strings. The same result is
obtained by calling the built-in list() function directly on the <file object>.

Program 3 : Reading all lines of text file together


myfile=open('sample.txt')
line1=myfile.readlines()
print(line1)
myfile.close()

f=open('test.txt')
L=list(f)
f.close()
print(L)

o/p:
['hello world\n', 'good day']
['hello\n', 'good morning\n', 'Do have a nice day\n', 'Good Bye!!!']

Since the Enter key was not pressed after the last line of either text file, the last element of
each list does not have the \n character at the end.

The list() function works on the <file object> f because a file object is an iterable. If we use a
for loop on a valid file object, it gives one line at a time. Each returned line has \n as
its last character, except possibly the last line of the file.

Program 4: Iterating over a file object


myfile=open('test.txt')
for ln in myfile:
    print(ln,end='')
myfile.close()

o/p:
hello
good morning
Do have a nice day
Good Bye!!!

Note the use of the end='' parameter to prevent an additional blank line between two lines of text.

The while loop can also be used in combination with the readline() to show the same behaviour.

Program 5: Using while loop to read one line at a time from a text file
myfile=open('sample.txt','r')
s1 = 'abc' # initialize s1 to some non-empty value
while s1:
    s1=myfile.readline()
    print(s1.strip())
myfile.close()

o/p:
hello world
good day

The readline() method returns an empty string '' when the end of the file is reached. This property
is used in the condition of the while loop to terminate the loop when the file has been read
completely. Also note the use of s1.strip() to remove the \n character from the end of the string s1.
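
In the loop of Program 5 the empty string returned at end of file is still passed to print() before the condition is tested again, so one extra blank line is printed. One possible variation, sketched here on the assumption that Python 3.8 or later is available, uses the := operator to read and test in a single step. The sketch creates its own temporary copy of 'sample.txt'.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
f = open(path, 'w')
f.write('hello world\ngood day')
f.close()

lines = []
myfile = open(path, 'r')
while (s1 := myfile.readline()):    # the loop ends as soon as readline() returns ''
    lines.append(s1.rstrip('\n'))   # remove only the trailing newline
myfile.close()

print(lines)   # ['hello world', 'good day']
```

No priming value such as 'abc' is needed here, and the empty string is never processed.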

Operations on a Text File


File operations are among the slower operations in most applications, because secondary storage is
much slower than RAM. There can be different ways of performing an operation on a text file, and
some ways are more efficient in some situations.

Usually if we are performing only a read operation on a small or medium sized text file, then the
read() method without any parameters can read the entire text file in memory and the file can be
closed immediately after executing the read() method, leaving the file free for use by other
applications. But if the file is large size and cannot be accommodated in memory then it is preferable
to use the read(x) or the readline() method so that only a small portion of the file is in memory at
any one point of time.

Also, while doing any file operations, care should be taken to keep the number of operations
between the file open and the file close statements small, so that the file is held open by our
program or application for the least amount of time.

Basic operations on a Text file


The basic operations on a text file can be categorized into-
1. Character-by-Character processing
2. Word-by-Word processing
3. Line-by-Line processing.

1. Character by Character processing

Program 6: Count how many a's are present in a file

#Method 1
f=open('test.txt')
s=f.read()
f.close()
count=0
for ch in s:
    if ch.lower()=='a':
        count=count+1
print('a occurs',count,'times')

o/p:
a occurs 3 times

#Method 2
f=open('test.txt')
count,s1 = 0, 'abc'
while s1:
    s1=f.read(1)
    if s1.lower()=='a':
        count=count+1
print('a occurs',count,'times')
f.close()

o/p:
a occurs 3 times

In Method 1 we have read the entire text file into a single string s and then processed that string
using a for loop to count how many a's there are in the string. (We could also have omitted the
loop and used count=s.lower().count('a').) Here the file is kept open by our program for only a
short duration. But the disadvantage is that if the 'test.txt' file is very large, it may not fit
in the computer's memory.

In Method 2 we have used the command - s1=f.read(1) - to read only one character at a time from
the file object, and the file stays open until our processing is over.
But the advantage is that this program can work with a file of any size.
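
A middle path between the two methods is to read the file in fixed-size chunks. This is a sketch rather than part of the original programs: the chunk size of 4096 characters is an arbitrary assumption, and the code writes its own temporary copy of 'test.txt' so that it is self-contained.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')
f = open(path, 'w')
f.write('hello\ngood morning\nDo have a nice day\nGood Bye!!!')
f.close()

CHUNK = 4096                 # characters per read() call; an arbitrary choice
count = 0
f = open(path)
chunk = f.read(CHUNK)
while chunk:                 # read() returns '' at end of file
    count = count + chunk.lower().count('a')
    chunk = f.read(CHUNK)
f.close()

print('a occurs', count, 'times')   # a occurs 3 times
```

Only one chunk is held in memory at a time, while far fewer read calls are made than with read(1).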

2. Word-By-Word processing

Program 7: Count how many times the word 'me' appears in a text file

#Method 1
wc=0
f=open("story.txt", 'rt')
s=f.read()
f.close()
L=s.split()
x=len(L)
for i in range(x):
    L[i]=L[i].lower().strip(' .,\'\"?!')
wc=L.count('me')
print('me occurs',wc,'times')

o/p:
me occurs 3 times

#Method 2
wc=0
f=open("story.txt", 'rt')
L=f.readlines()
f.close()
for i in L:
    L1=i.split()
    for j in L1:
        j=j.strip(' .,\'\"?!').lower()
        if j=='me':
            wc=wc+1
print('me occurs',wc,'times')

o/p:
me occurs 3 times

#Method 3
wc=0
f=open("story.txt", 'rt')
for ln in f:
    L=ln.split()
    for i in L:
        i=i.lower().strip(' .,\'\"?!')
        if i=='me':
            wc=wc+1
f.close()
print('me occurs',wc,'times')

o/p:
me occurs 3 times

In Method 1, we read the entire text file into the string s. After that using - L=s.split() - the string s is
split into list of words L. Then using the for iterator we go through all the words in list L and using the
strip() function, we strip off all the punctuation symbols around that word. Using the count() method
on the list L, gives us the desired answer.

In Method 2, all the lines of the text file are read into the list L using - L=f.readlines(). Then we
iterate over all the lines and split them into word. Then using the inner nested for loop we go
through the list of words and strip off the punctuation symbols and match it with the string 'me' to
increment our counter wc.

In Method 3, we iterate over the file object f. The loop variable ln holds one line of the text
file. Inside the loop, we split the line into words. Each word is stripped of punctuation marks
and then checked against the string 'me'. If it matches, the counter variable wc is incremented by
1.

3. Line-By-Line processing

Program 8: Count the lines starting with 'g'

#Method 1
f=open('test.txt')
lc=0
L=f.readlines()
f.close()
for i in L:
    if i[0].lower() == 'g':
        lc = lc+1
print('line starts with g:', lc)

o/p:
line starts with g: 2

#Method 2
f=open('test.txt')
lc=0
for i in f:
    if i[0].lower() == 'g':
        lc = lc+1
print('line starts with g:', lc)
f.close()

o/p:
line starts with g: 2

#Method 3
f=open('test.txt')
lc=0
s=f.read()
f.close()
L=s.split('\n')
for i in L:
    if i[0].lower() == 'g':
        lc = lc+1
print('line starts with g:', lc)

o/p:
line starts with g: 2

In Method 1, we use the readlines() to read all the lines into a List L. Then using the for iterator we
go through all the lines and check if the starting letter is 'g'. If yes then a counter variable, lc is
incremented.

In Method 2, we iterate directly over the file object f to get one line at a time in each iteration. Then
inside the loop we check if the starting letter is 'g'. If yes then the counter variable is incremented.

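In Method 3, we use the read() method to get the entire contents of the text file into a single string,
s. Then we use the statement - L=s.split('\n') - to split the string s into a list of lines. The advantage is
that each of the lines does not have the '\n' character at the end of the line. Then we go through the
list using the for loop and check if the starting character is 'g'. If yes then the counter variable is
incremented. Note that split('\n') produces an empty string for every blank line and after a trailing
newline, and i[0] raises an IndexError on an empty string, so this method assumes a file such as
'test.txt' with no blank lines and no newline after the last line.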
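
If the file may contain blank lines or end with a newline, the test on the first character can be written so that it tolerates empty strings. The sketch below is one way to do this; the sample content and the file name lines.txt are made up for the illustration.

```python
import os
import tempfile

content = 'good morning\n\nGreat day\nhello\n'   # a blank line and a trailing newline
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')  # hypothetical file name
f = open(path, 'w')
f.write(content)
f.close()

# Iterating over the file: startswith() is safe for any string, including ''.
lc = 0
f = open(path)
for line in f:
    if line.lower().startswith('g'):
        lc = lc + 1
f.close()

# Splitting on '\n': the slice p[:1] gives '' for an empty string instead of raising IndexError.
parts = content.split('\n')   # ['good morning', '', 'Great day', 'hello', '']
lc2 = 0
for p in parts:
    if p[:1].lower() == 'g':
        lc2 = lc2 + 1

print('line starts with g:', lc, lc2)   # line starts with g: 2 2
```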

Writing Files
For writing a file, the file must be opened in one of the modes that support writing to files.
(a) When opening the file in the read and write mode (r+, rb+), the file must exist prior to the
open statement and the file pointer is placed at Byte 0. The file is not truncated: if you
directly perform a write, the pre-existing data is overwritten from the beginning, and
whatever lies beyond the written portion is left unchanged.
(b) When opening the file in any combination involving write 'w' mode(w,w+,wb,wb+), if the
file already exists, then it will be truncated and any data written will be written to a new
blank file each time you run the program.
(c) When the file is opened in 'a','ab' mode, if the file already exists, then the file pointer is
placed at EOF position, and any new data is appended at the end.
(d) When the file is opened in 'a+', 'ab+' mode, the file can be both read and appended to. In
Python the file pointer is placed initially at the EOF position, just as in 'a' mode, so a
read() issued straight after opening returns an empty string; the pointer must first be moved
back with <file object>.seek(0) to read the existing data. On most systems any write
operation adds the data at the end of the file, whatever the pointer position.
(e) When the file is opened in 'x', 'xb' mode, then if the file already exists then an error is given.
If the file does not exist then a new file is created and file pointer is at byte 0 position and
only write operations are allowed in 'x' mode.
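
The behaviour of the different modes can be tried out on a scratch file. The following sketch is illustrative only: the file name modes.txt, the helper function contents() and the use of tell() and seek() are additions that do not appear in the notes above.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')  # hypothetical file name

def contents(p):
    """Hypothetical helper: return the whole text of file p."""
    f = open(p)
    data = f.read()
    f.close()
    return data

f = open(path, 'w'); f.write('abcdef'); f.close()   # 'w' creates the file
f = open(path, 'r+'); f.write('XY'); f.close()      # 'r+' overwrites from byte 0, no truncation
after_rplus = contents(path)                        # 'XYcdef'

f = open(path, 'a'); f.write('gh'); f.close()       # 'a' adds at the end
after_append = contents(path)                       # 'XYcdefgh'

f = open(path, 'a+')
start = f.tell()     # 8: the pointer starts at the end of the file
f.seek(0)            # move back to byte 0 before reading
seen = f.read()      # 'XYcdefgh'
f.close()

x_error = False
try:
    open(path, 'x')  # 'x' refuses to open a file that already exists
except FileExistsError:
    x_error = True

f = open(path, 'w'); f.close()                      # 'w' truncates the existing file
after_w = contents(path)                            # ''

print(after_rplus, after_append, start, seen, x_error, repr(after_w))
```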

Commands used for writing


1. <file object>.write(stringobject)
2. <file object>.writelines(L)

<file object>.write(stringobject)
The write() method takes a stringobject and writes it to the file that is associated with the <file
object>

Program 9 : Using the write method

# 9 Writing a text file


# 'w' mode
a,b,c='10','20','30'
f1=open('sample1.txt', 'w')
f1.write(a) #parameters to write must always be strings
f1.write(b)
f1.write(c)
f1.close()

# 'a' mode
f2=open('sample2.txt', 'a') #file opened in append mode
f2.write(a)
f2.write(b)
f2.write(c)
f2.close()

In the above program we have taken the variables a,b,c with the string values '10', '20' and '30'. Even
if our data is in other data types, it should be converted to a string in order for it to be passed to the
write() method.

The file objects f1 and f2 are similar only difference being the file opening modes. Since f1 is opened
in 'w' mode, even if we run the program multiple times, each time the file 'sample1.txt' is first
truncated and then the data '102030' is written into it.

But since f2 is opened in append 'a' mode, if we run the program multiple times, the file
'sample2.txt' is not truncated each time. Instead the data '102030' is appended to the end of the file
each time and if the program is run 2 times then the file will contain the output 2 times, if the
program is run 3 times then the file will contain the output 3 times and so on.
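
When the data is not already a string, it can be converted with str() before it is passed to write(), and a separator can be written between the values so that they can be told apart when the file is read back. A minimal sketch, assuming a space as the separator and a made-up file name values.txt:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'values.txt')  # hypothetical file name

a, b, c = 7552, 230.25, 'ABC'      # an int, a float and a string
f1 = open(path, 'w')
f1.write(str(a) + ' ')             # write() accepts only strings in text mode
f1.write(str(b) + ' ')
f1.write(c + '\n')
f1.close()

f1 = open(path)
line = f1.read()
f1.close()

fields = line.split()              # the spaces make it possible to separate the values again
print(fields)                               # ['7552', '230.25', 'ABC']
print(int(fields[0]) + 1, float(fields[1]))   # 7553 230.25
```

Without the separators the file would contain '7552230.25ABC', and the original values could not be recovered reliably.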

<file object>.writelines(L)
The writelines() method accepts a list of strings L as a parameter and writes it to the file associated
with the <file object>.

Program 10 : Using the writelines method


L1 = ['hello world',
'good morning',
'have a nice day']
f1=open('sample1.txt', 'w')
f1.writelines(L1)
f1.close()

L2 = ['hello world\n',
'good morning\n',
'have a nice day\n']
f2=open('sample2.txt', 'w')
f2.writelines(L2)
f2.close()

print('file written...')

In the above program we have two list of strings L1 and L2. In L1, all the list elements do not have \n
character at their ends, whereas in L2, all the list elements have the \n character at their ends. Since
the writelines() method writes the lines as is one by one, the first file 'sample1.txt' will have all the
sentences following one after the other.
sample1.txt
hello worldgood morninghave a nice day

Whereas in the second file 'sample2.txt' the lines will be displayed one on each line.
sample2.txt
hello world
good morning
have a nice day
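
If the strings in the list do not end in \n, the newline can be added while writing. The sketch below passes a generator expression to writelines(), which accepts any iterable of strings; the temporary file is an assumption made so that the example can run on its own.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample1.txt')

L1 = ['hello world',
      'good morning',
      'have a nice day']

f1 = open(path, 'w')
f1.writelines(line + '\n' for line in L1)   # add the missing newline to every element
f1.close()

f1 = open(path)
back = f1.readlines()
f1.close()
print(back)   # ['hello world\n', 'good morning\n', 'have a nice day\n']
```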

Working with Binary Files


Python has the built-in data types bytes and bytearray for working with binary data. When a binary
file is opened for reading/writing, the same commands, read() and write(), are used. The only
differences are -
• <bytes object> = <file object>.read( [x] )
  Here x is the number of bytes to be read, and a bytes object is returned
• <file object>.write( x )
  Here x is either a bytes or a bytearray object

bytes Objects
Bytes objects are immutable sequences of single bytes.

Creating bytes literals


The syntax for bytes literals is largely the same as that for string literals, except that a b prefix is
added. Only ASCII characters are permitted in bytes literals.
Eg. x=b'abcdefABCD0123' # byte sequence enclosed in single quotes
x=b"23$%^.Abcd" # byte sequence enclosed in double quotes
x=b'''xyzPQ''' # byte sequence enclosed in triple quotes

The bytes object is a sequence object where each element, i.e. x[0], x[1], x[2] etc., is exactly one
byte. Internally each element of the bytes object behaves as a number between 0 and 255. If any
non-printable character is to be added to the bytes object then the escape sequence \x can
be used. The \x must be followed by exactly two hexadecimal digits (0-9, A-F).
Eg. y =b'hello\x9Aworld\x3b\x2d'
y =b'\x2c\x3a\xac\x4d'

Printing an individual element of a bytes sequence object gives the integer number between 0-255
corresponding to that particular character in ASCII.
Eg. x=b'ABCD'
print(x[0])
o/p:
65

Creating bytes object using bytes.fromhex(string) method


bytes object can be created using the bytes.fromhex() method. It accepts a string as parameter and
the string must contain only pairs of hexadecimal numbers. Any whitespaces are ignored. A pair of
hexadecimal digits are converted into one bytes element.
Eg. x=bytes.fromhex('126B 7C AB')

bytearray objects
bytearray objects are the mutable counterpart to bytes objects. There is no dedicated literal syntax
for bytearray objects; instead they are always created by calling the bytearray() constructor. A
bytearray supports the usual operations of a mutable sequence, such as append, extend, insert, pop,
remove, reverse and del. It does not have push or sort methods; the built-in sorted() function can
be used to obtain a sorted list of its elements.

Different ways of creating a bytearray object

• Creating an empty instance.
  E.g. x= bytearray()
       print(x)
  o/p:
       bytearray(b'')
• Creating a zero-filled instance with a given length
  E.g. x= bytearray(5)
       print(x)
  o/p:
       bytearray(b'\x00\x00\x00\x00\x00')
• From an iterable of integers between 0-255
  E.g. x= bytearray(range(5))
       print(x)
  o/p:
       bytearray(b'\x00\x01\x02\x03\x04')
• Copying existing binary data
  E.g. x= bytearray(b'hello world')
       print(x)
  o/p:
       bytearray(b'hello world')
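
Since a bytearray is mutable, it can be changed in place and then written to a binary file. The following sketch shows a few of the operations; the file name data.bin and the temporary directory are assumptions made for the illustration.

```python
import os
import tempfile

x = bytearray(b'hello')
x[0] = 72             # replace 'h' (104) with 'H' (72)
x.append(33)          # add '!' at the end
x.extend(b' hi')      # add several bytes at once
x.insert(0, 62)       # put '>' at the front
last = x.pop()        # remove and return the last byte, 105 ('i')
print(x, last)        # bytearray(b'>Hello! h') 105

path = os.path.join(tempfile.mkdtemp(), 'data.bin')  # hypothetical file name
f = open(path, 'wb')
n = f.write(x)        # write() returns the number of bytes written
f.close()

f = open(path, 'rb')
data = f.read()       # read() in binary mode returns a bytes object
f.close()
print(n, data, type(data))   # 9 b'>Hello! h' <class 'bytes'>
```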

Encoding for writing binary data
Our variables within a python program can be of different data types. But in order to use the write()
method for writing a binary file, these variables must be converted/encoded into streams of byte /
bytearray as the write() method only accepts streams of bytes as input for binary files.

Consider three variables of simple data types which have to be written to a binary file -
• a='xyz' # string / character / text data
• b=24 # int data
• c=152.35 # float data

1. Encoding a string
When converting a character to its binary representation, different encoding schemes such as
ascii, utf-8, utf-16, utf-32 etc. are available. To find out the default encoding scheme that
Python uses for converting strings, use the following commands -
import sys
print(sys.getdefaultencoding())
o/p:
utf-8

In Python 3 the answer is always utf-8, which is also the most widely used encoding scheme in
programming as well as on the internet. This is the default for the encode() method; it is not
necessarily the default that open() uses for text files.

In order to convert the string variable - a ='xyz' - to a byte stream, use the encode() method of the
string object. For example -
Program 11 : Encoding string to bytes
# 11 Encoding string to bytes
a='xyz'
b=a.encode()
c=a.encode(encoding='utf-8')
print(b, type(b))
print(c, type(c))

o/p:
b'xyz' <class 'bytes'>
b'xyz' <class 'bytes'>

If no parameter is passed to the encode() method then it uses utf-8 by default.
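
For plain ASCII text such as 'xyz' every scheme mentioned above that uses one byte per character gives the same bytes, so the difference between encodings only shows up with other characters. The sketch below uses a string with one accented letter (written as the escape \u00e9) as an assumed example, and also shows decode(), the reverse of encode().

```python
s = 'caf\u00e9'                  # 4 characters, the last one is not ASCII

utf8 = s.encode('utf-8')         # the accented letter becomes 2 bytes
utf16 = s.encode('utf-16-le')    # every character here becomes 2 bytes
latin = s.encode('latin-1')      # every character becomes 1 byte

print(utf8, latin)                                  # b'caf\xc3\xa9' b'caf\xe9'
print(len(s), len(utf8), len(utf16), len(latin))    # 4 5 8 4

# decode() converts bytes back to a string; the same scheme must be used both ways.
print(utf8.decode('utf-8') == s)                    # True
```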

2. Encoding an int
Whenever an int such as - b=24 - is to be encoded, it is first converted to binary internally and
then grouped into bytes. The order in which these bytes are stored depends on the endianness
specified (the byteorder parameter).

What is Byte Order / Endianness?

Consider the decimal number 708562303 written in hexadecimal format as 2A3BCD7F. Now in
computers memory the hexadecimal form can be represented as either -

(a)
hex data          2A   3B   CD   7F
memory address    1001 1002 1003 1004
Here the Most Significant Byte (2A) is stored at the smallest memory address. This format
of storing numbers is known as 'big' endian. Motorola 68000 and PowerPC G5 processors
are big-endian. In sending data over a network the big endian format is used, i.e. the MSB is
sent first and the LSB is sent last.

(b)
hex data          7F   CD   3B   2A
memory address    1001 1002 1003 1004
Here the Least Significant Byte (7F) is stored at the smallest memory address. This
format of storing numbers is known as 'little' endian. Intel x86 and AMD64 (x86-64)
processors are little-endian.

Some processors such as ARM and Intel Itanium feature switchable endianness (bi-endian).
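
The two byte orders can be checked directly in Python. This is a small sketch using the same number as above; sys.byteorder reports the native order of the machine the code runs on, so its value is not shown here:

```python
import sys

n = 708562303                                # 0x2A3BCD7F

big = n.to_bytes(4, byteorder='big')         # most significant byte first
little = n.to_bytes(4, byteorder='little')   # least significant byte first

print(hex(n))          # 0x2a3bcd7f
print(big.hex())       # 2a3bcd7f
print(little.hex())    # 7fcd3b2a
print(sys.byteorder)   # native byte order of this machine
```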

The to_bytes() method of the integer object can be used to encode an integer to a bytes object.

Program 12 : Encoding int to bytes


# 12 Encoding int to bytes
b=24
c=b.to_bytes(2,byteorder='little')
d=b.to_bytes(4,byteorder='big')
print(c, type(c), len(c))
print(d, type(d), len(d))

o/p:
b'\x18\x00' <class 'bytes'> 2
b'\x00\x00\x00\x18' <class 'bytes'> 4

The first parameter of the to_bytes() method is the number of bytes into which the given int is to be
encoded, and the second parameter specifies the byte order in which those bytes are stored.
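
Two details of to_bytes() are easy to miss: the value must fit in the requested number of bytes, and negative numbers need the signed=True argument. A minimal sketch, with illustrative variable names:

```python
b = 24

# Too few bytes for the value raises OverflowError
try:
    (300).to_bytes(1, byteorder='little')   # 300 does not fit in 1 byte
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)

# Negative numbers need signed=True (two's complement representation)
neg = (-b).to_bytes(2, byteorder='little', signed=True)
print(neg)

# int.from_bytes() reverses the encoding when given the same settings
restored = int.from_bytes(neg, byteorder='little', signed=True)
print(restored)
```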

3. Encoding a float
Floating point numbers are stored in a binary form known as the IEEE 754 format, which can be either
4 bytes (single precision) or 8 bytes (double precision) in length.

We need the struct module to convert a float to a bytes object, and the function struct.pack() is
used for this.

Program 13 : Encoding float to bytes


# 13 Encoding float to bytes
import struct #struct module needed for converting float to bytes

c=152.35
d=struct.pack('f',c) # 'f' uses 4 bytes to represent a float
e=struct.pack('d',c) # 'd' uses 8 bytes to represent a float
print(d, type(d), len(d))
print(e, type(e), len(e))

o/p:
b'\x9aY\x18C' <class 'bytes'> 4
b'33333\x0bc@' <class 'bytes'> 8

To use the pack() function for floats, the first argument should be either 'f' or 'd'. Using 'f' will store
the floating number in 4 bytes. On using 'd' the float number will be stored in 8 bytes. The second
parameter must be the float variable that is to be converted to a bytes object. When no byte order
is given, pack() uses the native byte order of the machine (the output above is from a little-endian
machine).
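
The byte order can also be fixed explicitly by prefixing the format with '<' (little-endian) or '>' (big-endian), which keeps the file readable across machines. A short sketch under that assumption:

```python
import struct

c = 152.35

# '<' forces little-endian, '>' forces big-endian, regardless of the machine
le = struct.pack('<f', c)
be = struct.pack('>f', c)
print(le, be)

# calcsize() reports how many bytes a format string occupies
print(struct.calcsize('<f'), struct.calcsize('<d'))

# 4 bytes cannot hold 152.35 exactly; 8 bytes round-trip the Python float
back_f = struct.unpack('<f', le)[0]
back_d = struct.unpack('<d', struct.pack('<d', c))[0]
print(back_f, back_d)
```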

Writing a Binary File


To write a binary file we open the file using any of the binary write modes, and each variable
that needs to be written is first encoded to a bytes or a bytearray object. If we already
have a bytes or bytearray object available, then it can be written to the binary file directly
without performing the encoding operation.

Program 14 : Writing/Encoding a Binary File
# 14 writing binary file
import struct #struct module needed for converting float to bytes
a,b,c='xyz', 24, 152.35

f=open('test1.dat','wb')
f.write(a.encode(encoding='utf-8')) #stores 3 bytes
f.write(b.to_bytes(2,byteorder='little')) # stores 2 bytes
f.write(struct.pack('f',c)) # stores in 4 bytes

d=bytes('PQR', encoding='utf-8') #stores in 3 bytes
e=bytearray(b'\x00\x01\x02\xAB\xEF') #stores in 5 bytes
f.write(d)
f.write(e)

f.close()

Reading/Decoding a Binary File


A file is only a sequence of bytes. When reading a binary file, we must know what kind of data (data
type) we are trying to read, how many bytes it is stored in, and which encoding mechanism was used
to encode that particular data. In other words, we need to know all the parameters that went into
creating the binary file in order to read/decode it.

The read() is the primary method used for reading a binary file. If it is not passed any parameter then
it will read the entire binary file as a bytes object. If we pass an int parameter, x to the read(x)
method then it will read only x bytes and return it as a bytes object.

After getting the bytes we have to use the appropriate decoding mechanism to get back the original
data.

Program 15 : Reading/Decoding a Binary File


# 15 reading binary file
import struct

f=open('test1.dat','rb')
a=f.read(3)
b=f.read(2)
c=f.read(4)
d=f.read(3)
e=f.read(5) #the bytearray written last holds 5 bytes

print(a, a.decode(encoding='utf-8')) #decoding a string
print(b, int.from_bytes(b, byteorder='little')) #decoding an int
print(c, struct.unpack('f', c)) #decoding a float; unpack returns a tuple object
print(d, d.decode(encoding='utf-8'))
print(e)
f.close()

o/p-
b'xyz' xyz
b'\x18\x00' 24
b'\x9aY\x18C' (152.35000610351562,)
b'PQR' PQR
b'\x00\x01\x02\xab\xef'

When the file is opened in binary mode, the read(x) command reads x bytes and returns them as a bytes
object. In order to decode the different data types the following methods are used-
1. Decoding a string
The command - a.decode(encoding='utf-8') - will decode a bytes object to a string with the
encoding scheme specified. If no parameter is passed then 'utf-8' is the default scheme used
for decoding.
2. Decoding an int
The command - int.from_bytes(b, byteorder='little') - is used to decode a bytes object to an
int. The byteorder must be the same as the one used when encoding the int.
3. Decoding a float
For decoding a float, the struct module is needed again, this time for the struct.unpack()
function. The first parameter should be either 'f' or 'd', matching the format that was
specified during the encoding process. The unpack() function returns a tuple
object with one float value as the answer. The value 152.35000610351562 in the output differs
slightly from 152.35 because the 'f' format keeps only 4 bytes of precision.
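
When several values always appear together, one struct format string can pack and unpack them in a single step. The sketch below assumes a hypothetical record layout ('<3sHf') and a temporary file name chosen only for illustration:

```python
import os
import struct
import tempfile

# Hypothetical record layout: 3-byte string, 2-byte unsigned int, 4-byte float.
# '<' means little-endian with no padding between the fields.
RECORD = '<3sHf'

packed = struct.pack(RECORD, 'xyz'.encode('utf-8'), 24, 152.35)

path = os.path.join(tempfile.mkdtemp(), 'record.dat')
with open(path, 'wb') as f:        # 'with' closes the file automatically
    f.write(packed)

with open(path, 'rb') as f:
    raw = f.read(struct.calcsize(RECORD))

name, number, value = struct.unpack(RECORD, raw)
print(name.decode('utf-8'), number, round(value, 2))
```

The reader still has to know the format string, which is the same requirement described above: the layout used for writing must be known when decoding.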

Converting objects to binary form


If we have simple data types such as str, int or float then the process of converting them to binary
form is simple. But if we have complex objects such as lists, tuples, dictionaries, nested lists or
dictionaries containing lists, then the programmer would have to go through each element manually,
find its data type and then use the correct form of encoding for it. This becomes very
cumbersome. The solution to this is Serialization.

Serialization/Deserialization
In serialization, we use a common standard or protocol through which a Python object given as
input is converted to a standard representation as a sequence of bytes. This sequence of bytes can
be written to a binary file or sent over a network.

While reading the file, the reverse process using the same standard/protocol rebuilds the Python
objects from the binary data. This process is called deserialization.

Packages used for Serialization


1. pickle module
The pickle module implements protocols for serialization and de-serialization of python
objects. A python object can be converted to a bytes stream object and the process is called
pickling. The reverse process in which a bytes stream is converted back to a python object is
called unpickling.

Limitations
• The pickle module uses a compact binary representation, so the byte stream,
i.e. the pickled data file that is generated, is not human-readable.
• Different versions of Python support different pickle protocols. Data pickled with a
newer protocol may not be readable by an older version of Python.
• The data format used by pickle is Python-specific. Software developed in other
languages will generally not be able to read the pickled data files generated by our
Python application.
• It is possible to construct malicious pickle data which will execute arbitrary code
during unpickling. Never unpickle data that could have come from an untrusted
source, or that could have been tampered with.
2. json module
The json module implements the popular and widely used data interchange format JSON
(JavaScript Object Notation). The json module can take Python objects built from basic
types (dict, list, str, int, float, bool, None) and convert them to string representations.
The advantage is that json data files are human-readable and can be used by applications
developed in other languages / tools. Also, loading a json data file does not by itself
carry an arbitrary code execution vulnerability, so it is safer than pickle for data from
untrusted sources.
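
A minimal sketch of the json module in use; the dictionary below is sample data made up for illustration:

```python
import json

d = {'empno': 1, 'name': 'abc', 'salary': 5000.5, 'skills': ('python', 'sql')}

text = json.dumps(d)          # object -> human-readable str
print(text)

restored = json.loads(text)   # str -> object
print(restored)

# JSON has no tuple type, so the tuple comes back as a list
print(type(restored['skills']))
```

Unlike pickle, the result is a str rather than a bytes object, so a json file is normally opened in text mode.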

Serializing/Deserializing an object using pickle
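The dump() method of the pickle module serializes a python object and writes it to a file, and the
load() method reads it back from the file (deserialization). The related dumps() and loads() methods
do the same with a bytes object in memory instead of a file.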
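
The in-memory variants dumps() and loads() can be tried without any file. A small sketch with made-up sample data:

```python
import pickle

record = {'name': 'abc', 'marks': [72, 88.5], 'address': ('city', 400001)}

data = pickle.dumps(record)       # serialize: object -> bytes
print(type(data), len(data))

restored = pickle.loads(data)     # deserialize: bytes -> object
print(restored == record)
```

Nested lists and tuples come back with their original types, which is what makes pickle convenient for complex objects.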
Program 16 : Pickle an object
# 16 Pickle an object
import pickle
d = { 'empno': 1, 'name': 'abc', 'salary': 5000.5 }
f = open('employee.dat','wb')
pickle.dump(d,f)
f.close()

In order to use the pickle library, we first need the statement - import pickle. Then we create
any picklable object (here the dictionary d). After that we open the file 'employee.dat' in binary write
mode (wb). If we need the existing data to be preserved then open the file in 'ab' mode.

The statement - pickle.dump(d,f) - will serialize the dictionary d, and write the bytes into the file f.

Program 17 : Unpickle an object


# 17 Unpickle an object
import pickle
f = open('employee.dat','rb')
d = pickle.load(f)
f.close()
print(d)

o/p:
{'empno': 1, 'name': 'abc', 'salary': 5000.5}
To unpickle an object stored in a binary file, first open the file in read binary mode 'rb'. Then using
the statement - d = pickle.load(f) - the entire data is loaded correctly into the variable d.
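
When a file is opened in 'ab' mode, each dump() adds one more pickled object after the existing ones, and each load() returns only one object. A hedged sketch of reading such a file until the end (the temporary path and sample records are assumptions for illustration):

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'employee.dat')

# Each dump() in append mode adds one more pickled object to the file
for rec in ({'empno': 1, 'name': 'abc'}, {'empno': 2, 'name': 'def'}):
    with open(path, 'ab') as f:
        pickle.dump(rec, f)

# load() returns one object per call; EOFError marks the end of the file
records = []
with open(path, 'rb') as f:
    while True:
        try:
            records.append(pickle.load(f))
        except EOFError:
            break

print(records)
```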

To pickle a set of records


In order to pickle a set of similar records such as employee details - (employee number, employee
name), we can create a list of dictionary objects in memory and then pickle the entire list to a binary
file.

Program 18: pickle a list of dictionary records


#18 pickle a list of dictionary records
import pickle
L =[]
while True:
    eno = int(input("Enter Employee number:"))
    ename = input("Enter Employee Name :")
    e = {"eno":eno,"ename":ename}
    L.append(e)
    ch= input("Add more records(y/n) :")
    if(ch=='n'):
        break

f = open("employee.dat","ab+")
if f.tell() > 0:      #file already contains data
    f.seek(0)
    L1=pickle.load(f)
    L=L1+L
    f.truncate(0)
pickle.dump(L,f)
f.close()

1. We start with an empty list L.
2. The while loop accepts user input and builds a list whose elements are dictionary objects.
Each dictionary object consists of the employee details - employee number (eno)
and employee name (ename).
3. We open the file in 'ab+' mode as we want to preserve existing data, and read as well as write.
When the file is opened in ab+ mode the file pointer is at the EOF position. The if block that
follows copies the existing data and then adds the new data to the end of the existing data.
a) The command - if f.tell() > 0: - checks whether the file size is greater than 0. Only if the file
contains existing data is the code that follows executed.
b) The command - f.seek(0) - changes the file pointer to byte 0 i.e. start of the file so that we
may read the existing data.
c) The command - L1=pickle.load(f) - loads the existing data into the list L1.
d) The command - L=L1+L - is used to add the newly added records to the end of existing
records.
e) The command - f.truncate(0) - is used to truncate the file to 0 size as we are going to
populate/dump the combined (old+new) data again into the file
4. The command - pickle.dump(L,f) - is used to populate/pickle the file with the data contained in list
L. For the first run of the program, the list L contains only the newly added record, but for
subsequent runs of the program the list L contains the old/existing data as well as the newly added
data and the entire file is written again.
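
The steps above can also be sketched as a reusable function. The name append_records and the temporary path are assumptions made for this illustration, and the function opens the file twice ('rb' then 'wb') instead of using 'ab+':

```python
import os
import pickle
import tempfile

def append_records(path, new_records):
    """Merge new_records with any list already pickled in path, then rewrite the file."""
    existing = []
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, 'rb') as f:
            existing = pickle.load(f)
    combined = existing + new_records
    with open(path, 'wb') as f:      # 'wb' empties the file, like f.truncate(0)
        pickle.dump(combined, f)
    return combined

path = os.path.join(tempfile.mkdtemp(), 'employee.dat')
append_records(path, [{'eno': 101, 'ename': 'abc'}])
all_records = append_records(path, [{'eno': 102, 'ename': 'def'}])
print(all_records)
```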

Viewing the entire set of records from a pickled binary file


The entire binary file can be unpickled by a single read command to give a list of dictionary records
which can be displayed on the screen.

Program 19: unpickling a list of dictionary records


# 19 unpickling a list of dictionary records
import pickle

f = open("employee.dat", "rb")
L = pickle.load(f)
print(L)
f.close()

o/p:
[{'eno': 101, 'ename': 'abc'}, {'eno': 102, 'ename': 'def'}, {'eno': 103, 'ename': 'ghi'}, {'eno': 104,
'ename': 'jkl'}]

Searching for a particular record from a list of dictionary records
We can load the entire binary file into a list object and then iterate over the list object to search for a
matching entry.
Program 20 : Searching a list of dictionary in binary
# 20 Searching a list of dictionary in binary
import pickle

nm = input('Enter employee name to search :')

f = open("employee.dat", "rb")
L = pickle.load(f) #L is a list of dictionary elements
f.close()

found = 0
for x in L: #x is a dictionary element
    if x['ename'] == nm:
        found = 1
        break
if found ==1:
    print('Employee found')
    print('Details:', x)
else:
    print('Employee not found')

o/p:
Enter employee name to search :def
Employee found
Details: {'eno': 102, 'ename': 'def'}
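
The search loop can be separated from the file handling so that it is easy to reuse. This is a sketch with a hypothetical helper name, find_employee, and an in-memory list standing in for the unpickled data:

```python
def find_employee(records, name):
    """Return the first record whose 'ename' matches name, or None."""
    for rec in records:
        if rec['ename'] == name:
            return rec
    return None

L = [{'eno': 101, 'ename': 'abc'}, {'eno': 102, 'ename': 'def'}]
hit = find_employee(L, 'def')
miss = find_employee(L, 'zzz')
print(hit)
print(miss)
```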

Update one record from a list of dictionary records in binary file


To perform an update, we first load the entire binary file into one list object. We then create another
empty list object and then iterate over the first list. For all the non-matching records we copy them
to the second list and for the matching record we update the record in memory and then copy to the
second list.

Then we truncate the original file to zero size and copy the second list of dictionary records to the
same binary file.
Program 21 : Update one record from a list of dictionary records
# 21 update one record from a list of dictionary records
import pickle

num = int(input('Enter employee number to update the record :'))

f = open("employee.dat", "rb+")
L = pickle.load(f)

L1=[]
found = 0
for x in L:
    if x['eno'] == num:
        found = 1
        x['eno']=int(input('Enter new employee number:'))
        x['ename'] = input('Enter new employee name:')
    L1.append(x)
if found ==1:
    f.truncate(0)
    f.seek(0)
    pickle.dump(L1,f)
    print('Updated the record')
else:
    print('Employee not found')
f.flush()
f.close()

o/p:
Enter employee number to update the record :102
Enter new employee number:777
Enter new employee name:nnn
Updated the record

On using the dump() command, python usually writes the data to a buffer and performs the actual
write to the file later, when the buffer becomes full or the file is closed. The flush()
command immediately empties this buffer and hands the data to the OS. If the data must reach the
hard disk immediately, then the command - os.fsync(f.fileno()) - needs to be given after the
f.flush() statement (with import os), to force the OS to write the file to the hard disk.
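
A minimal sketch of the flush-then-sync sequence; the temporary path is an assumption for illustration:

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'employee.dat')

f = open(path, 'wb')
pickle.dump([{'eno': 101, 'ename': 'abc'}], f)
f.flush()                # Python's buffer -> operating system
os.fsync(f.fileno())     # operating system -> storage device
size_on_disk = os.path.getsize(path)   # data is already in the file before close()
f.close()

print(size_on_disk)
```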

Delete one record from a list of dictionary records in binary file


To perform the delete, we first load the entire binary file into one list object. We then create
another empty list object and then iterate over the first list. For all the non-matching records we
copy them to the second list and for the matching record we do not copy it to the second list.

Then we truncate the original file to zero size and copy the second list of dictionary records to the
same binary file.

Program 22 : Delete a record from a list of dictionary records


#22 delete a record from a list of dictionary records
import pickle

num = int(input('Enter employee number to delete :'))

f = open("employee.dat", "rb+")
L = pickle.load(f)

L1=[]
found = 0
for x in L:
    if x['eno'] == num:
        found=1
    else:
        L1.append(x)
if found ==1:
    f.truncate(0)
    f.seek(0)
    pickle.dump(L1,f)
    print('Deleted the record')
else:
    print('Employee not found')
f.flush()
f.close()

o/p:
Enter employee number to delete :777
Deleted the record
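
The copy-to-a-second-list idea behind both the update and the delete programs can be written compactly with list comprehensions. The helper names update_record and delete_record are hypothetical, and the list below stands in for the unpickled file contents:

```python
def update_record(records, eno, new_values):
    """Return a new list in which the record with this eno carries new_values."""
    return [dict(r, **new_values) if r['eno'] == eno else r for r in records]

def delete_record(records, eno):
    """Return a new list without the record that has this eno."""
    return [r for r in records if r['eno'] != eno]

L = [{'eno': 101, 'ename': 'abc'}, {'eno': 102, 'ename': 'def'}, {'eno': 103, 'ename': 'ghi'}]

updated = update_record(L, 102, {'eno': 777, 'ename': 'nnn'})
remaining = delete_record(updated, 777)
print(updated)
print(remaining)
```

The resulting list would then be pickled back to the file exactly as in the programs above.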

Working with File Pointers
We have the following methods for working with file pointers-
1. <file object>.tell()
This method works well with binary files and returns an integer value denoting the byte
position from the beginning of the file.

For text file the tell() function returns an opaque number (Note: from python 3 documentation,
this is not the same as byte position) which can only be used in combination with the seek for a
text file.

2. <file object>.seek(offset, whence)


The offset is an integer value specifying the number of bytes from a particular position
identified by the second parameter whence.

The second parameter whence can have the possible value of either 0, 1 or 2. A whence value of
0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of
the file as the reference point. whence can be omitted and defaults to 0, using the beginning of
the file as the reference point.

For a binary file the seek() method works fine as stated above.

But for a text file, seek() can normally be used only with the whence value as 0, i.e. only offsets
from the beginning of the file are valid, and the offset should be 0 or a value returned by tell().
The one exception is going to the end of the file, i.e. f.seek(0,2). No other non-zero offsets can
be used with whence=1 or whence=2 for text files.
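
The three whence values can be tried out on binary data. In this sketch io.BytesIO stands in for a file opened in binary mode, so no file on disk is needed:

```python
import io

f = io.BytesIO(b'ABCDEFGHIJK')   # behaves like a file opened in binary mode

f.seek(3)                 # whence=0: 3 bytes from the beginning
first = f.read(2)         # reads b'DE', pointer moves to byte 5
pos_after_read = f.tell()

f.seek(2, 1)              # whence=1: 2 bytes forward from the current position
second = f.read(1)        # reads b'H'

f.seek(-3, 2)             # whence=2: 3 bytes back from the end
last = f.read()           # reads b'IJK'

print(first, pos_after_read, second, last)
```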

Example-
Consider the following ASCII text file 'novel.txt' with the following data-

ABCDEFGHIJK

When the following program is executed


Program 23 : File Pointers
# 23 File pointers
f=open('novel.txt','r+')
f.write('XYZ')
f.seek(6)
f.write('PQR')
f.seek(0,2)
f.write('UVW')
f.close()

The contents of the file 'novel.txt' are changed to -

XYZDEFPQRJKUVW

How does it work?
1. When the file is opened, the file pointer is at byte position 0. So when 'XYZ' is
written, it overwrites 'ABC'.
2. The command - f.seek(6) - places the file pointer at byte number 6 from the
start of the file. Numbering starts at 0, so byte offset 6 is the character 'G' in the
original file, and when 'PQR' is written it overwrites 'GHI'.
3. The command - f.seek(0,2) - moves the file pointer to the end of the file, so
when the characters 'UVW' are written, they appear at the end.

[Note- In ASCII text files all characters are 1 byte wide]

csv module
The csv module implements classes to read and write tabular data in CSV format. The CSV (Comma
Separated Values) is a text file format in which rows of data are present and the individual data
elements are separated by commas.

The csv module’s reader and writer objects read and write sequences.

Functions used for read/write


For performing the read/write operation, the csv file must first be opened in the appropriate text file
mode and a file object associated with that particular csv file obtained. Then the reader() or the
writer() method needs to be called to get a reader/writer object.

1. csv.reader( <file object> )

The reader() method returns a reader object that is associated with the <file object>. The file
itself should be opened with the parameter - newline='' - in the open() call (it is not a parameter
of reader()), so that a newline (enter key) present inside any data is processed properly.

The returned reader object is an iterable and iterating over it gives a list of strings representing each
row of the csv file.

2. csv.writer( <file object> )


The writer() method returns back a writer object that is associated with the <file object>. The writer
object can be used to write rows to the csv file using the writerow() method. The writerow() method
accepts a list of strings as a parameter representing the row that needs to be written to the csv file.

Consider the csv file 'employee.csv' with the following data-


empno,empname,dept
101,abc,planning
102,def,marketing
103,ghi,sales

Program 24 : Using csv module


# 24 Using csv module
import csv

#reading a csv file
f=open('employee.csv', newline='')
cr=csv.reader(f)
for r in cr:
    print(r)
f.close()

#writing to a csv file


f=open('employee.csv', 'a', newline='')
cw=csv.writer(f)
cw.writerow(['104', 'jkl', 'production'])
f.close()

After executing the above program the 'employee.csv' file will contain an additional row -
104,jkl,production.
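
The csv module also takes care of fields that themselves contain a comma, and its DictReader class returns each row as a dictionary keyed by the header row. A short sketch in which io.StringIO stands in for a file opened with newline='' and the data is made up for illustration:

```python
import csv
import io

buffer = io.StringIO(newline='')     # stands in for open(..., newline='')
cw = csv.writer(buffer)
cw.writerow(['empno', 'empname', 'dept'])
cw.writerow(['104', 'jkl', 'production, night shift'])   # field contains a comma
print(buffer.getvalue())             # the field with the comma is quoted

buffer.seek(0)
rows = list(csv.DictReader(buffer))  # first row is used as the field names
print(rows[0]['dept'])
```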

Standard Input (stdin)
Standard input is a stream from which a program reads its input data. Usually it is the keyboard. In
python this stream can be accessed using the - sys.stdin - object. It is used for all interactive input,
including calls to input().

Standard output (stdout)


Standard output is a stream to which a program writes its output data. For a command run from an
interactive shell, that is usually the text terminal which initiated the program. In python this object
can be accessed using the - sys.stdout object. It is used for the output of the print() and for prompts
of the input().

Standard error (stderr)


Standard error is another output stream typically used by programs to output error messages or
diagnostics. The usual destination is the text terminal which started the program. It can be accessed
using the sys.stderr object. The interpreter's own prompts and its error messages go to stderr.

The stdin, stdout and stderr objects are regular text files like those returned by the open() function.

Reading from a stdin


Spyder's terminals are implemented in a different technology which does not provide terminal
access. So to test reading from stdin (keyboard) we have to run the programs in Anaconda prompt.

Program 25 : Reading from standard input (saved as p3.py)


# 25 Reading from standard input
import sys
print('Enter your name:')
nm=sys.stdin.readline()
ag=int(input('Enter your age:'))
print("Hello", nm, 'you are', ag, 'years old')


The - sys.stdin.readline() - function reads a line of input from the keyboard. When we type 'abc' and
press the enter key, the string 'abc\n' (including the \n) is passed back to the program. The same
thing is accomplished when we use the input() method in python, the only difference is that the
input() method does not copy the '\n' character pressed at the end into the string.
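
The difference between readline() and input() can be shown without a keyboard by passing any text stream to a function. In this sketch the function name read_name_and_age is hypothetical and io.StringIO stands in for sys.stdin:

```python
import io

def read_name_and_age(stream):
    """Read a name line and an age line from any text stream."""
    raw_name = stream.readline()        # keeps the trailing '\n'
    name = raw_name.rstrip('\n')        # what input() would have returned
    age = int(stream.readline())        # int() ignores the trailing '\n'
    return raw_name, name, age

# io.StringIO stands in for the keyboard here; sys.stdin could be passed instead
raw_name, name, age = read_name_and_age(io.StringIO('abc\n25\n'))
print(repr(raw_name), repr(name), age)
```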

Redirecting the input


Since the sys.stdin is just a text based stream, this stream can be changed to accept data from any
other text/character stream such as a text file.
Consider a text file 'enroll.txt' containing the name and age as shown in two different lines -
ppp
25

If we want the input data to come from the above text file 'enroll.txt' instead of being typed by the
user on the terminal, just add the input redirection operator '<' when running the program, for
example - python p3.py < enroll.txt

Using the sys.stdout and sys.stderr objects


The stdout and stderr object by default display text in the terminal. Any print() statements as well as
the prompts displayed within the input() method are displayed by the sys.stdout object.

Consider the following program run at the anaconda prompt-

Program 26: Writing to stdout stderr (save it as p4.py)


# 26 Writing to stdout stderr
import sys
sys.stdout.write('Hello World\n')
sys.stdout.write('Good Day\n')
sys.stderr.write('You have an error in the program\n')


Both the stdout and the stderr outputs are displayed on the terminal by default. If we want to
redirect the stdout to a text file 'out1.txt' then use the '>' operator. If we want to redirect the
stderr to a text file 'out2.txt' then use the '2>' redirection operator. For example -
python p4.py > out1.txt 2> out2.txt

When using > as well as 2> to redirect, any existing files 'out1.txt' and 'out2.txt' are truncated and
the outputs are written to empty files each time the program is run. If we want the outputs
to be appended to the existing files then use the '>>' and '2>>' operators to redirect the stdout
and stderr objects.
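
Redirection can also be done from inside a python program with the contextlib module. The sketch below captures both streams in io.StringIO buffers (an assumption for illustration; open text files could be used in their place):

```python
import contextlib
import io
import sys

out_buffer = io.StringIO()
err_buffer = io.StringIO()

with contextlib.redirect_stdout(out_buffer), contextlib.redirect_stderr(err_buffer):
    print('Hello World')                       # print() writes to sys.stdout
    sys.stdout.write('Good Day\n')
    sys.stderr.write('You have an error in the program\n')

captured_out = out_buffer.getvalue()
captured_err = err_buffer.getvalue()
sys.stdout.write(captured_out)   # outside the with block, this reaches the real stdout
```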
