FILE HANDLING
DATA FILE HANDLING
Introduction
▪ FILE HANDLING is a mechanism by which a Python program can read data from disk files
or write data back to disk files.
▪ So far in our Python programs, standard input has come from the keyboard and output has
gone to the monitor, i.e. data is not stored permanently and exists only as long as the
program is running. File handling allows us to store data entered through a Python
program permanently in a disk file and to read that data back later.
DATA FILES
A data file contains data pertaining to a specific application, kept for later use. Data files
can be stored in two ways:
▪ Text File
▪ Binary File
Text File
▪ A text file stores information as ASCII or Unicode characters. In a text file
everything is stored as characters: for example, the data "computer" takes 8 bytes and a
floating-point value like 11237.9876 takes 10 bytes (one byte per character in a
single-byte encoding such as ASCII, with the decimal point counted as a character).
▪ In a text file each line is terminated by a special character called EOL (End of Line).
Some translation takes place when this EOL character is read or written. In Python,
EOL is '\n', '\r' or a combination of both ('\r\n').
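The sizes quoted above can be checked with a short sketch. The variable names are only illustrative and do not come from the notes.

```python
# Every value in a text file is stored as characters.
word = "computer"
number_text = str(11237.9876)   # the float written out as text

print(len(word))          # number of characters in "computer"
print(len(number_text))   # digits plus the decimal point

# Encoded as ASCII, each of these characters occupies exactly one byte.
word_bytes = len(word.encode("ascii"))
number_bytes = len(number_text.encode("ascii"))
print(word_bytes, number_bytes)
```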
Steps in Data File Handling
1. OPENING FILE
We must first open the file for reading or writing by specifying the name of the file
and the mode.
2. PERFORMING READ/WRITE
Once the file is open, we can read or write, depending on the mode in which it was
opened, using the various functions available.
3. CLOSING FILE
After performing the operation, we must close the file and release it for other
applications to use.
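The three steps above can be sketched as follows. The file name "notes.txt" and the temporary folder are assumptions, chosen only so the sketch runs without touching any real data.

```python
import os
import tempfile

# Hypothetical file inside a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# 1. Open the file, giving its name and mode.
myfile = open(path, "w")
# 2. Perform the write operation.
myfile.write("Hello, file!\n")
# 3. Close the file so it is released.
myfile.close()

# The same three steps, this time for reading.
myfile = open(path, "r")
content = myfile.read()
myfile.close()
print(content)
```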
Opening File
▪ A file can be opened for reading, writing or appending.
SYNTAX:
file_object = open(filename)
Or
file_object = open(filename, mode)
** the default mode is "r" (read)
myfile = open("story.txt")
Here the disk file "story.txt" is opened and its reference is linked to the "myfile" object;
the Python program will now access "story.txt" through the "myfile" object.
Here "story.txt" must be present in the current folder (normally the folder where the .py file
is stored); if the disk file is in another folder, we have to give its full path.
myfile = open("article.txt", "r")
Here "r" is for read (it is the default; other options are "w" for write and "a" for append).
myfile = open("d:\\mydata\\poem.txt", "r")
Here we are accessing the "poem.txt" file stored in a separate location, i.e. the d:\mydata folder.
When giving the path of a file in an ordinary string, we must use a double backslash (\\) in
place of a single backslash, because in Python a single backslash starts an escape sequence.
For example, if the folder name is "nitin" and we give the path as d:\nitin\poem.txt, then the
"\n" in \nitin becomes the escape character for a new line. SO ALWAYS USE DOUBLE
BACKSLASH IN PATH.
An alternative to the double backslash is to put "r" before the path, making the string a raw
string, i.e. the backslash has no special meaning:
myfile = open(r"d:\mydata\poem.txt", "r")
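The effect of escaping can be seen without opening any file. This is a small sketch that only compares the path strings; the variable names are illustrative.

```python
# Two ways of writing the same Windows path.
escaped = "d:\\mydata\\poem.txt"   # double backslashes
raw = r"d:\mydata\poem.txt"        # raw string, backslashes kept as typed
same_path = (escaped == raw)
print(same_path)

# A single backslash followed by 'n' is read as a newline character.
broken = "d:\nitin"
intended = r"d:\nitin"
print(len(broken), len(intended))  # the broken version is one character shorter
print("\n" in broken)
```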
File Handle
myfile = open(r"d:\mydata\poem.txt", "r")
In the above example "myfile" is the file object, also called the file handle or file pointer,
holding the reference of the disk file. In Python we access and manipulate the disk file
through this file handle only.
File Access Mode
Text File Mode   Binary File Mode   Description      Notes
'r'              'rb'               Read only        The file must exist, otherwise Python raises FileNotFoundError.
'w'              'wb'               Write only       If the file does not exist, it is created. If it exists, Python truncates the existing data and overwrites the file.
'a'              'ab'               Append           Write only; new data is added to the end of the existing data, i.e. no overwriting. If the file does not exist, it is created.
'r+'             'r+b' or 'rb+'     Read and write   The file must exist, otherwise an error is raised. Both reading and writing can take place.
'w+'             'w+b' or 'wb+'     Write and read   The file is created if it does not exist; if it exists, its data is truncated. Both reading and writing are allowed.
'a+'             'a+b' or 'ab+'     Write and read   Same as 'w+', but the previous content is retained and new data is added at the end. Both reading and writing are allowed.
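The difference between "w", "a" and "r" can be seen in a short sketch. The file names and the temporary folder are assumptions made so that the example runs anywhere.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes.txt")   # hypothetical file name

f = open(path, "w")      # file does not exist yet, so it is created
f.write("first\n")
f.close()

f = open(path, "w")      # file exists, so the old data is truncated
f.write("second\n")
f.close()

f = open(path, "a")      # append keeps the old data and adds at the end
f.write("third\n")
f.close()

f = open(path, "r")
content = f.read()
f.close()
print(content)

# Opening a missing file in "r" mode raises an error.
error_raised = False
try:
    open(os.path.join(folder, "no_such_file.txt"), "r")
except FileNotFoundError:
    error_raised = True
print(error_raised)
```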
Closing file
▪ As the reference of the disk file is stored in the file handle, to close the file we must call
the close() function through the file handle, which releases the file.
myfile.close()
Note: open() is a built-in function that is called on its own, while close() is a method that
must be called through the file handle.
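A minimal sketch of what closing does, assuming a hypothetical file in a temporary folder. The "closed" attribute of the file handle reports whether the file has been released.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")   # hypothetical file

myfile = open(path, "w")
open_state = myfile.closed      # False while the file is open
myfile.close()
closed_state = myfile.closed    # True after close()
print(open_state, closed_state)

# Using the handle after close() is an error.
message = ""
try:
    myfile.write("more")
except ValueError as err:
    message = str(err)
print(message)
```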
Reading from File
▪ To read from a file Python provides several functions:
▪ Filehandle.read([n]) : reads and returns n characters (n bytes for a binary file); if n is
not specified it reads the entire file.
▪ Filehandle.readline([n]) : reads one line of input. If n is specified, it reads at most n
characters. The line is returned as a string ending with the newline character, or as an
empty string if nothing is left to read.
▪ Filehandle.readlines() : reads all lines and returns them in a list.
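The three reading functions can be compared on one small file. The file name and its three lines are made up for this sketch.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")   # hypothetical file
f = open(path, "w")
f.write("line one\nline two\nline three\n")
f.close()

f = open(path, "r")
first_five = f.read(5)        # the first 5 characters
rest_of_line = f.readline()   # the rest of the first line, newline included
remaining = f.readlines()     # all remaining lines as a list
at_end = f.readline()         # empty string: nothing left to read
f.close()

print(repr(first_five))
print(repr(rest_of_line))
print(remaining)
print(repr(at_end))
```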
Writing onto files
▪ After the read operations, let us see how to write data to disk files. Python
provides the functions:
write()
writelines()
▪ These functions are called through the file handle to write the desired content.
Name           Syntax                     Description
write()        Filehandle.write(str1)     Writes the string str1 to the file referenced by Filehandle.
writelines()   Filehandle.writelines(L)   Writes all the strings in list L to the file referenced by Filehandle. No newline is added automatically, so each string must end with '\n' to appear on its own line.
Note that when data is written to a file in "w" mode, the previous content of an
existing file is overwritten and only the new content is saved.
If we want to add new data without overwriting the previous content, we should write
using "a" mode, i.e. append mode.
▪ When we write data to a file, Python holds it in a buffer (temporary memory)
and pushes it onto the actual file later. To force Python to write the content of
the buffer onto storage, use the flush() function.
▪ Python automatically flushes a file when closing it, i.e. flushing is done implicitly
by close(). If you want to flush before closing the file, call flush() yourself.
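A short sketch of write(), writelines() and flush(). The file name and the records are invented for the illustration; a second handle is opened only to show that the flushed data has reached the file before close().

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "students.txt")   # hypothetical file

f = open(path, "w")
count = f.write("Roll,Name\n")            # write() returns the number of characters written
f.writelines(["1,Asha\n", "2,Ravi\n"])    # newlines must be supplied by us
f.flush()                                 # push the buffer onto the file now

# The file is still open for writing, but the data is already on disk.
check = open(path, "r")
seen = check.read()
check.close()

f.close()
print(count)
print(seen)
```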
“with” statement
▪ Python's "with" statement is very handy for file handling, where two related
operations (opening and closing) must be executed as a pair, with a block of code in
between:
with open(filename[, mode]) as filehandle:
The advantage of "with" is that it automatically closes the file after the nested block of code.
It guarantees that the file is closed however the nested block exits, even if a run-time error occurs.
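The guarantee described above can be observed through the "closed" attribute. In this sketch the file name is hypothetical and the division by zero is deliberate, to simulate a run-time error inside the block.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")   # hypothetical file

with open(path, "w") as f:
    f.write("saved\n")
closed_after_block = f.closed     # the file is already closed here

closed_after_error = None
try:
    with open(path, "r") as f:
        data = f.read()
        result = 1 / 0            # deliberate run-time error
except ZeroDivisionError:
    closed_after_error = f.closed # still closed, despite the error

print(closed_after_block, closed_after_error)
```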
STRING FUNCTIONS SUMMARY
Method         Description
capitalize()   Converts the first character to upper case
count()        Returns the number of times a specified value occurs in a string
endswith()     Returns True if the string ends with the specified value
index()        Searches the string for a specified value and returns the position where it was found
isalnum()      Returns True if all characters in the string are alphanumeric
isalpha()      Returns True if all characters in the string are letters of the alphabet
isdigit()      Returns True if all characters in the string are digits
islower()      Returns True if all characters in the string are lower case
isnumeric()    Returns True if all characters in the string are numeric
isupper()      Returns True if all characters in the string are upper case
lower()        Converts a string into lower case
replace()      Returns a string where a specified value is replaced with another specified value
split()        Splits the string at the specified separator, and returns a list
upper()        Converts a string into upper case
Note: All string methods return new values. They do not change the original string.
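A few of the methods in the table, tried on a sample string chosen only for illustration. The last line shows that the original string is left unchanged.

```python
text = "file handling"

capitalized = text.capitalize()
l_count = text.count("l")
ends_ing = text.endswith("ing")
position = text.index("handling")
replaced = text.replace("file", "data")
words = text.split(" ")
shouted = text.upper()

print(capitalized, l_count, ends_ing, position)
print(replaced, words, shouted)
print("abc123".isalnum(), "abc".isalpha(), "123".isdigit())
print(text)   # the original string is unchanged
```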
CSV FILE HANDLING
• CSV is a simple file format used to store tabular data, such as a spreadsheet or
database.
• Files in the CSV format can be imported to and exported from programs that store
data in tables, such as Microsoft Excel or OpenOffice Calc.
• CSV stands for "comma-separated values".
• A comma-separated values file is a delimited text file that uses a comma to separate
values.
• Each line of the file is a data record. Each record consists of one or more
fields, separated by commas. The use of the comma as a field separator is
the source of the name for this file format
• To perform read and write operation with CSV file, we must import csv module.
• open() function is used to open file, and return file object.
Reading from CSV file
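• Import the csv module.
• Use open() to open the CSV file; it will return a file object.
• Pass this file object to csv.reader() to get a reader object.
• Loop over the reader object; each record is returned as a list of strings, on which you
can perform the operation you want.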
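The reading steps above can be sketched as follows. The file "emp.csv" and its two records are invented for the example, and the file is created first so that the sketch is self-contained. Opening with newline="" is what the csv module documentation recommends for CSV files.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "emp.csv")   # hypothetical CSV file

# Create a small CSV file to read from.
with open(path, "w", newline="") as f:
    f.write("1,Asha,45000\n2,Ravi,52000\n")

# Open the file, pass the file object to csv.reader(), then loop over the records.
records = []
with open(path, "r", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        records.append(row)   # each row is a list of strings

print(records)
```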
How to create CSV file
Method 1 (From MS-Excel):
• Open Excel and delete all the sheets except Sheet 1.
• Type all the data, in separate cells.
• Save it as a CSV file in your desired location.
• If any warning comes, click on "YES".
• When you close Excel, choose "NO".
• The file is now created at your desired location; double click it or open it with Notepad
to check the content.
Method 2 (From Notepad):
• Open Notepad
• Type each record, separating the column values by a comma (,).
• Put every record on a separate line.
• Save it with the extension .csv. PUT THE NAME IN DOUBLE QUOTES TO
ENSURE .TXT WILL NOT BE APPENDED TO THE FILE NAME, e.g. if you
want to save it with the name emp, then give the name as "emp.csv" in double quotes.
• The file is created; close it and double click to open and check.
Method 3 (Writing data in CSV file from Python program)
• Import the csv module.
• Use open() to open the CSV file by specifying mode "w" or "a"; it will return a file object.
• "w" will overwrite the previous content.
• "a" will add content to the end of the previous content.
• Pass the file object to csv.writer(), along with the delimiter, to get a writer object.
• Then use writerow() to send one record, or writerows() to send a list of records, to the CSV file.
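Method 3 might look like the sketch below. The file name, column headings and records are assumptions made for illustration.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "emp.csv")   # hypothetical CSV file

# "w" mode: start a new file and write the header plus two records.
with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter=",")
    writer.writerow(["EmpNo", "Name", "Salary"])                  # one record
    writer.writerows([[1, "Asha", 45000], [2, "Ravi", 52000]])    # several records

# "a" mode: add one more record at the end of the existing content.
with open(path, "a", newline="") as f:
    csv.writer(f).writerow([3, "Meena", 48000])

# Read the file back to check the result; every value comes back as a string.
with open(path, "r", newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```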
Binary files
▪ A binary file stores information in the same format as it is held in memory, i.e. data is
stored according to its data type, so no translation occurs.
▪ In a binary file there is no delimiter for a new line.
▪ Binary files are faster and easier for a program to read and write than text files.
▪ Data in binary files cannot be read directly in a text editor; it can be read only through
a program written for that purpose.
Binary file operations
▪ If we want to write a structure such as a list or dictionary to a binary file, we need to use
the Python module pickle.
▪ Pickling is the process of converting a structure to a byte stream before writing it to a
file. While reading the content of the file, the reverse process, called Unpickling, is
used to convert the byte stream back to the original structure.
Steps to perform binary file operations
▪ First we need to import the module called pickle.
▪ This module provides 2 main functions:
dump() : writes an object to a file opened in binary mode
Syntax : pickle.dump(object_to_write, filehandle)
load() : reads back dumped data, i.e. it is used to read an
object from a pickle file.
Syntax : object = pickle.load(filehandle)
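A minimal sketch of pickling and unpickling a dictionary. The file name "student.dat" and the record are invented for the example; note the binary modes "wb" and "rb".

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "student.dat")   # hypothetical binary file

student = {"roll": 1, "name": "Asha", "marks": [78, 85, 91]}

# Pickling: the dictionary is converted to a byte stream and written.
with open(path, "wb") as f:
    pickle.dump(student, f)

# Unpickling: the byte stream is read and converted back to a dictionary.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored)
print(restored == student)
```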
Absolute Vs Relative PATH
▪ To understand PATH we must be familiar with the terms: DRIVE,
FOLDER/DIRECTORY, FILES.
▪ Our hard disk is logically divided into many parts called DRIVES like C DRIVE, D
DRIVE etc.
▪ The drive is the main container in which we put everything to store.
▪ The naming format is : DRIVE_LETTER:
▪ For e.g. C: , D:
▪ The top level of a drive is also known as the ROOT DIRECTORY.
▪ Drive contains Folder and Files.
▪ Folder contains sub-folders or files
Absolute Path
Absolute path is the full address of any file or folder from the Drive i.e. from ROOT
FOLDER. It is like:
Drive_Name:\Folder\Folder…\filename
Relative Path
Relative Path is the location of a file/folder starting from the current folder. The special
symbols used in relative paths are:
Single Dot ( . ) : refers to the current folder.
Double Dot ( .. ) : refers to the parent folder.
Backslash ( \ ) : a leading backslash refers to the ROOT folder of the current drive.
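The meaning of the dot symbols can be checked with the os.path functions. This sketch only builds and inspects path strings; the folder "data" and the file "poem.txt" are hypothetical and need not exist.

```python
import os

current = os.getcwd()                 # the current folder
dot = os.path.abspath(".")            # "." expands to the current folder
dotdot = os.path.abspath("..")        # ".." expands to the parent folder
print(current)
print(dot)
print(dotdot)

# A relative path, built in a way that works on any operating system.
relative = os.path.join("..", "data", "poem.txt")
absolute = os.path.abspath(relative)  # the same location as an absolute path
print(os.path.isabs(relative), os.path.isabs(absolute))
```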
Getting name of current working directory
import os
pwd = os.getcwd()
print("Current Directory :",pwd)
File Pointer
▪ Every open file maintains a file pointer, which gives the current position in the file where
the next read or write operation will take place.
▪ When we perform any read/write operation, two things happen:
The operation takes place at the current position of the file pointer.
The file pointer advances by the number of bytes read or written.
Example
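The movement of the file pointer can be followed with tell(). This is a sketch on a made-up ten-byte file; binary mode is used here so that the positions reported are exact byte counts.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pointer.txt")   # hypothetical file
with open(path, "wb") as f:
    f.write(b"ABCDEFGHIJ")

with open(path, "rb") as f:
    start = f.tell()          # pointer at the beginning
    first = f.read(3)         # read happens at the current position
    after_first = f.tell()    # pointer has advanced by 3 bytes
    second = f.read(4)        # next read continues from there
    after_second = f.tell()   # pointer has advanced by 4 more bytes

print(start, first, after_first, second, after_second)
```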
File Modes and Opening position of file pointer
FILE MODE               OPENING POSITION
r, r+, rb, rb+, r+b     Beginning of file
w, w+, wb, wb+, w+b     Beginning of file (overwrites the file if it already exists)
a, ab, a+, ab+, a+b     End of file if the file exists, otherwise a new file is created
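The opening positions in the table can be checked by calling tell() immediately after open(). The five-character file used here is invented for the sketch.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "position.txt")   # hypothetical file
f = open(path, "w")
f.write("hello")          # the file now holds 5 bytes
f.close()

f = open(path, "r")
r_pos = f.tell()          # read mode starts at the beginning
f.close()

f = open(path, "a")
a_pos = f.tell()          # append mode starts at the end of the existing data
f.close()

f = open(path, "w")
w_pos = f.tell()          # write mode starts at the beginning ...
f.close()
size_after_w = os.path.getsize(path)   # ... because the old content was truncated

print(r_pos, a_pos, w_pos, size_after_w)
```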
Random Access in Files using seek() and tell()
• The seek() function in Python is used to change the position of the file handle/file
pointer/cursor to a given position.
• seek() accepts two arguments (the first is mandatory, the second is optional):
• file_handler.seek(offset, from_what)
offset is an integer value specifying the number of bytes by which the cursor is to be moved.
from_what is an indicator which specifies the reference position from which the bytes are
counted. It can have three values:
➢ 0 – from beginning (by default)
➢ 1 – from current position of the file pointer
➢ 2 – from end of file
Eg.
f1.seek(20) will move the file pointer to byte 20 of the file, counted from the beginning
f1.seek(-10,1) will move 10 bytes backwards from the current position of the file pointer
f1.seek(10,1) will move 10 bytes forward from the current position of the file pointer
• Movement relative to the current position or to the end of the file (from_what 1 or 2 with a
non-zero offset) is possible only in binary files. In text files, seek() can only count from the beginning.
• tell() function returns the current position of file pointer in terms of bytes.
• When file is opened in read or write mode, file pointer is placed at the beginning of
the file
• When file is opened in append mode, file pointer is placed at the end of the file
Eg. f1.tell()
Consider that test.txt contains the following text:
Python is a user-friendly language.
I am learning File Handling in Python
f1 = open("test.txt")
print(f1.tell())    # answer will be 0 (zero)
f1.seek(5)          # places the file pointer on byte 5 of the file, i.e. at 'n'
print(f1.tell())    # answer will be 5
print(f1.read(8))   # n is a u
print(f1.tell())    # answer will be 13
f1.close()
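Seeking relative to the current position or to the end of the file can be tried in binary mode. The sketch below writes the first line of the sample text to a hypothetical temporary file, then shows that the same relative seek is refused in text mode.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")   # hypothetical file
with open(path, "wb") as f:
    f.write(b"Python is a user-friendly language.")   # 35 bytes

with open(path, "rb") as f:
    p1 = f.seek(20)        # byte 20 from the beginning; seek() returns the new position
    p2 = f.seek(-10, 1)    # 10 bytes backwards from the current position
    p3 = f.seek(10, 1)     # 10 bytes forward from the current position
    f.seek(-9, 2)          # 9 bytes before the end of the file
    tail = f.read()

print(p1, p2, p3, tail)

# In text mode a non-zero seek relative to the current position is not allowed.
text_mode_error = False
with open(path, "r") as f:
    f.seek(20)
    try:
        f.seek(-10, 1)
    except io.UnsupportedOperation:
        text_mode_error = True
print(text_mode_error)
```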
Standard INPUT, OUTPUT and ERROR STREAM
▪ The standard input device (stdin) reads from the keyboard.
▪ The standard output device (stdout) displays output on the monitor.
▪ The standard error device (stderr) is the same as stdout but is normally used for errors only.
▪ The standard devices are implemented as files called standard streams in Python, and
we can use them through the sys module.
▪ After importing the sys module we can use the standard streams as sys.stdin, sys.stdout
and sys.stderr.
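Because the standard streams behave like files, the usual file methods work on them. A small sketch, with messages made up for illustration; reading from sys.stdin is shown only as a comment so that the example does not wait for keyboard input.

```python
import sys

# stdout is a file object: write() works just as it does on a disk file.
n_out = sys.stdout.write("Result: 42\n")     # returns the number of characters written

# stderr is a separate stream, normally used for error messages.
n_err = sys.stderr.write("Something went wrong\n")

# print() can be directed to either stream with the file argument.
print("also an error message", file=sys.stderr)

# Reading a line from the keyboard would look like this:
# line = sys.stdin.readline()
```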