Py4Inf 07 Files

The document discusses opening and reading files in Python. It covers opening a file using the open() function, iterating through a file line by line, counting and extracting lines, and searching within files.

Reading Files

Chapter 7

Python for Informatics: Exploring Information


www.py4inf.com
Unless otherwise noted, the content of this course material is licensed under a Creative Commons Attribution 3.0 License.
http://creativecommons.org/licenses/by/3.0/.

Copyright 2010, 2011, Charles Severance


What Next?

It is time to go find some Data to mess with!

[Diagram labels: Software; Input and Output Devices; Central Processing Unit; Main Memory; Secondary Memory; "Files R Us"; code snippet "if x< 3: print"; sample file contents shown below]

From [email protected] Sat Jan 5 09:14:16 2008
Return-Path: <[email protected]>
Date: Sat, 5 Jan 2008 09:12:18 -0500
To: [email protected]
From: [email protected]
Subject: [sakai] svn commit: r39772 - content/branches/
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772
...

File Processing
• A text file can be thought of as a sequence of lines

From [email protected] Sat Jan 5 09:14:16 2008
Return-Path: <[email protected]>
Date: Sat, 5 Jan 2008 09:12:18 -0500
To: [email protected]
From: [email protected]
Subject: [sakai] svn commit: r39772 - content/branches/
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772

http://www.py4inf.com/code/mbox-short.txt

Opening a File

• Before we can read the contents of the file we must tell Python which
file we are going to work with and what we will be doing with the file

• This is done with the open() function

• open() returns a “file handle” - a variable used to perform operations on the file

• Kind of like “File -> Open” in a Word Processor


Using open()

• handle = open(filename, mode)

• returns a handle used to manipulate the file

• filename is a string

• mode is optional and should be 'r' if we are planning to read the file and 'w' if we are going to write to the file.

fhand = open('mbox.txt', 'r')

http://docs.python.org/lib/built-in-funcs.html

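The two modes can be tried side by side. The following is a minimal sketch, written in Python 3 syntax (the slides use Python 2), and the scratch file name demo.txt is a made-up example rather than anything from the course material.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (not from the slides).
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Mode 'w' creates the file, or empties it if it already exists.
fout = open(path, 'w')
fout.write('first line\nsecond line\n')
fout.close()

# Mode 'r' is the default, so open(path) would behave the same way.
fhand = open(path, 'r')
contents = fhand.read()
fhand.close()

print(contents)
```

Because 'w' empties an existing file, it is worth double-checking the file name before opening anything for writing.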
What is a Handle?
>>> fhand = open('mbox.txt')
>>> print fhand
<open file 'mbox.txt', mode 'r' at 0x1005088b0>
When Files are Missing

>>> fhand = open('stuff.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'stuff.txt'
The newline Character

• We use a special character to indicate when a line ends called the "newline"

• We represent it as \n in strings

• Newline is still one character - not two

>>> stuff = 'Hello\nWorld!'
>>> stuff
'Hello\nWorld!'
>>> print stuff
Hello
World!
>>> stuff = 'X\nY'
>>> print stuff
X
Y
>>> len(stuff)
3
File Processing
• A text file can be thought of as a sequence of lines

From [email protected] Sat Jan 5 09:14:16 2008
Return-Path: <[email protected]>
Date: Sat, 5 Jan 2008 09:12:18 -0500
To: [email protected]
From: [email protected]
Subject: [sakai] svn commit: r39772 - content/branches/
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772

File Processing
• A text file has newlines at the end of each line

From [email protected] Sat Jan 5 09:14:16 2008\n
Return-Path: <[email protected]>\n
Date: Sat, 5 Jan 2008 09:12:18 -0500\n
To: [email protected]\n
From: [email protected]\n
Subject: [sakai] svn commit: r39772 - content/branches/\n
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772\n

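One way to see those trailing newlines is to print each line with repr(). This is a small sketch in Python 3 syntax; io.StringIO stands in for an open file so the example runs without mbox.txt, and the two sample lines are made up.

```python
import io

# io.StringIO behaves like an open text file (assumption: used here
# only so the example does not depend on a file on disk).
fhand = io.StringIO('To: someone\nFrom: someone else\n')

lines = []
for line in fhand:
    lines.append(line)
    print(repr(line))  # repr() shows the trailing \n explicitly
```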
File Handle as a Sequence
• A file handle open for read can be treated as a sequence of strings where each line in the file is a string in the sequence

• We can use the for statement to iterate through a sequence

• Remember - a sequence is an ordered set

xfile = open('mbox.txt', 'r')
for cheese in xfile:
    print cheese

Counting Lines in a File
• Open a file read-only

• Use a for loop to read each line

• Count the lines and print out the number of lines

fhand = open('mbox.txt')
count = 0
for line in fhand:
    count = count + 1
print 'Line Count:', count

python open.py
Line Count: 132045

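The same counting loop can be wrapped in a function so it works on any open handle. This is a sketch in Python 3 syntax; the function name count_lines and the three-line sample are illustrative assumptions, and io.StringIO again stands in for a real file.

```python
import io

def count_lines(fhand):
    """Return the number of lines in an open, iterable file handle."""
    count = 0
    for _ in fhand:
        count = count + 1
    return count

# Made-up three-line sample in place of mbox.txt.
sample = io.StringIO('one\ntwo\nthree\n')
line_count = count_lines(sample)
print('Line Count:', line_count)
```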
Reading the *Whole* File

• We can read the whole file (newlines and all) into a single string.

>>> fhand = open('mbox-short.txt')
>>> inp = fhand.read()
>>> print len(inp)
94626
>>> print inp[:20]
From stephen.marquar

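A point worth checking by hand is that read() consumes the handle: a second read() returns an empty string. The sketch below uses Python 3 syntax and a made-up two-line sample held in io.StringIO instead of mbox-short.txt.

```python
import io

# Stand-in for an open file (assumption: sample text is invented).
fhand = io.StringIO('From: a\nTo: b\n')

inp = fhand.read()
print(len(inp))   # every character counts, newlines included
print(inp[:4])    # slicing works because inp is an ordinary string

# The handle is now at the end of the data, so nothing is left to read.
rest = fhand.read()
```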
Searching Through a File

• We can put an if statement in our for loop to only print lines that meet some criteria

fhand = open('mbox-short.txt')
for line in fhand:
    if line.startswith('From:') :
        print line

OOPS!
What are all these blank lines doing here?

From: [email protected]

From: [email protected]

From: [email protected]

From: [email protected]

...

OOPS!
What are all these blank lines doing here?

The print statement adds a newline to each line.

Each line from the file also has a newline at the end.

From: [email protected]\n
\n
From: [email protected]\n
\n
From: [email protected]\n
\n
From: [email protected]\n
...

Searching Through a File (fixed)
• We can strip the whitespace from the right hand side of the string using rstrip() from the string library

• The newline is considered "white space" and is stripped

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.startswith('From:') :
        print line

From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected]
....

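The strip-then-test pattern can be collected into a small helper. This is a sketch in Python 3 syntax; the helper name lines_starting_with and the example.com addresses are invented for illustration, and io.StringIO stands in for the mail file.

```python
import io

def lines_starting_with(fhand, prefix):
    """Return the lines that begin with prefix, minus trailing whitespace."""
    matches = []
    for line in fhand:
        line = line.rstrip()  # removes the newline and any trailing spaces
        if line.startswith(prefix):
            matches.append(line)
    return matches

# Made-up sample; the last line has trailing spaces before its newline.
sample = io.StringIO(
    'From: a@example.com\n'
    'To: b@example.com\n'
    'From: c@example.com  \n'
)
found = lines_starting_with(sample, 'From:')
for line in found:
    print(line)
```

Note that rstrip() with no argument removes all trailing whitespace, not only the newline, which is usually what is wanted for header lines like these.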
Skipping with continue

• We can conveniently skip a line by using the continue statement

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    # Skip 'uninteresting lines'
    if not line.startswith('From:') :
        continue
    # Process our 'interesting' line
    print line

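One reason continue is convenient is that several skip conditions can be listed one after another without nesting. The sketch below, in Python 3 syntax, adds a blank-line check that is not in the slides; the sample text is made up and io.StringIO stands in for the file.

```python
import io

# Made-up sample containing blank lines and an unrelated header.
sample = io.StringIO(
    '\n'
    'From: a@example.com\n'
    'X-Note: ignore\n'
    '\n'
    'From: b@example.com\n'
)

kept = []
for line in sample:
    line = line.rstrip()
    if line == '':
        continue  # skip blank lines
    if not line.startswith('From:'):
        continue  # skip every other header
    # Only 'interesting' lines reach this point.
    kept.append(line)

print(kept)
```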
Using in to select lines
• We can look for a string anywhere in a line as our selection criteria

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not '@uct.ac.za' in line :
        continue
    print line

From [email protected] Sat Jan 5 09:14:16 2008
X-Authentication-Warning: set sender to [email protected] using -f
From: [email protected]
Author: [email protected]
From [email protected] Fri Jan 4 07:02:32 2008
X-Authentication-Warning: set sender to [email protected] using -f
...
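The in operator returns True or False, and "not x in y" means the same thing as the more common spelling "x not in y". A short sketch in Python 3 syntax, using a made-up address:

```python
# Made-up sample line; the address is illustrative only.
line = 'Author: someone@uct.ac.za'
needle = '@uct.ac.za'

found = needle in line          # True: the substring occurs in the line
missing = '@umich.edu' in line  # False: this substring does not occur

# Both spellings of the negated test give the same answer.
same = (not needle in line) == (needle not in line)

print(found, missing, same)
```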
Prompt for File Name

fname = raw_input('Enter the file name: ')
fhand = open(fname)
count = 0
for line in fhand:
    if line.startswith('Subject:') :
        count = count + 1
print 'There were', count, 'subject lines in', fname

Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt

python search6.py
Enter the file name: mbox-short.txt
There were 27 subject lines in mbox-short.txt

Bad File Names

fname = raw_input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print 'File cannot be opened:', fname
    exit()
count = 0
for line in fhand:
    if line.startswith('Subject:') :
        count = count + 1
print 'There were', count, 'subject lines in', fname

Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt

Enter the file name: na na boo boo
File cannot be opened: na na boo boo

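A variation on the same idea catches only file-related errors instead of using a bare except, which would also hide unrelated mistakes. This is a sketch in Python 3 syntax, where a missing file raises an OSError subclass; the function name count_subject_lines and the choice to return None instead of calling exit() are assumptions made for illustration.

```python
def count_subject_lines(fname):
    """Count lines starting with 'Subject:'; return None if fname cannot be opened."""
    try:
        fhand = open(fname)
    except OSError:
        # Covers missing files, directories, and permission problems.
        print('File cannot be opened:', fname)
        return None
    count = 0
    with fhand:  # closes the file when the loop is done
        for line in fhand:
            if line.startswith('Subject:'):
                count = count + 1
    return count

# Assumes no file with this name exists in the current directory.
result = count_subject_lines('na na boo boo')
```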
Summary
• Secondary storage

• Opening a file - file handle

• File structure - newline character

• Reading a file line-by-line with a for loop

• Reading the whole file as a string

• Searching for lines

• Stripping white space

• Using continue

• Using in as an operator

• Reading a file and splitting lines

• Reading file names

• Dealing with bad files
