Files&List
• Or in the case of a USB flash drive, the data we write from our
programs can be removed from the system and transported to
another system.
• When we want to read or write a file (say on your hard drive), we first must open the file.
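• Opening the file communicates with your operating system, which knows where the data for each file is stored.
• When you open a file, you are asking the operating system to find the file by name and make sure the file exists.
• In this example, we open the file mbox.txt, which should be stored in the same folder that you are in when you start Python: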
fhand = open('mbox.txt')
• If the open is successful, the operating system returns us a file
handle.
• The file handle is not the actual data contained in the file, but
instead it is a “handle” that we can use to read the data.
• You are given a handle if the requested file exists and you
have the proper permissions to read the file
• If the file does not exist, open will fail with a
traceback and you will not get a handle to access
the contents of the file:
fhand = open('stuff.txt')
Later we will use try and except to deal more gracefully with the
situation where we attempt to open a file that does not exist.
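As a preview, that handling might look like the following minimal sketch. The helper name open_or_none is hypothetical and not part of the original material; it catches only FileNotFoundError, so other failures (such as missing permissions) would still produce a traceback.

```python
def open_or_none(fname):
    """Return a file handle for fname, or None if the file does not exist."""
    try:
        return open(fname)
    except FileNotFoundError:
        # open raises FileNotFoundError when the named file is missing
        print('File cannot be opened:', fname)
        return None


fhand = open_or_none('stuff.txt')
if fhand is not None:
    # Only use (and close) the handle when the open succeeded
    fhand.close()
```

The program can then test for None and carry on instead of stopping with a traceback.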
Reading files
• While the file handle does not contain the data for
the file, it is quite easy to construct a for loop to read
through and count each of the lines in a file:
fhand = open('mbox-short.txt')
count = 0
for line in fhand:
    count = count + 1
print('Line Count:', count)
We can use the file handle as the sequence in our for loop.
• Our for loop simply counts the number of lines in the file; the total is printed once the loop finishes.
• The rough translation of the for loop into English is, “for each line in the file
represented by the file handle, add one to the count variable.”
• The reason that the open function does not read the entire file is that the file
might be quite large with many gigabytes of data.
• The open statement takes the same amount of time regardless of the size of the
file. The for loop actually causes the data to be read from the file.
• When the file is read using a for loop in this manner, Python takes care of
splitting the data in the file into separate lines using the newline character.
• Python reads each line through the newline and includes the newline as
the last character in the line variable for each iteration of the for loop
• Because the for loop reads the data one line at a time, it can efficiently
read and count the lines in very large files without running out of main
memory to store the data.
• The above program can count the lines in any size file using very little
memory since each line is read, counted, and then discarded
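The points above can be checked with a small self-contained sketch. It assumes a throwaway three-line file created with the tempfile module (the file and its contents are made up for illustration) and uses repr to make the trailing newline visible.

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on mbox-short.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\nthird line\n')
    sample_path = tmp.name

count = 0
last_line = None
fhand = open(sample_path)
for line in fhand:
    # Each line still carries its trailing newline character
    count = count + 1
    last_line = line
fhand.close()
os.remove(sample_path)

print('Line Count:', count)
print(repr(last_line))   # repr shows the newline as \n
```

Only one line is held in the variable line at any moment, which is why the same loop works for very large files.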
If you know the file is relatively small compared to the size of your
main memory, you can read the whole file into one string using
the read method on the file handle.
fhand = open('mbox-short.txt')
inp = fhand.read()
print(len(inp))
94626
print(inp[:20])
• In this example, the entire contents (all 94,626 characters) of
the file mbox-short.txt are read directly into the variable inp.
• We use string slicing to print out the first 20 characters of the
string data stored in inp.
When the file is read in this manner, all the characters including all of
the lines and newline characters are one big string in the variable inp.
It is a good idea to store the output of read as a variable because each call to read
exhausts the resource:
fhand = open('mbox-short.txt')
print(len(fhand.read()))
94626
print(len(fhand.read()))
0
1. Remember that reading a whole file at once with read should only be done if the file
data will fit comfortably in the main memory of your computer.
2. If the file is too large to fit in main memory, you should write your program to
read the file in chunks using a for or while loop.
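Reading in chunks with a while loop could be sketched as follows. The chunk size of 1024 characters and the 5000-character throwaway file are arbitrary assumptions for illustration; read with a size argument returns an empty string once the end of the file is reached.

```python
import os
import tempfile

CHUNK_SIZE = 1024  # characters per read; an arbitrary choice for illustration

# Create a throwaway file of 5000 characters to read back
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('x' * 5000)
    sample_path = tmp.name

total_chars = 0
chunk_count = 0
fhand = open(sample_path)
while True:
    chunk = fhand.read(CHUNK_SIZE)   # returns '' at the end of the file
    if chunk == '':
        break
    total_chars = total_chars + len(chunk)
    chunk_count = chunk_count + 1
fhand.close()
os.remove(sample_path)

print(total_chars, chunk_count)
```

At no point does the program hold more than one chunk of the file in memory.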
Searching through a file
When you are searching through data in a file, it is a very common pattern to read through a
file, ignoring most of the lines and only processing lines which meet a particular condition.
We can combine the pattern for reading a file with string methods to build simple search
mechanisms.
For example, if we wanted to read a file and only print out lines which started with the prefix
“From:”, we could use the string method startswith to select only those lines with the desired prefix:
fhand = open('mbox-short.txt')
for line in fhand:
    if line.startswith('From:'):
        print(line)
When this program runs, we get the following output:
From: [email protected]

From: [email protected]

From: [email protected]

From: [email protected]

...
The output looks great since the only lines we are seeing are those which start
with “From:”, but why are we seeing the extra blank lines?
They come from the invisible newline character: each line read from the file ends with a
newline, so print outputs the string in the variable line, which includes that newline, and then
adds another newline of its own, resulting in the double spacing effect we see.
We could use line slicing to print all but the last character,
but a simpler approach is to use the rstrip method which
strips whitespace from the right side of a string as
follows:
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.startswith('From:'):
        print(line)
When this program runs, we get the following output:
From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected] ...
• As your file processing programs get more complicated, you may want to
structure your search loops using continue.
• The basic idea of the search loop is that you are looking for “interesting”
lines and effectively skipping “uninteresting” lines.
• And then when we find an interesting line, we do something with that line.
• We can structure the loop to follow the pattern of skipping uninteresting
lines as follows:
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    # Skip 'uninteresting' lines
    if not line.startswith('From:'):
        continue
    # Process our 'interesting' line
    print(line)
# Code: http://www.py4e.com/
• The output of the program is the same. In English, the
uninteresting lines are those which do not start with “From:”,
which we skip using continue.
• We can use the find string method to simulate a text editor search
that finds lines where the search string is anywhere in the line.
• Since find looks for an occurrence of a string within another string
and either returns the position of the string or -1 if the string was
not found, we can write the following loop to show lines which
contain the string “@uct.ac.za” (i.e., they come from the University
of Cape Town in South Africa):
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.find('@uct.ac.za') == -1:
        continue
    print(line)
# Code: http://www.py4e.com/code3/search4.p
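To see what find returns in each case, here is a minimal sketch on a few in-memory lines. The addresses are made up for illustration, and the second loop uses the in operator, which is a common alternative to comparing find with -1 (it is not used in the program above).

```python
# Hypothetical sample lines standing in for the contents of a mail file
lines = [
    'From: alice@uct.ac.za',
    'Subject: hello',
    'From: bob@example.com',
]

matches = []
for line in lines:
    position = line.find('@uct.ac.za')   # index of the match, or -1
    print(position, line)
    if position == -1:
        continue
    matches.append(line)

# The in operator answers the same yes/no question without a position
same_matches = []
for line in lines:
    if '@uct.ac.za' in line:
        same_matches.append(line)

print(matches == same_matches)
```

Use find when the position of the match matters, and in when only its presence does.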
Writing files
To write a file, you have to open it with mode “w” as a second
parameter.
If the file already exists, opening it in write mode clears out the
old data and starts fresh, so be careful! If the file doesn’t exist, a
new one is created.
• The write method of the file handle object puts
data into the file, returning the number of
characters written.
• The default write mode is text for writing (and
reading) strings.
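The steps above can be sketched as follows. The file name output.txt and the temporary folder are assumptions chosen so that the example cannot overwrite one of your own files; close is called so that the data is flushed to disk before the file is read back.

```python
import os
import tempfile

# Work inside a temporary folder so no existing file is overwritten
with tempfile.TemporaryDirectory() as folder:
    out_path = os.path.join(folder, 'output.txt')

    fout = open(out_path, 'w')    # mode 'w' creates or truncates the file
    text = 'first line\n'         # write does not add a newline for us
    written = fout.write(text)    # returns the number of characters written
    fout.close()                  # make sure the data reaches the disk

    fhand = open(out_path)
    contents = fhand.read()
    fhand.close()

print(written, repr(contents))
```

Unlike print, write adds nothing to the string, so the newline has to be included explicitly.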
Creating a list with known size and known elements:
sample_list1 = ["Mark", 5, "Jack", 9, "Chan", 5]
sample_list2 = ["Mark", "Jack", "Chan"]
• A list can store both homogeneous and heterogeneous elements.
Length of the list:
len(sample_list)
• Displays the number of elements in the list.
Traversing a list
• The most common way to traverse the
elements of a list is with a for loop.
The syntax is the same as for strings:
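for x in college:
    print(x)
This works well if you only need to read the elements. To write or update
them you need the indices, which range and len provide:
for i in range(len(numbers)):
    numbers[i] = numbers[i] * 2
This loop traverses the list and updates each
element.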
• len() returns the number of elements in the
list.
• range produces the sequence of indices from 0 to n − 1,
where n is the length of the list.
• Each time through the loop, i gets the index of
the next element.
• A for loop over an empty list never executes the
body:
for x in empty:
    print('This never happens.')
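The three traversal patterns above can be run together in one short sketch. The contents of college and numbers are hypothetical values chosen for illustration, since the lists are not defined in the examples.

```python
college = ['Engineering', 'Science', 'Arts']   # hypothetical values
numbers = [17, 123, 5]                         # hypothetical values
empty = []

# Read-only traversal: x takes each element in turn
for x in college:
    print(x)

# Updating traversal: i takes each index 0, 1, 2
for i in range(len(numbers)):
    numbers[i] = numbers[i] * 2
print(numbers)

# The body of a loop over an empty list is never executed
ran = False
for x in empty:
    ran = True
print(ran)
```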
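list_of_airlines = ["AI", "EM", "BA"]
print("List of airlines:", list_of_airlines)
sample_list = ["Mark", 5, "Jack", 9, "Chan", 5]
print("Sample List:", sample_list)
# Random write: replace the element at index 2
sample_list[2] = "James"
# Concatenate another list (new_list is not defined in this excerpt;
# it needs at least six elements for index 11 below to be valid)
sample_list = sample_list + new_list
# Random read: access an element by its index
print(sample_list[11])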
List operations
The + operator concatenates lists:
a = [1, 2, 3]
b = [4, 5, 6]
c=a+b
print(c)
[1, 2, 3, 4, 5, 6]
Similarly, the * operator repeats a list a given number of times:
[0] * 4
[0, 0, 0, 0]
-------------------------
[1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
List slices
The slice operator also works on lists:
t = ['a', 'b', 'c', 'd', 'e', 'f']
t[1:3]
['b', 'c']
----------------
t[:4]
['a', 'b', 'c', 'd']
--------------------
t[3:]
['d', 'e', 'f']
-----------------------------------------
• If you omit the first index, the slice starts at the beginning.
• If you omit the second, the slice goes to the end.
• So if you omit both, the slice is a copy of the whole list.
t[:]
['a', 'b', 'c', 'd', 'e', 'f']
-------------------
note:
Since lists are mutable, it is often useful to make a copy before
performing operations that fold, spindle, or mutilate lists.
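The difference between a copy made with t[:] and a second name for the same list can be seen in a minimal sketch; the variable names backup and alias are chosen here for illustration.

```python
t = ['a', 'b', 'c', 'd', 'e', 'f']
backup = t[:]    # a new list containing the same elements
alias = t        # not a copy: a second name for the same list

t[0] = 'z'       # modify the original list

print(t[0], alias[0], backup[0])
print(backup is t, alias is t)
```

The change shows up through alias but not through backup, which is why the slice copy is the safe choice before modifying a list.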
• pop removes the element at a given index: it modifies the list and returns the element
that was removed. For example, with t = ['a', 'b', 'c'], t.pop(1) returns 'b'.
If you don’t need the removed value, you can use the del operator:
t = ['a', 'b', 'c']
del t[1]
print(t)
Output: ['a', 'c']
If you know the element you want to remove
(but not the index), you can use remove:
t = ['a', 'b', 'c']
t.remove('b')
print(t)
Output: ['a', 'c']
The return value from remove is None.
------------------------------------
To remove more than one element, you can
use del with a slice index:
t = ['a', 'b', 'c', 'd', 'e', 'f']
del t[1:5]
print(t)
Output: ['a', 'f']
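The four ways of deleting elements described above can be compared side by side. This is a minimal sketch on a fresh list; the comments track the contents of t after each step.

```python
t = ['a', 'b', 'c', 'd', 'e', 'f']

x = t.pop(1)             # removes index 1 and returns it
print(x, t)              # t is now ['a', 'c', 'd', 'e', 'f']

del t[0]                 # removes index 0, returns nothing
print(t)                 # t is now ['c', 'd', 'e', 'f']

result = t.remove('e')   # removes by value; the return value is None
print(result, t)         # t is now ['c', 'd', 'f']

del t[1:3]               # removes indices 1 and 2 (not index 3)
print(t)                 # t is now ['c']
```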
print(max(nums))
74
print(min(nums))
3
print(sum(nums))
154
print(sum(nums)/len(nums))
25.666666666666668
• The sum() function only works when the list elements are
numbers.
• max() and min() also work with lists of strings and other types whose
elements can be compared with one another; len() works with any list.
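A short sketch of these rules follows. The list nums is an assumption: its values were chosen to match the results shown above (maximum 74, minimum 3, sum 154), since the original list is not shown. The list names is likewise made up for illustration.

```python
# Hypothetical data: chosen to match the outputs above, not taken from the source
nums = [3, 41, 12, 9, 74, 15]
names = ['Mark', 'Jack', 'Chan']

print(max(nums), min(nums), sum(nums), len(nums))
print(max(names), min(names))   # strings are compared character by character

try:
    sum(names)                  # sum starts from 0, and 0 + 'Mark' fails
    sum_worked = True
except TypeError:
    sum_worked = False
print('sum works on strings:', sum_worked)
```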
The program to compute an average without a list
total = 0
count = 0
while True:
    inp = input('Enter a number: ')
    if inp == 'done':
        break
    value = float(inp)
    total = total + value
    count = count + 1
average = total / count
print('Average:', average)
# Code: http://www.py4e.com/cod
# The same computation, this time collecting the numbers in a list
numlist = list()
while True:
    inp = input('Enter a number: ')
    if inp == 'done':
        break
    value = float(inp)
    numlist.append(value)
average = sum(numlist) / len(numlist)
print('Average:', average)
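Both programs stop with an error if the user types done straight away (division by zero) or enters something that is not a number. One possible way to guard against this is sketched below. The function names parse_numbers and average are hypothetical, and the entries list stands in for what a user might type so that the example runs without keyboard input.

```python
def parse_numbers(entries):
    """Collect numbers from entries until 'done', skipping invalid input."""
    numlist = list()
    for inp in entries:
        if inp == 'done':
            break
        try:
            numlist.append(float(inp))
        except ValueError:
            # float raises ValueError for text that is not a number
            print('Invalid input, skipped:', inp)
    return numlist


def average(values):
    """Return the mean of values, or None when the list is empty."""
    if len(values) == 0:
        return None   # avoids dividing by zero
    return sum(values) / len(values)


# Simulated user input; anything after 'done' is ignored
entries = ['3', '4.5', 'abc', '6', 'done', '100']
numlist = parse_numbers(entries)
print('Average:', average(numlist))
```

Keeping the numbers in a list means the same data can later be passed to max, min, or any other list function.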