Files&List

Uploaded by

Pratheeth
Copyright © All Rights Reserved

Unit 3

• Files: Persistence, Opening files, Reading files, Searching through a file, Writing files.
Files
Persistence
• Once the power is turned off, anything stored in either the CPU or main memory is erased.

• In this unit, we start to work with secondary memory (files).

• Secondary memory is not erased when the power is turned off.

• In the case of a USB flash drive, the data we write from our programs can even be removed from the system and transported to another system.

• We will primarily focus on reading and writing text files such as those we create in a text editor.
Opening files

• When we want to read or write a file (say on your hard drive), we first must open the file.

• Opening the file communicates with your operating system, which knows where the data for each file is stored.

• When you open a file, you are asking the operating system to find the file by name and make sure the file exists.

• In this example, we open the file mbox.txt, which should be stored in the same folder that you are in when you start Python.

• You can download this file from www.py4e.com/code3/mbox.txt

fhand = open('mbox.txt')
• If the open is successful, the operating system returns us a file
handle.
• The file handle is not the actual data contained in the file, but
instead it is a “handle” that we can use to read the data.
• You are given a handle if the requested file exists and you have the proper permissions to read the file.
• If the file does not exist, open will fail with a
traceback and you will not get a handle to access
the contents of the file:
fhand = open('stuff.txt')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'stuff.txt'

Later we will use try and except to deal more gracefully with the
situation where we attempt to open a file that does not exist.
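As a preview of that idea, here is a minimal sketch that catches the FileNotFoundError shown above instead of letting the program stop with a traceback. The helper name open_or_none and the file name missing.txt are hypothetical, chosen only for illustration.

```python
def open_or_none(fname):
    """Try to open fname; return the file handle, or None if it does not exist."""
    try:
        return open(fname)
    except FileNotFoundError:
        print('File cannot be opened:', fname)
        return None


fhand = open_or_none('missing.txt')  # hypothetical name, assumed not to exist
if fhand is None:
    print('No handle, so nothing to read.')
```

Returning None lets the caller decide what to do next (ask for another name, quit, and so on) rather than crashing.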
Reading files

• While the file handle does not contain the data for
the file, it is quite easy to construct a for loop to read
through and count each of the lines in a file:

fhand = open('mbox-short.txt')
count = 0
for line in fhand:
    count = count + 1
print('Line Count:', count)

• We can use the file handle as the sequence in our for loop.
• Our for loop simply counts the number of lines in the file and prints the count.
• The rough translation of the for loop into English is, “for each line in the file
represented by the file handle, add one to the count variable.”

• The reason that the open function does not read the entire file is that the file
might be quite large with many gigabytes of data.

• The open statement takes the same amount of time regardless of the size of the
file. The for loop actually causes the data to be read from the file.

• When the file is read using a for loop in this manner, Python takes care of
splitting the data in the file into separate lines using the newline character.
• Python reads each line through the newline and includes the newline as the last character in the line variable for each iteration of the for loop.

• Because the for loop reads the data one line at a time, it can efficiently
read and count the lines in very large files without running out of main
memory to store the data.

• The above program can count the lines in any size file using very little memory since each line is read, counted, and then discarded.
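The sketch below is one way to see both points at once: it writes a small three-line file, counts its lines with the same for loop, and checks that each line still carries its trailing newline. The file name sample_lines.txt and the use of the system temp folder (via os and tempfile) are assumptions made so the example does not depend on mbox-short.txt.

```python
import os
import tempfile

# Build a small three-line file so the example does not need mbox-short.txt
# (the file name sample_lines.txt is hypothetical).
path = os.path.join(tempfile.gettempdir(), 'sample_lines.txt')
fout = open(path, 'w')
fout.write('first line\nsecond line\nthird line\n')
fout.close()

fhand = open(path)
count = 0
with_newline = 0
for line in fhand:              # one line per iteration, newline included
    count = count + 1
    if line.endswith('\n'):
        with_newline = with_newline + 1
fhand.close()

print('Line Count:', count)                        # Line Count: 3
print('Lines ending in a newline:', with_newline)  # Lines ending in a newline: 3
```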
If you know the file is relatively small compared to the size of your
main memory, you can read the whole file into one string using
the read method on the file handle.

fhand = open('mbox-short.txt')
inp = fhand.read()
print(len(inp))
94626
print(inp[:20])
• In this example, the entire contents (all 94,626 characters) of the file mbox-short.txt are read directly into the variable inp.
• We use string slicing to print out the first 20 characters of the string data stored in inp.
When the file is read in this manner, all the characters, including all of the lines and newline characters, are one big string in the variable inp.

It is a good idea to store the output of read in a variable because each call to read exhausts the resource:

fhand = open('mbox-short.txt')
print(len(fhand.read()))
94626
print(len(fhand.read()))
0

1. Remember that reading the whole file with read should only be used if the file data will fit comfortably in the main memory of your computer.

2. If the file is too large to fit in main memory, you should write your program to read the file in chunks using a for or while loop.
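The chunked approach can be sketched with a while loop that calls read with a size argument: read(n) returns at most n characters, and an empty string once the end of the file is reached. The chunk size of 10 and the temporary file are arbitrary choices for the example; real programs typically use much larger chunks.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'sample_chunks.txt')  # hypothetical file
fout = open(path, 'w')
fout.write('x' * 95 + '\n')   # 96 characters in total
fout.close()

fhand = open(path)
total = 0
chunks = 0
while True:
    chunk = fhand.read(10)     # read at most 10 characters
    if chunk == '':            # an empty string means end of file
        break
    total = total + len(chunk)
    chunks = chunks + 1
fhand.close()

print('Characters:', total, 'Chunks:', chunks)
```

Only one chunk is held in memory at a time, so the same loop works no matter how large the file is.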
Searching through a file
When you are searching through data in a file, it is a very common pattern to read through a
file, ignoring most of the lines and only processing lines which meet a particular condition.

We can combine the pattern for reading a file with string methods to build simple search
mechanisms.

For example, if we wanted to read a file and only print out lines which started with the prefix “From:”, we could use the string method startswith to select only those lines with the desired prefix:

fhand = open('mbox-short.txt')
for line in fhand:
    if line.startswith('From:'):
        print(line)
When this program runs, we get the following output:

From: [email protected]

From: [email protected]

From: [email protected]

From: [email protected] ...

The output looks great since the only lines we are seeing are those which start
with “From:”, but why are we seeing the extra blank lines?

This is due to that invisible newline character. Each of the lines ends with a newline, so the print statement prints the string in the variable line, which includes a newline, and then print adds another newline, resulting in the double-spacing effect we see.
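To make the invisible newline visible, one can print the repr of a line, which shows the \n escape explicitly. The string below is a placeholder standing in for one line yielded by the for loop; the address is made up, not taken from mbox-short.txt.

```python
line = 'From: someone@example.com\n'   # placeholder line, as a for loop would yield it

print(repr(line))            # 'From: someone@example.com\n'
print(repr(line.rstrip()))   # 'From: someone@example.com'

# print adds its own newline, so printing the raw line double-spaces;
# after rstrip, only the newline added by print remains.
stripped = line.rstrip()
print(stripped)
```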
We could use line slicing to print all but the last character,
but a simpler approach is to use the rstrip method which
strips whitespace from the right side of a string as
follows:

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.startswith('From:'):
        print(line)
When this program runs, we get the following output:

From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected] ...
• As your file processing programs get more complicated, you may want to
structure your search loops using continue.

• The basic idea of the search loop is that you are looking for “interesting”
lines and effectively skipping “uninteresting” lines.
• And then when we find an interesting line, we do something with that line.
• We can structure the loop to follow the pattern of skipping uninteresting
lines as follows:
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    # Skip 'uninteresting lines'
    if not line.startswith('From:'):
        continue
    # Process our 'interesting' line
    print(line)
# Code: http://www.py4e.com/
• The output of the program is the same. In English, the
uninteresting lines are those which do not start with “From:”,
which we skip using continue.

• For the “interesting” lines (i.e., those that start with “From:”), we perform the processing on those lines.

• We can use the find string method to simulate a text editor search
that finds lines where the search string is anywhere in the line.
• Since find looks for an occurrence of a string within another string
and either returns the position of the string or -1 if the string was
not found, we can write the following loop to show lines which
contain the string “@uct.ac.za” (i.e., they come from the University
of Cape Town in South Africa):
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.find('@uct.ac.za') == -1:
        continue
    print(line)

# Code: http://www.py4e.com/code3/search4.p
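The same find-based filter can be wrapped in a function so the search string becomes a parameter instead of being hard-coded. The function name search_lines and the sample lines are hypothetical; the sketch accepts any iterable of lines, so a file handle works as well as a list.

```python
def search_lines(lines, target):
    """Return the stripped lines that contain target anywhere."""
    found = []
    for line in lines:
        line = line.rstrip()
        if line.find(target) == -1:   # -1 means target does not occur in line
            continue
        found.append(line)
    return found


sample = ['From: a@uct.ac.za\n', 'Subject: hello\n', 'Author: b@uct.ac.za\n']
print(search_lines(sample, '@uct.ac.za'))
# ['From: a@uct.ac.za', 'Author: b@uct.ac.za']
```

Calling search_lines(open('mbox-short.txt'), '@uct.ac.za') would reproduce the loop above, returning the lines instead of printing them.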
Writing files
To write a file, you have to open it with mode “w” as a second
parameter:

fout = open('output.txt', 'w')
print(fout)

If the file already exists, opening it in write mode clears out the old data and starts fresh, so be careful! If the file doesn’t exist, a new one is created.
• The write method of the file handle object puts
data into the file, returning the number of
characters written.
• The default write mode is text for writing (and
reading) strings.

line1 = "This here's the wattle,\n"
fout.write(line1)
24

Again, the file object keeps track of where it is, so if you call write again, it adds the new data to the end.
• We must make sure to manage the ends of lines as we write to the file by explicitly inserting the newline character when we want to end a line.

• The print statement automatically appends a newline, but the write method does not add the newline automatically.

line2 = 'the emblem of our land.\n'
fout.write(line2)
24
• When you are done writing, you have to close the file to
make sure that the last bit of data is physically written
to the disk so it will not be lost if the power goes off.
fout.close()

• We could close the files which we open for read as well, but we can be a little sloppy if we are only opening a few files since Python makes sure that all open files are closed when the program ends.

• When we are writing files, we want to explicitly close the files so as to leave nothing to chance.
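Putting the pieces together, a minimal round trip might write the two lines, close the file, and read it back to confirm what reached the disk. The sketch places output.txt in the system temp folder rather than the current directory; that location is an assumption so the example can run anywhere.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'output.txt')

fout = open(path, 'w')             # 'w' clears any existing contents
n1 = fout.write("This here's the wattle,\n")
n2 = fout.write('the emblem of our land.\n')
fout.close()                       # make sure the data is written to disk

fhand = open(path)
text = fhand.read()
fhand.close()

print(n1, n2)          # 24 24
print(text, end='')    # the text already ends in a newline
```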
List
• Lists : Basics, Traversing a list, List operations,
List slicing, List methods, Deleting elements,
Lists and functions.
List
A list is a sequence of values.
• In a string, the values are characters;
• In a list, they can be any type.
• The values in a list are called elements or sometimes items.
• Example: [1, 2, 4, 5] or ['nie', 'sjce', 'vviet']
• Elements of the list need not be the same type:
["spam", 2.05, 5, [10, 20]]  (a nested list)

• A list that contains no elements is called an empty list; you can create one with empty brackets, [].
• cheeses = ['Cheddar', 'Edam', 'Gouda']
numbers = [17, 123]
empty = []
print(cheeses, numbers, empty)

output: ['Cheddar', 'Edam', 'Gouda'] [17, 123] []


• Each element in the list has a position in the list known as an index. The list index starts from zero. It’s like having seat numbers starting from 0!

Element:  78808  26302  93634  13503  48306
Index:        0      1      2      3      4
Lists are mutable
• The syntax for accessing the elements of a list is the same as
for accessing the characters of a string: the bracket operator.
The expression inside the brackets specifies the index.
Remember that the indices start at 0:
• a=[1,2,3,4,5]
• print(a[0]) output : 1

Lists are mutable because you can change the order of items in a list or reassign an item in a list. When the bracket operator appears on the left side of an assignment, it identifies the element of the list that will be assigned.
numbers = [17, 123]
numbers[1] = 5
print(numbers)
[17, 5]

You can think of a list as a relationship between indices and elements. This relationship is called a mapping.

List indices work the same way as string indices:
• Any integer expression can be used as an index.
• If you try to read or write an element that does not exist, you
get an IndexError.
• If an index has a negative value, it counts backward from the
end of the list
• The in operator also works on lists.
college = ['nie', 'sjce', 'pes']
'sjce' in college
True
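A short sketch of these rules, reusing the college list from above: a negative index counts back from the end, and an out-of-range index raises IndexError, which can be caught with try and except. The name 'rvce' is just an example of a value that is not in the list.

```python
college = ['nie', 'sjce', 'pes']

print(college[-1])                 # pes (last element)
print(college[len(college) - 1])   # pes, the same element via a positive index

try:
    print(college[3])              # valid indices are 0, 1, 2 (or -1, -2, -3)
except IndexError as err:
    print('IndexError:', err)      # IndexError: list index out of range

print('pes' in college, 'rvce' in college)   # True False
```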
• Creating an empty list: sample_list=[]
• Creating a list with known size and known elements; a list can store both homogeneous and heterogeneous elements:
sample_list1=["Mark",5,"Jack",9,"Chan",5]
sample_list2=["Mark","Jack","Chan"]
• Creating a list with known size and unknown elements: sample_list=[None]*5 (None denotes an unknown value in Python)
• Length of the list: len(sample_list) displays the number of elements in the list
Traversing a list
• The most common way to traverse the elements of a list is with a for loop. The syntax is the same as for strings:
for x in college:
    print(x)
for i in range(len(numbers)):
    numbers[i] = numbers[i] * 2
This loop traverses the list and updates each element.
• len() returns the number of elements in the list.
• range returns a sequence of indices from 0 to n − 1, where n is the length of the list.
• Each time through the loop, i gets the index of the next element.
• A for loop over an empty list never executes the body:
for x in empty:
    print('This never happens.')

Although a list can contain another list, the nested list still counts as a single element.
The length of this list is four:
['spam', 1, ['Brie', 'Roquefort', 'Polle Veq'], [1, 2, 3] ]
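The sketch below checks the claim that a nested list counts as one element, and runs the range(len(...)) update loop from above on a small numbers list.

```python
mixed = ['spam', 1, ['Brie', 'Roquefort', 'Polle Veq'], [1, 2, 3]]
print(len(mixed))        # 4: each inner list counts as a single element
print(len(mixed[2]))     # 3: the inner list has its own length

numbers = [17, 5]
for i in range(len(numbers)):
    numbers[i] = numbers[i] * 2   # update the element in place via its index
print(numbers)           # [34, 10]
```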
list_of_airlines=["AI","EM","BA"]

print("Iterating the list using range()")
for index in range(0,len(list_of_airlines)):
    print(list_of_airlines[index])

print("Iterating the list using keyword in")
for airline in list_of_airlines:
    print(airline)
#list can store homogeneous data
list_of_airlines=["AI","EM","BA"]
print("List of airlines:",list_of_airlines)

#list can store heterogeneous data
sample_list=["Mark",5,"Jack",9, "Chan",5]
print("Sample List:",sample_list)

#Length of the list
print("Number of elements in the list:",len(sample_list))

#Random read
print("Element at 2nd index position:", sample_list[2])

#Random write
sample_list[2]="James"

#Random read
print("Element at 2nd index position after random write:",sample_list[2])

#Adding an element to list
sample_list.append("James")
print("After adding element to list:",sample_list)

#Combining two lists
new_list=["Henry","Tim"]
sample_list+=new_list    #Adds Henry and Tim to the existing sample_list (in place)
print("After combining two lists - 1st way:",sample_list)

#Another way to combine two lists
sample_list=sample_list+new_list    #Builds a new list that adds Henry and Tim again
print("After combining two lists - 2nd way:",sample_list)

#Accessing an element beyond the total number of elements in the list
#sample_list now has 11 elements (indices 0 to 10), so this raises IndexError
print(sample_list[11])
List operations
The + operator concatenates lists:
a = [1, 2, 3]
b = [4, 5, 6]
c=a+b

print(c)
[1, 2, 3, 4, 5, 6]
The * operator repeats a list a given number of times:
[0] * 4
[0, 0, 0, 0]
-------------------------

[1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
List slices
The slice operator also works on lists:
t = ['a', 'b', 'c', 'd', 'e', 'f']
t[1:3]
['b', 'c']
----------------
t[:4]
['a', 'b', 'c', 'd']
--------------------
t[3:]
['d', 'e', 'f']
-----------------------------------------
• If you omit the first index, the slice starts at the beginning.
• If you omit the second, the slice goes to the end.
• So if you omit both, the slice is a copy of the whole list.
t[:]
['a', 'b', 'c', 'd', 'e', 'f']
-------------------
note:
Since lists are mutable, it is often useful to make a copy before
performing operations that fold, spindle, or mutilate lists.
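A minimal sketch of why the copy matters: assigning a list to a second name only creates another name (an alias) for the same list object, whereas t[:] builds a new list, so changing the copy leaves the original alone. The names alias and backup are illustrative.

```python
t = ['a', 'b', 'c', 'd', 'e', 'f']

alias = t        # both names refer to the same list object
backup = t[:]    # a slice of the whole list is a new list

alias[0] = 'z'   # visible through t as well
backup[1] = 'y'  # does not affect t

print(t)         # ['z', 'b', 'c', 'd', 'e', 'f']
print(backup)    # ['a', 'y', 'c', 'd', 'e', 'f']
print(alias is t, backup is t)   # True False
```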

A slice operator on the left side of an assignment can update multiple elements:

t = ['a', 'b', 'c', 'd', 'e', 'f']
t[1:3] = ['x', 'y']   # overwrites b, c with x, y
print(t)

Output: ['a', 'x', 'y', 'd', 'e', 'f']


List methods
Python provides methods that operate on
lists. For example, append adds a new
element to the end of a list:
t = ['a', 'b', 'c']
t.append('d')
print(t)

Output: ['a', 'b', 'c', 'd']


extend takes a list as an argument and appends
all of the elements:
t1 = ['a', 'b', 'c']
t2 = ['d', 'e']
t1.extend(t2)
print(t1)

Output: ['a', 'b', 'c', 'd', 'e']


sort arranges the elements of the list from low
to high:
t = ['d', 'c', 'e', 'b', 'a']
t.sort()
print(t)

output : ['a', 'b', 'c', 'd', 'e']


• Most list methods are void; they modify the
list and return None.
• If you accidentally write t = t.sort(), you will be
disappointed with the result.
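The pitfall is easy to demonstrate: sort changes the list in place and returns None, so assigning its result back to t replaces the list with None.

```python
t1 = ['d', 'c', 'e', 'b', 'a']
result = t1.sort()    # sorts t1 in place
print(result)         # None
print(t1)             # ['a', 'b', 'c', 'd', 'e']

t2 = ['d', 'c', 'e', 'b', 'a']
t2 = t2.sort()        # the mistake: t2 now refers to None, the list is lost
print(t2)             # None
```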
Deleting elements

There are several ways to delete elements from a list. If you know the index of the element you want, you can use pop:
t = ['a', 'b', 'c']
x = t.pop(1)
print(t)

# Output: ['a', 'c']


print(x)

#Output: b
• pop modifies the list and returns the element that was removed.

• If you don’t provide an index, it deletes and returns the last element.

If you don’t need the removed value, you can use the del operator:
t = ['a', 'b', 'c']
del t[1]
print(t)
['a', 'c']
If you know the element you want to remove
(but not the index), you can use remove:
t = ['a', 'b', 'c']
t.remove('b')
print(t)
Output: ['a', 'c']
The return value from remove is None.
------------------------------------
To remove more than one element, you can
use del with a slice index:
t = ['a', 'b', 'c', 'd', 'e', 'f']
del t[1:5]
print(t)

output: ['a', 'f']

As usual, the slice selects all the elements up to, but not including, the second index.
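The sketch below runs the deletion techniques side by side on fresh copies of one list. It also shows one behaviour not mentioned above: remove raises ValueError when the value is not in the list (just as pop raises IndexError for a bad index).

```python
base = ['a', 'b', 'c', 'd', 'e', 'f']

t = base[:]
x = t.pop(1)        # remove by index and return the element
print(t, x)         # ['a', 'c', 'd', 'e', 'f'] b

t = base[:]
last = t.pop()      # no index: remove and return the last element
print(t, last)      # ['a', 'b', 'c', 'd', 'e'] f

t = base[:]
del t[1:5]          # remove a slice: indices 1, 2, 3 and 4
print(t)            # ['a', 'f']

t = base[:]
t.remove('b')       # remove by value; returns None
try:
    t.remove('z')   # 'z' is not in the list
except ValueError:
    print('z not found')
print(t)            # ['a', 'c', 'd', 'e', 'f']
```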
Lists and functions
There are a number of built-in functions that can be used on lists that allow you to quickly look
through a list without writing your own loops:
nums = [3, 41, 12, 9, 74, 15]
print(len(nums)) #output : 6

print(max(nums))
74
print(min(nums))
3
print(sum(nums))
154
print(sum(nums)/len(nums))
25.666666666666668
• The sum() function only works when the list elements are
numbers.
• The other functions (max(), len(), etc.) work with lists of strings and other types that are comparable.
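For instance, the sketch below applies len, max and min to a made-up list of strings, where comparison is alphabetical (by character code), and shows that sum raises TypeError when the elements are strings.

```python
words = ['pear', 'apple', 'fig']

print(len(words))   # 3
print(max(words))   # pear  (alphabetically last)
print(min(words))   # apple (alphabetically first)

try:
    print(sum(words))
except TypeError:
    print('sum() needs numbers')
```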
The program to compute an average
without a list
total = 0
count = 0
while True:
    inp = input('Enter a number: ')
    if inp == 'done':
        break
    value = float(inp)
    total = total + value
    count = count + 1
average = total / count
print('Average:', average)

# Code: http://www.py4e.com/cod
The program to compute an average with a list
numlist = list()
while True:
    inp = input('Enter a number: ')
    if inp == 'done':
        break
    value = float(inp)
    numlist.append(value)
average = sum(numlist) / len(numlist)
print('Average:', average)
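One refinement worth considering: both versions divide by the number of entries, which raises ZeroDivisionError if the user types done straight away. The hypothetical helper average_of below takes the entered strings as a list instead of calling input, so the logic can be tried without typing, and returns None when no numbers were entered.

```python
def average_of(entries):
    """Average the numeric strings in entries, stopping at 'done'.

    Returns None if no numbers were entered (avoids dividing by zero).
    """
    numlist = list()
    for inp in entries:
        if inp == 'done':
            break
        numlist.append(float(inp))
    if len(numlist) == 0:
        return None
    return sum(numlist) / len(numlist)


print(average_of(['3', '41', '12', '9', '74', '15', 'done']))
print(average_of(['done']))   # None
```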
