Chapter 11: Manipulating Text with Methods and Files
def phones():
    phones = phonebook()
    phonelist = phones.split('\n')
    newphonelist = []
    for list in phonelist:
        newphonelist = newphonelist + [list.split(":")]
    return newphonelist

def findPhone(person):
    for people in phones():
        if people[0] == person:
            print "Phone number for",person,"is",people[1]
Running the Phonebook
>>> print phonebook()
Mary:893-0234:Realtor:
Fred:897-2033:Boulder crusher:
Barney:234-2342:Professional bowler:
>>> print phones()
[[''], ['Mary', '893-0234', 'Realtor', ''], ['Fred', '897-2033', 'Boulder crusher', ''], ['Barney', '234-2342', 'Professional bowler', '']]
>>> findPhone('Fred')
Phone number for Fred is 897-2033
Strings have no font
• Strings are only the characters of text displayed
• “WYSIWYG” (What You See Is What You Get) text includes fonts and styles
• The font is the characteristic look of the letters in all
sizes
• The style is typically the boldface, italics, underline,
and other effects applied to the font
• In printer's terms, each style is its own font
Encoding font information
• Font and style information is often encoded as style
runs
• A separate representation from the string
• Indicates bold, italics, or whatever style modification;
start character; and end character.
The old brown fox runs.
• Could be encoded as:
"The old brown fox runs."
[[bold 0 6] [italics 5 12]]
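The style-run idea above can be sketched in Python. This is only an illustration: the tuple layout and the function names styles_at and styled_spans are assumptions, not part of any real text system. The start and end positions are treated as inclusive, which matches the example (characters 0 through 6 are "The old"). The code uses print() with one argument so it also runs outside JES.

```python
# Hypothetical representation: each run is (style, start, end), with
# inclusive character positions, mirroring [[bold 0 6] [italics 5 12]].
text = "The old brown fox runs."
style_runs = [("bold", 0, 6), ("italics", 5, 12)]

def styles_at(runs, index):
    """Return the styles that apply to the character at index."""
    return [style for (style, start, end) in runs if start <= index <= end]

def styled_spans(text, runs):
    """Return (style, substring) pairs, one per run."""
    return [(style, text[start:end + 1]) for (style, start, end) in runs]

# The string itself carries no font; the runs are stored separately.
for style, span in styled_spans(text, style_runs):
    print(style + ": " + repr(span))
```

Note that the two runs overlap: characters 5 and 6 are both bold and italic, which a single string value could not express.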
How do we encode all that?
• Is it a single value? Not really.
• Do we encode it all in a complex list? We could.
• How do most text systems handle this?
  • As objects
• Objects have data, maybe in many parts.
• Objects know how to act upon their data.
• Objects' methods may be known only to that object, or
may be known by many objects, but each object
performs that method differently.
What can we do with all this?
• Answer: Just about anything!
• Strings and lists are about as powerful as one gets in
Python
  • By “powerful,” we mean that we can do a lot of different kinds of
  computation with them.
• Examples:
  • Pull up a Web page and grab information out of it, from within a function.
  • Find a nucleotide sequence in a string and print its name.
  • Manipulate functions' source
Why do I care about all this?
• If you're going to process files, you need to know where
they are (directories) and how to specify them (paths).
• If you're going to do movie processing, which involves
lots of files, you need to be able to write programs that
process all the files in a directory (or even several
directories) without having to write out the name of
each and every file.
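One way to process every file in a directory without typing the names is os.listdir from Python's standard os module. The sketch below is an illustration only: the function name list_files and the ".jpg" filter are assumptions, and it is written with print() so it runs in ordinary Python as well as JES.

```python
import os

def list_files(directory, extension=""):
    """Return the full paths of the files in directory whose names end
    with extension (an empty extension matches every file)."""
    paths = []
    for name in sorted(os.listdir(directory)):
        # Build the path from the directory and the bare file name.
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.endswith(extension):
            paths.append(path)
    return paths

# Example: print every .jpg file in the current directory.
for path in list_files(".", ".jpg"):
    print(path)
```

os.path.join puts the right separator between the directory and the file name, so the same code works with Windows and Unix style paths.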
Using lists to represent trees
>>> tree = [["Leaf1","Leaf2"],[["Leaf3"],["Leaf4"],"Leaf5"]]
>>> print tree
[['Leaf1', 'Leaf2'], [['Leaf3'], ['Leaf4'], 'Leaf5']]
>>> print tree[0]
['Leaf1', 'Leaf2']
>>> print tree[1]
[['Leaf3'], ['Leaf4'], 'Leaf5']
>>> print tree[1][0]
['Leaf3']
>>> print tree[1][1]
['Leaf4']
>>> print tree[1][2]
Leaf5
[Figure: the tree drawn as a diagram, with leaves Leaf1 through Leaf5]
The Point: Lists allow us to
represent complex
relationships, like trees
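Because a branch is a list and a leaf is a string, a short recursive function can walk the whole tree. This is a sketch under that assumption; the function name leaves is made up for the example.

```python
tree = [["Leaf1", "Leaf2"], [["Leaf3"], ["Leaf4"], "Leaf5"]]

def leaves(node):
    """Return all the leaves under node, left to right."""
    if isinstance(node, list):
        # A list is a branch: gather the leaves of each child.
        result = []
        for child in node:
            result = result + leaves(child)
        return result
    # Anything that is not a list is a leaf.
    return [node]

print(leaves(tree))
# ['Leaf1', 'Leaf2', 'Leaf3', 'Leaf4', 'Leaf5']
```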
How to open a file
• For reading or writing a file (getting characters out or
putting characters in), you need to use open
• open(filename, how) opens the filename.
  • If you don't provide a full path, the filename is assumed to be in the
  same directory as JES.
• how is a two-character string that says what you want
to do with the file.
  • “rt” means “read text”
  • “wt” means “write text”
  • “rb” and “wb” mean read or write bytes
    • We won't do much of that
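A small sketch of open with “wt” and then “rt”. The file name open_example.txt and the choice of the system's temporary directory are assumptions made so the example can run anywhere; the phone entries are just sample text.

```python
import os
import tempfile

# Hypothetical file name, placed in the system's temporary directory.
filename = os.path.join(tempfile.gettempdir(), "open_example.txt")

# "wt": open for writing text (creates the file, or replaces its contents).
outfile = open(filename, "wt")
outfile.write("Mary:893-0234\nFred:897-2033\n")
outfile.close()

# "rt": open for reading text, then read the whole file as one string.
infile = open(filename, "rt")
contents = infile.read()
infile.close()

print(contents)
```

open returns a file object, and write, read, and close are methods on that object.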
Methods on files: Open returns a file object
gcttagatgtcagattgagcacgatgatcgattgaccgtgagatcgacga
gatgcgcagatcgagatctgcatacagatgatgaccatagtgtacg
ttctcgctcacactagaagcaagacaatttacactattattattattatt
accattattattattattattactattattattattattactattattta
ctacgtcgctttttcactccctttattctcaaattgtgtatccttccttt
How are we going to do it?
• First, we get the sequences in a big string.
• Next, we find where the small subsequence is in the big
string.
• From there, we need to work backwards until we find
“>” which is the beginning of the line with the sequence
name.
• From there, we need to work forwards to the end of the
line. From “>” to the end of the line is the name of the
sequence
  • Yes, this is hard to get just right. Lots of debugging prints.
The code that does it
def findSequence(seq):
    sequencesFile = getMediaPath("parasites.txt")
    file = open(sequencesFile,"rt")
    sequences = file.read()
    file.close()
    # Find the sequence
    seqloc = sequences.find(seq)
    #print "Found at:", seqloc
    if seqloc <> -1:
        # Now, find the ">" with the name of the sequence
        nameloc = sequences.rfind(">",0,seqloc)
        #print "Name at:",nameloc
        endline = sequences.find("\n",nameloc)
        print "Found in ",sequences[nameloc:endline]
    if seqloc == -1:
        print "Not found"
Why -1?
• If .find or .rfind don't find something, they return -1
  • If they return 0 or more, then it's the index of where the
  search string is found.
• What's “<>”?
  • That's notation for “not equals”
  • You can also use “!=”
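The same find and rfind steps can be tried on a tiny made-up string instead of the real parasites.txt. The sequence names and bases below are invented for the example, and find_name is a hypothetical helper that returns the name instead of printing it. It uses “!=”, which works in JES and in newer versions of Python.

```python
# Made-up data in the same layout: a ">" name line, then the sequence.
sequences = ">Seq one\nacgtacgt\n>Seq two\nttgacca\n"

def find_name(sequences, seq):
    """Return the name line of the sequence containing seq, or None."""
    seqloc = sequences.find(seq)
    if seqloc != -1:
        # Work backwards from the match to the ">" that starts the name.
        nameloc = sequences.rfind(">", 0, seqloc)
        # Work forwards from there to the end of that line.
        endline = sequences.find("\n", nameloc)
        return sequences[nameloc:endline]
    return None

print(sequences.find("ttga"))        # 27, the index where the match starts
print(sequences.find("zzz"))         # -1, because it is not found
print(find_name(sequences, "ttga"))  # >Seq two
```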
Running the program
>>> findSequence("tagatgtcagattgagcacgatgatcgattgacc")
Found in >Schisto unique AA825099
>>> findSequence("agtcactgtctggttgaaagtgaatgcttccaccgatt")
Found in >Schisto unique mancons0736
Example: Get the temperature
• The weather is always
available on the Internet.