
Slides8 Strings Nup


CS303E: Elements of Computers and Programming
More on Strings

Dr. Bill Young
Department of Computer Science
University of Texas at Austin
© William D. Young, All rights reserved.

Last updated: August 27, 2024 at 14:25

The str Class

One of the most useful Python data types is the string type, defined by the str class. Strings are actually sequences of characters.

Strings are immutable, meaning you can’t change them after they are created.

CS303E Slideset 8: 1 More on Strings CS303E Slideset 8: 2 More on Strings

Object Creation/Instantiation

Creating Strings

Strings have some associated special syntax:

>>> s1 = str("Hello")    # using the constructor function
>>> s2 = "Hello"         # alternative syntax
>>> id(s1)               # strings are unique
139864255464424
>>> id(s2)
139864255464424
>>> s3 = str("Hello")
>>> id(s3)
139864255464424
>>> s1 is s2             # are these the same object?
True
>>> s2 is s3
True

Python often stores immutable objects with the same content, such as the equal string literals above, as one object. This is an implementation detail rather than a guarantee.



Sequence Operations

Strings are sequences of characters. Below are some functions defined on sequence types, though not all are supported on strings (e.g., sum).

Function          Description
x in s            x is in sequence s
x not in s        x is not in sequence s
s1 + s2           concatenates two sequences
s * n             repeat sequence s n times
s[i]              ith element of sequence (0-based)
s[i:j]            slice of sequence s from i to j-1
len(s)            number of elements in s
min(s)            minimum element of s
max(s)            maximum element of s
sum(s)            sum of elements in s
for loop          traverse elements of sequence
<, <=, >, >=      compares two sequences
==, !=            compares two sequences

Functions on Strings

Some functions that are available on strings:

Function    Description
len(s)      return length of the string
min(s)      return char in string with lowest ASCII value
max(s)      return char in string with highest ASCII value

>>> s1 = "Hello, World!"
>>> len(s1)
13
>>> min(s1)
' '
>>> min("Hello")
'H'
>>> max(s1)
'r'

Why does it make sense for a blank to have lower ASCII value than any letter?
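One way to explore the question above is to look at the character codes directly. The sketch below assumes plain ASCII text; it prints a few codes with ord, confirms what min and max return, and shows that sum is a sequence function strings do not support.

```python
s = "Hello, World!"

# Character codes: the blank sorts before digits and all letters.
codes = [ord(" "), ord("0"), ord("A"), ord("a")]
print(codes)                        # [32, 48, 65, 97]

# min and max compare characters by these codes.
print(repr(min(s)), repr(max(s)))   # ' ' 'r'

# sum() starts from the integer 0, so adding a character to it fails.
try:
    sum(s)
    sum_works = True
except TypeError:
    sum_works = False
print(sum_works)                    # False
```

Because the blank has the smallest code, "Paul Smith" sorts before "Paula Smith", which matches how names are usually alphabetized.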


Indexing into Strings

Strings are sequences of characters, which can be accessed via an index.

Indexes are 0-based, ranging from [0 ... len(s)-1].

You can also index using negatives, s[-i] means s[len(s)-i].

Indexing into Strings

>>> s = "Hello, World!"
>>> s[0]
'H'
>>> s[6]
' '
>>> s[-1]
'!'
>>> s[-6]
'W'
>>> s[-6 + len(s)]
'W'
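The rule that s[-i] means s[len(s)-i] can be checked for every valid negative index. This is a small illustrative sketch, not part of the original slides; it also shows what happens when an index falls outside the valid range.

```python
s = "Hello, World!"

# For every valid negative index -i, s[-i] is the same character
# as s[len(s) - i].
for i in range(1, len(s) + 1):
    assert s[-i] == s[len(s) - i]

print(s[-6], s[len(s) - 6])   # W W

# Indexes outside [-len(s), len(s) - 1] raise IndexError.
try:
    s[len(s)]
    in_range = True
except IndexError:
    in_range = False
print(in_range)               # False
```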



Slicing

Slicing means to select a contiguous subsequence of a sequence or string.

General Form:
    String[start : end]

>>> s = "Hello, World!"
>>> s[1 : 4]      # substring from s[1]...s[3]
'ell'
>>> s[ : 4]       # substring from s[0]...s[3]
'Hell'
>>> s[1 : -3]     # substring from s[1]...s[-4]
'ello, Wor'
>>> s[1 : ]       # same as s[1 : len(s)]
'ello, World!'
>>> s[ : 5]       # same as s[0 : 5]
'Hello'
>>> s[:]          # same as s
'Hello, World!'
>>> s[3 : 1]      # empty slice
''

Concatenation and Repetition

General Forms:
    s1 + s2
    s * n
    n * s

s1 + s2 means to create a new string of s1 followed by s2.
s * n or n * s means to create a new string containing n repetitions of s.

>>> s1 = "Hello"
>>> s2 = ", World!"
>>> s1 + s2       # + is not commutative
'Hello, World!'
>>> s1 * 3        # * is commutative
'HelloHelloHello'
>>> 3 * s1
'HelloHelloHello'

Notice that concatenation and repetition overload two familiar operators.
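Slicing and concatenation fit together neatly: cutting a string at any index and gluing the two slices back together gives the original string. The short sketch below checks that property and the commutativity remarks from the examples above.

```python
s = "Hello, World!"

# Splitting at any index and concatenating the halves rebuilds the string.
for i in range(len(s) + 1):
    assert s[:i] + s[i:] == s

# Concatenation order matters; repetition order does not.
s1, s2 = "Hello", ", World!"
print(s1 + s2 == s2 + s1)   # False
print(s1 * 3 == 3 * s1)     # True
print(len(s1 * 3))          # 15
```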

Looking Back

In Slideset 5, we had code to compute and print a multiplication table up to LIMIT - 1,

> python MultiplicationTable.py
           Multiplication Table
     |   1   2   3   4   5   6   7   8   9
------------------------------------------
   1 |   1   2   3   4   5   6   7   8   9
   2 |   2   4   6   8  10  12  14  16  18
....
   9 |   9  18  27  36  45  54  63  72  81

which included:

print("------------------------------------------")

That works well for LIMIT = 10, but not otherwise. How could you fix it?

print("------" + "----" * (LIMIT - 1))

in and not in operators

The in and not in operators allow checking whether one string is a contiguous substring of another.

General Forms:
    s1 in s2
    s1 not in s2

>>> s1 = "xyz"
>>> s2 = "abcxyzrls"
>>> s3 = "axbyczd"
>>> s1 in s2
True
>>> s1 in s3
False
>>> s1 not in s2
False
>>> s1 not in s3
True



Aside: Equality of Objects

There are two senses in which objects can be equal.

1 They can have equal contents; test with ==.
2 They can be literally the same object (same data in memory); test with is.

For elementary immutable object classes such as strings and numbers, these often coincide, because Python may store equal values as one object, though the language does not guarantee it. That’s not necessarily true for complex objects like lists or tuples.

For user-defined classes, (o1 == o2) is False unless (o1 is o2) or you’ve overloaded == by defining __eq__ for the class.

Equality of Objects

>>> s1 = "xyzabc"
>>> s2 = "xyz" + "abc"
>>> s3 = str("xy" + "za" + "bc")
>>> s1 is s2               # s1, s2, s3 are all
True                       # the same object in
>>> s2 == s3               # memory
True
>>> s1 == s2
True
>>> from Circle import *
>>> c1 = Circle()          # circle with radius 1
>>> c2 = Circle()          # circle with radius 1
>>> c1 == c2               # they're different
False
>>> c3 = c2                # c3 is new pointer to c2
>>> c2 == c3               # they're the same object
True


Equality of Objects

If two objects satisfy (x is y), then they satisfy (x == y), but not always vice versa.

>>> from Circle import *
>>> c1 = Circle()
>>> c2 = Circle()
>>> c3 = c2
>>> c1 is c2
False
>>> c3 is c2
True
>>> c1 == c2
False
>>> c2 == c3
True

If you define a class, you can override == and make any equality comparison you like.

Comparing Strings

In addition to equality comparisons, you can order strings using the relational operators: <, <=, >, >=.

For strings, this is lexicographic (or alphabetical) ordering using the ASCII character codes.

>>> "abc" < "abcd"
True
>>> "abcd" <= "abc"
False
>>> "Paul Jones" < "Paul Smith"
True
>>> "Paul Smith" < "Paul Smithson"
True
>>> "Paula Smith" < "Paul Smith"
False
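As an illustration of overriding ==, here is a minimal sketch. The slides import a Circle class but never show its definition, so the class below is a hypothetical stand-in: the radius attribute, its default of 1, and the choice to compare circles by radius are all assumptions made for this example.

```python
class Circle:
    """Hypothetical stand-in for the Circle class imported in the slides."""

    def __init__(self, radius=1):
        self.radius = radius

    def __eq__(self, other):
        # Two circles are equal when they have the same radius.
        if not isinstance(other, Circle):
            return NotImplemented
        return self.radius == other.radius


c1 = Circle()
c2 = Circle()
c3 = c2

print(c1 is c2)   # False: two separate objects
print(c1 == c2)   # True: equal contents, thanks to __eq__
print(c3 is c2)   # True: c3 is another name for c2
```

With __eq__ defined this way, c1 == c2 becomes True even though c1 is c2 stays False. One side effect worth knowing: a class that defines __eq__ without also defining __hash__ produces objects that cannot be used as dictionary keys or set members.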



Iterating Over a String

Sometimes it is useful to do something to each character in a string, e.g., change the case (lower to upper and upper to lower).

DIFF = ord('a') - ord('A')

def swapCase(s):
    result = ""
    for ch in s:
        if ('A' <= ch <= 'Z'):
            result += chr(ord(ch) + DIFF)
        elif ('a' <= ch <= 'z'):
            result += chr(ord(ch) - DIFF)
        else:
            result += ch
    return result

print(swapCase("abCDefGH"))

> python StringIterate.py
ABcdEFgh

Iterating Over a String

General Form:
    for c in s:
        body

You can also iterate using the indexes:

def swapCase2(s):
    result = ""
    for i in range(len(s)):
        ch = s[i]
        if ('A' <= ch <= 'Z'):
            result += chr(ord(ch) + DIFF)
        elif ('a' <= ch <= 'z'):
            result += chr(ord(ch) - DIFF)
        else:
            result += ch
    return result


What You Can’t Do

def swapCaseWrong(s):
    for i in range(len(s)):
        if ('A' <= s[i] <= 'Z'):
            s[i] = chr(ord(s[i]) + DIFF)
        elif ('a' <= s[i] <= 'z'):
            s[i] = chr(ord(s[i]) - DIFF)
    return s

print(swapCaseWrong("abCDefGH"))

> python StringIterate.py
Traceback (most recent call last):
  File "StringIterate.py", line 38, in <module>
    print(swapCaseWrong("abCDefGH"))
  File "StringIterate.py", line 35, in swapCaseWrong
    s[i] = chr(ord(s[i]) - DIFF)
TypeError: 'str' object does not support item assignment

What went wrong?

Strings are Immutable

You can’t change a string by assigning at an index. You have to create a new string.

>>> s = "Pat"
>>> s[0] = 'R'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>> s2 = 'R' + s[1:]
>>> s2
'Rat'

Whenever you concatenate two strings or append something to a string, you create a new value. Don’t forget to save it!



Let’s Take a Break Useful Testing Methods

Below are some useful methods.

Function Description
s.isalnum(): nonempty alphanumeric string?
s.isalpha(): nonempty alphabetic string?
s.isdigit(): nonempty and contains only digits?
s.isidentifier(): follows rules for Python identifier?
s.islower(): has at least one cased letter, and every cased letter is lowercase?
s.isupper(): has at least one cased letter, and every cased letter is uppercase?
s.isspace(): nonempty and contains only whitespace?

Notice that these are methods of class str, not functions, so must
be called on a string s.
>>> islower("xyz")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'islower' is not defined


Useful Testing Methods

>>> s1 = "abc123"
>>> s1.isalnum()
True
>>> s1.isalpha()
False
>>> "abcd".isalpha()
True
>>> "1234".isdigit()
True
>>> "abcd".islower()
True
>>> "abCD".isupper()
False
>>> "".islower()
False
>>> "".isdigit()
False
>>> "\t\n\r".isspace()      # contains tab, newline, return
True
>>> "\t\n xyz".isspace()    # contains non-whitespace
False

Example: Recognizer for Integers

Suppose you want to know if your string input represents a decimal integer, which may be signed. You might write the following:

def isInt(s):
    return s.isdigit() \
        or ((s[0] == '-' or s[0] == '+') \
            and s[1:].isdigit())

Notice that this allows some peculiar inputs like +000000, but then so does Python.
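One edge case in the integer recognizer is worth a closer look: for the empty string, s.isdigit() is False, so the expression goes on to evaluate s[0], which raises IndexError. The variant below is a sketch of one way around that, not the version from the slides; the name is_int is made up for this example, and it relies on the fact that the slice s[:1] is simply empty when s is empty.

```python
def is_int(s):
    """Return True if s looks like an optionally signed decimal integer."""
    # s[:1] is "" for an empty string, so this never raises IndexError.
    if s[:1] in ("-", "+"):
        s = s[1:]
    return s.isdigit()


for text in ["57", "-12", "+000000", "3.4", "+", ""]:
    print(repr(text), is_int(text))
```

Like the original, this accepts +000000. It also inherits a quirk of isdigit, which returns True for a few non-ASCII digit characters (such as superscript digits) that int rejects.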



Better Error Checking

When your program accepts input from the user, it’s always a good idea to “validate” the input.

Earlier in the semester, we wrote:

# See if an integer entered is prime.
num = int(input("Enter an integer: "))
< code to test if num is prime >

What’s ‘wrong’ with this code?

If the string entered does not represent an integer, int might fail.

>>> num = int(input("Enter an integer: "))
Enter an integer: 3.4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '3.4'


Better Error Checking

This is better:

# See if an integer entered is prime.
while (True):
    # recall that input returns a string
    stringInput = input("Enter a positive integer: ")
    if (stringInput.isdigit()):
        break
    else:
        print("Invalid input: not a positive integer.", \
              "Try again!")
# At this point, do we know that stringInput represents
# a positive integer? Any positive integer?
num = int(stringInput)
< code to test if num is prime >

This still isn’t quite right. Can you see what’s wrong?

It doesn’t allow +3, but does allow 0. How would you fix it?
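One possible answer to the question above is sketched here as a separate test function, so the logic can be tried without typing input. The name is_positive_int is hypothetical, and stripping surrounding whitespace is an extra choice made for this sketch. It accepts an optional leading +, requires ASCII digits only, and rejects zero.

```python
def is_positive_int(text):
    """Return True if text represents a decimal integer greater than zero."""
    s = text.strip()
    # Allow a single leading plus sign, as in "+3".
    if s[:1] == "+":
        s = s[1:]
    # Require at least one character, and only the ASCII digits 0-9.
    if s == "" or not all(ch in "0123456789" for ch in s):
        return False
    # Digits alone are not enough: "0" and "000" are not positive.
    return int(s) > 0


for text in ["57", "+3", "0", "000", "-12", "abcd", ""]:
    print(repr(text), is_positive_int(text))
```

In the validation loop, the condition stringInput.isdigit() could then be replaced by is_positive_int(stringInput); int accepts the same strings, including the leading + and surrounding whitespace.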



Testing Our Code

> python IsPrime4.py
Enter a positive integer: -12
Invalid input: not a positive integer. Try again!
Enter a positive integer: abcd
Invalid input: not a positive integer. Try again!
Enter a positive integer: 57
57 is not prime

Substring Search

We already saw that in and not in work on strings.

Python provides some other string methods to see if a string contains another as a substring:

Function             Description
s.endswith(s1):      does s end with substring s1?
s.startswith(s1):    does s start with substring s1?
s.find(s1):          lowest index where s1 starts in s, -1 if not found
s.rfind(s1):         highest index where s1 starts in s, -1 if not found
s.count(s1):         number of non-overlapping occurrences of s1 in s


Substring Search

>>> s = "Hello, World!"
>>> s.endswith("d!")
True
>>> s.startswith("hello")          # case matters
False
>>> s.startswith("Hello")
True
>>> s.find('l')                    # search from left
2
>>> s.rfind('l')                   # search from right
10
>>> s.count('l')
3
>>> "ababababa".count('aba')       # nonoverlapping occurrences
2

Converting Strings

Below are some additional methods on strings. Remember that strings are immutable, so these all make a new copy of the string. They don’t change s.

Function                 Description
s.capitalize():          return a copy with first character capitalized
s.lower():               lowercase all letters
s.upper():               uppercase all letters
s.title():               capitalize all words
s.swapcase():            lowercase letters to upper, and vice versa
s.replace(old, new):     replace occurrences of old with new

So remember to save the result!
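Since count only reports non-overlapping occurrences, it can help to see where every occurrence starts. The sketch below uses the optional start argument of find to resume the search one position past each hit; the helper name find_all is made up for this example, and it collects its results in a list.

```python
def find_all(s, sub):
    """Return the starting index of every occurrence of sub in s,
    including occurrences that overlap."""
    positions = []
    i = s.find(sub)
    while i != -1:
        positions.append(i)
        # Resume one character later so overlapping matches are found.
        i = s.find(sub, i + 1)
    return positions


print(find_all("ababababa", "aba"))   # [0, 2, 4, 6]
print("ababababa".count("aba"))       # 2
print(find_all("Hello, World!", "l")) # [2, 3, 10]
```

The first and last entries of the result correspond to what find and rfind return.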



Don’t Forget to Save the Result

A very common error is to forget what it means to be immutable: no operation changes the original string. If you want the changed result, you have to save it.

>>> s1 = "abCDefGH"
>>> s1.swapcase()
'ABcdEFgh'
>>> s1                       # s1 didn't change
'abCDefGH'
>>> s2 = s1.swapcase()       # save the result
>>> s2
'ABcdEFgh'
>>>

BTW: what happens to the result if you don’t save it?

String Conversions

>>> "abcDEfg".upper()
'ABCDEFG'
>>> "abcDEfg".lower()
'abcdefg'
>>> "abc123".upper()         # only letters
'ABC123'
>>> "abcDEF".capitalize()
'Abcdef'
>>> "abcDEF".swapcase()      # only letters
'ABCdef'
>>> book = "introduction to programming using python"
>>> book.title()             # doesn't change book
'Introduction To Programming Using Python'
>>> book2 = book.replace("ming", "s")
>>> book2
'introduction to programs using python'
>>> book2.title()
'Introduction To Programs Using Python'
>>> book2.title().replace("Using", "With")
'Introduction To Programs With Python'


Stripping Whitespace

It’s often useful to remove whitespace at the start, end, or both of string input. Use these functions:

Function       Description
s.lstrip():    return copy with leading whitespace removed
s.rstrip():    return copy with trailing whitespace removed
s.strip():     return copy with leading and trailing whitespace removed

>>> s1 = "   abc   "
>>> s1.lstrip()              # new string
'abc   '
>>> s1.rstrip()              # new string
'   abc'
>>> s1.strip()               # new string
'abc'
>>> "   a b c   ".strip()
'a b c'

Strip User Input

It’s typically a good idea to strip user input to remove extraneous white space!

>>> ans = input("Please enter YES or NO: ")
Please enter YES or NO:    NO
>>> ans
'   NO   '
>>> ans == 'YES' or ans == 'NO'
False
>>> ans = input("Please enter YES or NO: ").strip()
Please enter YES or NO:    YES
>>> ans
'YES'
>>> ans == 'YES' or ans == 'NO'
True
>>>
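The same idea can be packaged as a small helper so the comparison is written once. This is only a sketch: the function names are invented here, and converting the answer to upper case (so that "yes" is accepted too) is an extra assumption that goes beyond the slide, which only strips whitespace.

```python
def normalize_answer(raw):
    """Strip surrounding whitespace and fold to upper case."""
    return raw.strip().upper()


def is_yes_or_no(raw):
    """Return True if raw is YES or NO after normalization."""
    return normalize_answer(raw) in ("YES", "NO")


print(is_yes_or_no("   NO   "))   # True
print(is_yes_or_no(" yes"))       # True
print(is_yes_or_no("maybe"))      # False
```

Note that strip only removes whitespace at the ends, so an answer such as "Y E S" is still rejected.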



Formatting Strings

Recall from Slideset 3, our functions for formatting strings. The str class also has some formatting options:

Function         Description
s.center(w):     returns a string of length w, with s centered
s.ljust(w):      returns a string of length w, with s left justified
s.rjust(w):      returns a string of length w, with s right justified

>>> s = "abc"
>>> s.center(10)             # new string
'   abc    '
>>> s.ljust(10)              # new string
'abc       '
>>> s.rjust(10)              # new string
'       abc'
>>> s.center(2)              # new string
'abc'

Looking Back (Again)

In Slideset 5, we had code to compute and print a multiplication table up to LIMIT - 1.

> python MultiplicationTable.py
           Multiplication Table
     |   1   2   3   4   5   6   7   8   9
------------------------------------------
   1 |   1   2   3   4   5   6   7   8   9
...

which included the following code to center the title:

print("           Multiplication Table")

A better way would be:

print("Multiplication Table".center(6 + 4 * (LIMIT - 1)))
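Putting the centered title and the computed ruler together, the whole table can be generated for any LIMIT. The original MultiplicationTable.py is not shown in these slides, so the sketch below is a reconstruction under assumptions: a 6-character row label followed by 4-character columns, which matches the width formula 6 + 4 * (LIMIT - 1) above. The function name is hypothetical.

```python
def multiplication_table(limit):
    """Build a multiplication table for 1 .. limit - 1 as one string."""
    width = 6 + 4 * (limit - 1)
    lines = ["Multiplication Table".center(width)]
    # Header row: blank 6-character label column, then the column numbers.
    lines.append("     |" + "".join(f"{c:4d}" for c in range(1, limit)))
    # Ruler sized to the table rather than hard-coded.
    lines.append("-" * width)
    for r in range(1, limit):
        row = "".join(f"{r * c:4d}" for c in range(1, limit))
        lines.append(f"{r:4d} |" + row)
    return "\n".join(lines)


print(multiplication_table(10))
print()
print(multiplication_table(13))
```

Because every width is derived from limit, the title, header, ruler, and rows stay aligned when LIMIT changes, as long as each product fits in four characters.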


Multiplication Table Revisited

With LIMIT = 10:

> python MultiplicationTable.py
           Multiplication Table
     |   1   2   3   4   5   6   7   8   9
------------------------------------------
   1 |   1   2   3   4   5   6   7   8   9
   2 |   2   4   6   8  10  12  14  16  18
...
   9 |   9  18  27  36  45  54  63  72  81

With LIMIT = 13:

> python MultiplicationTable.py
                 Multiplication Table
     |   1   2   3   4   5   6   7   8   9  10  11  12
------------------------------------------------------
   1 |   1   2   3   4   5   6   7   8   9  10  11  12
   2 |   2   4   6   8  10  12  14  16  18  20  22  24
...
  12 |  12  24  36  48  60  72  84  96 108 120 132 144

String Example: CSV Files

A comma-separated values (csv) file is a common way to record data. Each line has multiple values separated by commas. For example, I can download your grades from Canvas in csv format:

Name,EID,HW1,HW2,Exam1,Exam2,Exam3
Possible,,10,10,100,100,100
Jones;Bob,bj123,10,9,99,60,45
Riley;Frank,fr498,4,8,72,95,63
Smith;Sally,ss324,5,10,100,75,80

Suppose you needed to process such a file. There’s an easy way to extract that data (the Python string split method), which we’ll cover soon.

But suppose you needed to write your own functions to extract the data from a line.



String Example: Line of csv Data

Later we’ll explain how to process files. For now, let’s process a line.

In file FieldToComma2.py:

def SplitOnComma(s):
    """Given a string possibly containing a comma,
    return the initial string (before the comma) and
    the string after the comma. If there is no comma,
    return the string and the empty string."""
    # The parameter is named s rather than str, so it does
    # not hide the built-in str class.
    if (',' in s):
        index = s.find(",")
        # Note: returns a pair of values
        return s[:index], s[index+1:]
    else:
        return s, ""

Notice that this returns a pair of values. How would you split on something other than a comma?

String Example: Line of csv Data

>>> from FieldToComma2 import *
>>> line = "abc,def,ghi,jkl"
>>> first, rest = SplitOnComma(line)
>>> first
'abc'
>>> rest
'def,ghi,jkl'
>>> first, rest = SplitOnComma(rest)
>>> first
'def'
>>> rest
'ghi,jkl'
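One way to answer the closing question is to pass the separator as a parameter. The sketch below is a hypothetical generalization (the name split_on is not from the slides); using len(sep) when skipping past the separator lets it handle separators longer than one character.

```python
def split_on(s, sep=","):
    """Split s at the first occurrence of sep.

    Return the text before sep and the text after it. If sep does
    not occur, return s and the empty string."""
    index = s.find(sep)
    if index == -1:
        return s, ""
    # Skip the whole separator, which may be longer than one character.
    return s[:index], s[index + len(sep):]


print(split_on("abc,def,ghi"))        # ('abc', 'def,ghi')
print(split_on("Jones;Bob", ";"))     # ('Jones', 'Bob')
print(split_on("key::value", "::"))   # ('key', 'value')
print(split_on("no separator here"))  # ('no separator here', '')
```

Python's built-in str.partition method does nearly the same job, returning three values: the part before the separator, the separator itself, and the part after.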


String Example

def SplitFields(line):
    """Iterate through a csv line to extract and print
    the values, stripped of extra whitespace."""
    rest = line.strip()
    i = 1
    while (',' in rest):
        # Named field rather than next, so it does not hide
        # the built-in next function.
        field, rest = SplitOnComma(rest)
        print("Field", i, ": ", field.strip(), sep="")
        i += 1
    print("Field", i, ": ", rest.strip(), sep="")

>>> from FieldToComma2 import *
>>> csvLine = "xyz, 123,a, 12, abc"
>>> SplitFields(csvLine)
Field1: xyz
Field2: 123
Field3: a
Field4: 12
Field5: abc

Next stop: Lists.
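For comparison, here is a sketch of the same task using the string split method mentioned earlier, which returns a list of the pieces between separators. The function name split_fields is made up for this example, and it returns the fields rather than printing them.

```python
def split_fields(line):
    """Return the comma-separated fields of line, each stripped of
    surrounding whitespace."""
    return [field.strip() for field in line.strip().split(",")]


fields = split_fields("xyz, 123,a, 12, abc")
for i, field in enumerate(fields, start=1):
    print("Field", i, ": ", field, sep="")

# An empty field (two commas in a row) comes back as an empty string.
print(split_fields("Possible,,10,10,100,100,100"))
```

Neither this version nor the hand-written one handles quoted fields that contain commas; real csv data with such fields needs a proper csv parser.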
