Basic Operations
operations listed earlier in Table 7-1. Strings can be concatenated using the + operator and repeated
using the * operator:

% python
>>> len('abc')                           # Length: number of items
3
>>> 'abc' + 'def'                        # Concatenation: a new string
'abcdef'
>>> 'Ni!' * 4                            # Repetition: like "Ni!" + "Ni!" + ...
'Ni!Ni!Ni!Ni!'

Formally, adding two string objects creates a new string object, with the contents of its operands joined.
Repetition is like adding a string to itself a number of times. In both cases, Python lets you create
arbitrarily sized strings; there’s no need to predeclare anything in Python, including the sizes of data
structures.‡ The len built-in function returns the length of a string (or any other object with a length).
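
As a quick check of the points above, the following sketch confirms that concatenation and repetition build new strings and leave their operands unchanged. The variable names (first, second, joined, repeated) are illustrative assumptions, not names used in the text.

```python
first = 'abc'
second = 'def'

joined = first + second       # concatenation builds a new string
repeated = first * 3          # repetition: like first + first + first

print(joined)                 # abcdef
print(repeated)               # abcabcabc
print(first, second)          # operands are unchanged: abc def
print(len(joined), len(repeated))           # 6 9
print(repeated == first + first + first)    # True
```

Nothing had to be sized or declared ahead of time; each result is simply created at whatever length the operation requires.
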
Repetition may seem a bit obscure at first, but it comes in handy in a surprising number of contexts. For
example, to print a line of 80 dashes, you can count up to 80, or let Python count for you:

>>> print('------- ...more... ---')      # 80 dashes, the hard way
>>> print('-' * 80)                      # 80 dashes, the easy way

Notice that operator overloading is at work here already: we’re using the same + and * operators that perform
addition and multiplication when using numbers. Python does the correct operation because it knows
the types of the objects being added and multiplied. But be careful: the rules aren’t quite as liberal as
you might expect. For instance, Python doesn’t allow you to mix numbers and strings in + expressions:
'abc'+9 raises an error instead of automatically converting 9 to a string.

As shown in the last row in Table
7-1, you can also iterate over strings in loops using for statements and test membership for both
characters and substrings with the in expression operator, which is essentially a search. For substrings,
in is much like the str.find() method covered later in this chapter, but it returns a Boolean result instead
of the substring’s position:

>>> myjob = "hacker"
>>> for c in myjob: print(c, end=' ')    # Step through items
...
h a c k e r
>>> "k" in myjob                         # Found
True
>>> "z" in myjob                         # Not found
False
>>> 'spam' in 'abcspamdef'               # Substring search, no position returned
True

The for loop assigns a variable to successive items in a
sequence (here, a string) and executes one or more statements for each item. In effect, the variable c
becomes a cursor stepping across the string here. We will discuss iteration tools like these and others
listed in Table 7-1 in more detail later in this book (especially in Chapters 14 and 20).

‡ Unlike with C character arrays, you don’t need to allocate or manage storage arrays when
using Python strings; you can simply create string objects as needed and let Python manage the
underlying memory space. As discussed in Chapter 6, Python reclaims unused objects’ memory space
automatically, using a reference-count garbage-collection strategy. Each object keeps track of the
number of names, data structures, etc., that reference it; when the count reaches zero, Python frees the
object’s space. This scheme means Python doesn’t have to stop and scan all the memory to find unused
space to free (an additional garbage-collection component also collects cyclic objects).

Indexing and Slicing

Because strings are defined as ordered collections of characters, we can access their components by
position. In Python, characters in a string are fetched by indexing: providing the numeric offset of the
desired component in square brackets after the string. You get back the one-character string at the
specified position. As in the C language, Python offsets start at 0 and end at one less than the length of
the string. Unlike C, however, Python also lets you fetch items from sequences such as strings using
negative offsets. Technically, a negative offset is added to the length of a string to derive a positive
offset. You can also think of negative offsets as counting backward from the end. The following
interaction demonstrates:

>>> S = 'spam'
>>> S[0], S[-2]                          # Indexing from front or end
('s', 'a')
>>> S[1:3], S[1:], S[:-1]                # Slicing: extract a section
('pa', 'pam', 'spa')

The first line defines a four-character string and
assigns it the name S. The next line indexes it in two ways: S[0] fetches the item at offset 0 from the left
(the one-character string 's'), and S[-2] gets the item at offset 2 back from the end (or equivalently, at
offset (4 + (-2)) from the front). Offsets and slices map to cells as shown in Figure 7-1.§

§ More mathematically minded readers (and students in
my classes) sometimes detect a small asymmetry here: the leftmost item is at offset 0, but the rightmost
is at offset -1. Alas, there is no such thing as a distinct -0 value in Python.

The last line in
the preceding example demonstrates slicing, a generalized form of indexing that returns an entire
section, not a single item. Probably the best way to think of slicing is that it is a type of parsing (analyzing
structure), especially when applied to strings: it allows us to extract an entire section (substring) in a
single step. Slices can be used to extract columns of data, chop off leading and trailing text, and more. In
fact, we’ll explore slicing in the context of text parsing later in this chapter.

The basics of slicing are
straightforward. When you index a sequence object such as a string on a pair of offsets separated by a
colon, Python returns a new object containing the
contiguous section identified by the offset pair. The left offset is taken to be the lower bound (inclusive),
and the right is the upper bound (noninclusive). That is, Python fetches all items from the lower bound
up to but not including the upper bound, and returns a new object containing the fetched items. If
omitted, the left and right bounds default to 0 and the length of the object you are slicing, respectively.
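
The bound rules just described can be checked directly. This is a minimal sketch that assumes the same four-character string S = 'spam' used in this section; the names middle, tail, and head are illustrative only.

```python
S = 'spam'

# The lower bound is inclusive; the upper bound is noninclusive.
middle = S[1:3]               # items at offsets 1 and 2

# Omitted bounds default to 0 and len(S), respectively.
tail = S[1:]
head = S[:3]

print(middle)                         # pa
print(tail == S[1:len(S)])            # True
print(head == S[0:3])                 # True
print(S[:] == S[0:len(S)])            # True
```
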
For instance, in the example we just saw, S[1:3] extracts the items at offsets 1 and 2: it grabs the second
and third items, and stops before the fourth item at offset 3. Next, S[1:] gets all items beyond the first:
the upper bound, which is not specified, defaults to the length of the string. Finally, S[:-1] fetches all but
the last item: the lower bound defaults to 0, and -1 refers to the last item, noninclusive. This may seem
confusing at first glance, but indexing and slicing are simple and powerful tools to use, once you get the
knack. Remember, if you’re unsure about the effects of a slice, try it out interactively. In the next
chapter, you’ll see that it’s even possible to change an entire section of another object in one step by
assigning to a slice (though not for immutables like strings).

Figure 7-1. Offsets and slices: positive offsets start from the left end (offset 0
is the first item), and negatives count back from the right end (offset -1 is the last item). Either kind of
offset can be used to give positions in indexing and slicing operations.

Here’s a summary of the details for reference:

• Indexing (S[i]) fetches components at offsets:
  - The first item is at offset 0.
  - Negative indexes mean to count backward from the end or right.
  - S[0] fetches the first item.
  - S[-2] fetches the second item from the end (like S[len(S)-2]).
• Slicing (S[i:j]) extracts contiguous sections of sequences:
  - The upper bound is noninclusive.
  - Slice boundaries default to 0 and the sequence length, if omitted.
  - S[1:3] fetches items at offsets 1 up to but not including 3.
  - S[1:] fetches items at offset 1 through the end (the sequence length).
  - S[:3] fetches items at offset 0 up to but not including 3.
  - S[:-1] fetches items at offset 0 up to but not including the last item.
  - S[:] fetches items at offsets 0 through the end; this effectively performs a top-level copy of S.

The last item listed here turns out to be a very common trick: it makes a full top-level
copy of a sequence object—an object with the same value, but a distinct piece of memory (you’ll find
more on copies in Chapter 9). This isn’t very useful for immutable objects like strings, but it comes in
handy for objects that may be changed in-place, such as lists. In the next chapter, you’ll see that the
syntax used to index by offset (square brackets) is used to index dictionaries by key as well; the
operations look the same but have different interpretations.

Extended slicing: the third limit and slice objects

In Python 2.3 and later, slice expressions have support for an optional third index, used as a step
(sometimes called a stride). The step is added to the index of each item extracted. The full-blown form
of a slice is now X[I:J:K], which means “extract all the items in X, from offset I through J-1, by K.” The
third limit, K, defaults to 1, which is why normally all items in a slice are extracted from left to right. If
you specify an explicit value, however, you can use the third limit to skip items or to reverse their order.
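
One way to picture the three-limit form, offered as a mental model rather than a description of how Python implements slicing: for bounds that fall inside the sequence, X[I:J:K] collects the items at the offsets that range(I, J, K) would produce. The string and the helper name by_offsets below are assumptions made for illustration.

```python
X = 'abcdefgh'

def by_offsets(seq, start, stop, step):
    """Collect items one offset at a time, mimicking seq[start:stop:step]."""
    return ''.join(seq[i] for i in range(start, stop, step))

# A positive step skips items from left to right.
print(X[0:8:3])                   # adg
print(by_offsets(X, 0, 8, 3))     # adg

# A negative step walks from right to left.
print(X[6:0:-2])                  # gec
print(by_offsets(X, 6, 0, -2))    # gec
```
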
For instance, X[1:10:2] will fetch every other item in X from offsets 1–9; that is, it will collect the items at
offsets 1, 3, 5, 7, and 9. As usual, the first and second limits default to 0 and the length of the sequence,
respectively, so X[::2] gets every other item from the beginning to the end of the sequence:

>>> S = 'abcdefghijklmnop'
>>> S[1:10:2]
'bdfhj'
>>> S[::2]
'acegikmo'

You can also use a negative stride. For
example, the slicing expression "hello"[::-1] returns the new string "olleh". Both bounds are omitted, so the
slice still spans the whole sequence, but with a negative stride the defaults are reversed: the slice starts
at the last item and runs back through the first, going from
right to left instead of the usual left to right. The effect, therefore, is to reverse the sequence:

>>> S = 'hello'
>>> S[::-1]
'olleh'

With a negative stride, the meanings of the first two bounds are essentially
reversed. That is, the slice S[5:1:-1] fetches the items from 2 to 5, in reverse order (the result contains
items from offsets 5, 4, 3, and 2):

>>> S = 'abcedfg'
>>> S[5:1:-1]
'fdec'

Skipping
and reversing like this are the most common use cases for three-limit slices, but see Python’s standard
library manual for more details (or run a few experiments interactively). We’ll revisit three-limit slices
later in this book, in conjunction with the for loop statement.

Later in the book, we’ll also learn
that slicing is equivalent to indexing with a slice object, a finding of importance to class writers seeking
to support both operations:

>>> 'spam'[1:3]                          # Slicing syntax
'pa'
>>> 'spam'[slice(1, 3)]                  # Slice objects
'pa'
>>> 'spam'[::-1]
'maps'
>>> 'spam'[slice(None, None, -1)]
'maps'

Why You Will Care: Slice