String Operators: The Operator
The + Operator
The + operator concatenates strings. It returns a string consisting of the operands joined to
>>> s = 'foo'
>>> t = 'bar'
>>> u = 'baz'
Here are examples of both forms:
>>> s = 'foo.'
>>> s * 4
'foo.foo.foo.foo.'
>>> 4 * s
'foo.foo.foo.foo.'
The multiplier operand n must be an integer. You’d think it would be required to be a posi
zero or negative, in which case the result is an empty string:
Built-in String Functions
As you saw in the tutorial on Basic Data Types in Python, Python provides many functions
always available. Here are a few that work with strings:
At the most basic level, computers store all information as numbers. To represent characte
which maps each character to its representative number.
The simplest scheme in common use is called ASCII. It covers the common Latin character
working with. For these characters, ord(c) returns the ASCII value for character c:
>>> ord('a')
97
>>> ord('#')
35
ASCII is fine as far as it goes. But there are many different languages in use in the world an
appear in digital media. The full set of characters that potentially may need to be represen
chr(n)
Returns a character value for the given integer.
chr() does the reverse of ord(). Given a numeric value n, chr(n) returns a string representin
>>> chr(97)
'a'
>>> chr(35)
'#'
chr() handles Unicode characters as well:
>>> chr(8364)
'€'
>>> chr(8721)
'∑'
len(s)
Returns the length of a string.
With len(), you can check Python string length. len(s) returns the number of characters in
str(obj)
Returns a string representation of an object.
Virtually any object in Python can be rendered as a string. str(obj) returns the string repre
>>> str(49.2)
'49.2'
>>> str(3+4j)
'(3+4j)'
String indexing in Python is zero-based: the first character in the string has index 0, the nex
the last character will be the length of the string minus one.
For example, a schematic diagram of the indices of the string 'foobar' would look like this
[Figure: String Indices]
The individual characters can be accessed by index as follows:
>>> s = 'foobar'
>>> s[0]
'f'
>>> s[1]
'o'
>>> s[3]
'b'
>>> len(s)
6
>>> s[len(s)-1]
'r'
String indices can also be specified with negative numbers, in which case indexing occurs f
backward: -1 refers to the last character, -2 the second-to-last character, and so on. Here i
positive and negative indices into the string 'foobar':
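As a rough sketch of how positive and negative string indices line up, the loop below pairs every positive index of 'foobar' with its negative counterpart and then triggers the out-of-range error on purpose. The variable names are illustrative only.

```python
s = 'foobar'

# Each character can be reached by a positive index (counting from 0 at
# the left) or a negative index (counting from -1 at the right).
for pos in range(len(s)):
    neg = pos - len(s)
    assert s[pos] == s[neg]
    print(f'{pos:>2} {neg:>3} {s[pos]!r}')

# Indexing outside the valid range -len(s)..len(s)-1 raises IndexError.
try:
    s[-7]
except IndexError as exc:
    error_message = str(exc)
```

The relationship used here is simply neg = pos - len(s), which is why s[len(s)-1] and s[-1] name the same character.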
>>> s[-7]
Traceback (most recent call last):
  File "<pyshell#26>", line 1, in <module>
    s[-7]
IndexError: string index out of range
For any non-empty string s, s[len(s)-1] and s[-1] both return the last character. There isn
empty string.
String Slicing
Python also allows a form of indexing syntax that extracts substrings from a string, known
expression of the form s[m:n] returns the portion of s starting with position m, and up to b
>>> s = 'foobar'
>>> s[2:5]
'oba'
Remember: String indices are zero-based. The first character in a string has index 0. Thi
and slicing.
Again, the second index specifies the first character that is not included in the result, the
above. That may seem slightly unintuitive, but it produces this result which makes sense: t
substring that is n - m characters in length, in this case, 5 - 2 = 3.
If you omit the first index, the slice starts at the beginning of the string. Thus, s[:m] and s[
>>> s[:4]
'foob'
>>> s[0:4]
'foob'
Similarly, if you omit the second index as in s[n:], the slice extends from the first index thr
nice, concise alternative to the more cumbersome s[n:len(s)]:
>>> s = 'foobar'
>>> s[2:]
'obar'
>>> s[2:len(s)]
'obar'
>>> s = 'foobar'
>>> t = s[:]
>>> id(s)
59598496
>>> id(t)
59598496
>>> s is t
True
If the first index in a slice is greater than or equal to the second index, Python returns an e
obfuscated way to generate an empty string, in case you were looking for one:
>>> s[2:2]
''
>>> s[4:2]
''
Negative indices can be used with slicing as well. -1 refers to the last character, -2 the seco
simple indexing. The diagram below shows how to slice the substring 'oob' from the string
negative indices:
Here is the corresponding Python code:
>>> s = 'foobar'
>>> s[-5:-2]
'oob'
>>> s[1:4]
'oob'
>>> s[-5:-2] == s[1:4]
True
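The slicing rules in this section can be checked together in one short script. This is only a sketch: the names m, n, and piece are chosen for illustration, and the assertions restate the behavior described in the text.

```python
s = 'foobar'
m, n = 2, 5

# s[m:n] starts at index m and stops just before index n,
# so the result holds n - m characters when 0 <= m <= n <= len(s).
piece = s[m:n]
assert piece == 'oba' and len(piece) == n - m

# Omitted indices default to the two ends of the string.
assert s[:4] == s[0:4] == 'foob'
assert s[2:] == s[2:len(s)] == 'obar'

# For any index k, the two halves s[:k] and s[k:] rebuild the original string.
assert all(s[:k] + s[k:] == s for k in range(-10, 10))

# A first index at or beyond the second gives an empty string.
assert s[2:2] == '' and s[4:2] == ''

# Negative indices count from the right, so these name the same slice.
assert s[-5:-2] == s[1:4] == 'oob'
```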
[Figure: String Indexing with Stride]
Similarly, 1:6:2 specifies a slice starting with the second character (index 1) and ending wit
value 2 causes every other character to be skipped:
[Figure: Another String Indexing with Stride]
The illustrative REPL code is shown here:
>>> s = 'foobar'
>>> s[0:6:2]
'foa'
>>> s[1:6:2]
'obr'
As with any slicing, the first and second indices can be omitted, and default to the first and
>>> s = '12345' * 5
>>> s
'1234512345123451234512345'
>>> s[::5]
'11111'
>>> s[4::5]
'55555'
Interpolating Variables Into a String
In Python version 3.6, a new string formatting mechanism was introduced. This feature is f
Literal, but is more usually referred to by its nickname f-string.
The formatting capability provided by f-strings is extensive and won’t be covered in full de
can check out the Real Python article Python’s F-String for String Interpolation and Format
Formatted Output coming up later in this series that digs deeper into f-strings.
One simple feature of f-strings you can start using right away is variable interpolation. You
within an f-string literal, and Python will replace the name with the corresponding value.
For example, suppose you want to display the result of an arithmetic calculation. You can d
straightforward print() statement, separating numeric values and string literals by comma
>>> n = 20
>>> m = 25
>>> prod = n * m
>>> print('The product of', n, 'and', m, 'is', prod)
The product of 20 and 25 is 500
But this is cumbersome. To accomplish the same thing using an f-string:
Specify either a lowercase f or uppercase F directly before the opening quote of the s
string instead of a standard string.
Specify any variables to be interpolated in curly braces ({}).
Recast using an f-string, the above example looks much cleaner:
>>> n = 20
>>> m = 25
>>> prod = n * m
>>> print(f'The product of {n} and {m} is {prod}')
The product of 20 and 25 is 500
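The same f-string interpolation can be written as a plain script. The sketch below assumes Python 3.6 or later; the names message and same_message are illustrative, and the second f-string relies on the fact that the braces may hold an expression as well as a variable name.

```python
n = 20
m = 25
prod = n * m

# Names inside the curly braces are replaced by their current values.
message = f'The product of {n} and {m} is {prod}'
print(message)

# The braces can also hold an expression, so the intermediate
# variable is optional. An uppercase F prefix behaves the same way.
same_message = F"The product of {n} and {m} is {n * m}"
assert message == same_message
```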
Any of Python’s three quoting mechanisms can be used to define an f-string:
You can specify a negative stride value as well, in which case Python steps backward throu
starting/first index should be greater than the ending/second index:
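A negative stride is easiest to see on a short string. The following is a minimal sketch using 'foobar'; the name reversed_s is illustrative.

```python
s = 'foobar'

# With a negative stride the slice walks from right to left, so the
# first index has to sit to the right of the second one.
assert s[5:0:-2] == 'rbo'

# With the indices the other way around nothing is selected.
assert s[0:5:-2] == ''

# Leaving both indices out and stepping by -1 reverses the string.
reversed_s = s[::-1]
assert reversed_s == 'raboof'
```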
In truth, there really isn’t much need to modify strings. You can usually easily accomplish w
the original string that has the desired change in place. There are very many ways to do th
>>> s = 'foobar'
>>> s = s.replace('b', 'x')
>>> s
'fooxar'
You are also familiar with functions: callable procedures that you can invoke to perform sp
Methods are similar to functions. A method is a specialized type of callable procedure that
Like a function, a method is called to perform a distinct task, but it is invoked on a specific
object during execution.
The syntax for invoking a method on an object is as follows:
obj.foo(<args>)
This invokes method .foo() on object obj. <args> specifies the arguments passed to the m
You will explore much more about defining and calling methods later in the discussion of
the goal is to present some of the more commonly used built-in methods Python support
s.lower()
Converts alphabetic characters to lowercase.
s.swapcase()
Swaps case of alphabetic characters.
s.title()
Converts the target string to “title case.”
s.title() returns a copy of s in which the first letter of each word is converted to uppercas
This method uses a fairly simple algorithm. It does not attempt to distinguish between imp
does not handle apostrophes, possessives, or acronyms gracefully:
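To make the case-conversion methods concrete, here is a small sketch that applies .lower(), .swapcase(), and .title() to one sample sentence. The sentence and the name titled are chosen for illustration; the point is how .title() treats apostrophes and an acronym.

```python
s = "what's happened to ted's IBM stock?"

# .lower() and .swapcase() act on every alphabetic character.
assert s.lower() == "what's happened to ted's ibm stock?"
assert s.swapcase() == "WHAT'S HAPPENED TO TED'S ibm STOCK?"

# .title() uppercases the letter that follows any non-letter and lowercases
# the rest, so the apostrophes and the acronym come out oddly.
titled = s.title()
assert titled == "What'S Happened To Ted'S Ibm Stock?"
```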
Each method in this group supports optional <start> and <end> arguments. These are inter
the method is restricted to the portion of the target string starting at character position <s
including character position <end>. If <start> is specified but <end> is not, the method appli
from <start> through the end of the string.
>>> 'foo goo moo'.count('oo')
3
The count is restricted to the number of occurrences within the substring indicated by <sta
>>> 'foo goo moo'.count('oo', 0, 8)
2
>>> 'foobar'.endswith('oob', 0, 4)
True
>>> 'foobar'.endswith('oob', 2, 4)
False
>>> 'foo bar foo baz foo qux'.find('foo', 4)
8
>>> 'foo bar foo baz foo qux'.find('foo', 4, 7)
-1
This method is identical to .find(), except that it raises an exception if <sub> is not found r
>>> 'foo bar foo baz foo qux'.index('grault')
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    'foo bar foo baz foo qux'.index('grault')
ValueError: substring not found
The search is restricted to the substring indicated by <start> and <end>, if they are specified
>>> 'foo bar foo baz foo qux'.rfind('foo', 0, 14)
8
>>> 'foo bar foo baz foo qux'.rfind('foo', 10, 14)
-1
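The difference between .find() and .index(), and the meaning of <start> and <end>, can be sketched in a few lines. The names text, start, end, and index_failed are illustrative; the comparison against a slice is only meant to show that <start> is inclusive and <end> is exclusive.

```python
text = 'foo bar foo baz foo qux'

# .find() reports a missing substring with -1 ...
assert text.find('foo', 4) == 8
assert text.find('grault') == -1

# ... whereas .index() raises ValueError for the same search.
try:
    text.index('grault')
except ValueError:
    index_failed = True
else:
    index_failed = False
assert index_failed

# <start> is inclusive and <end> is exclusive, the same bounds as in
# the slice text[start:end].
start, end = 0, 14
assert text.rfind('foo', start, end) == text[start:end].rfind('foo') == 8
assert text.count('foo', start, end) == text[start:end].count('foo') == 2
```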
Determines whether the target string starts with a given substring.
When you use the Python .startswith() method, s.startswith(<prefix>) returns True if s s
specified <prefix> and False otherwise:
>>> 'foobar'.startswith('foo')
True
>>> 'foobar'.startswith('bar')
False
The comparison is restricted to the substring indicated by <start> and <end>, if they are sp
>>> 'foobar'.startswith('bar', 3)
True
>>> 'foobar'.startswith('bar', 3, 2)
False
Character Classification
Methods in this group classify a string based on the characters it contains.
s.isalnum() returns True if s is nonempty and all its characters are alphanumeric (either a le
>>> 'abc123'.isalnum()
True
>>> 'abc$123'.isalnum()
False
>>> ''.isalnum()
False
>>> '123'.isdigit()
True
>>> '123abc'.isdigit()
False
s.isidentifier()
Determines whether the target string is a valid Python identifier.
s.isidentifier() returns True if s is a valid Python identifier according to the language def
>>> 'foo32'.isidentifier()
True
>>> '32foo'.isidentifier()
False
>>> 'foo$32'.isidentifier()
False
Note: .isidentifier() will return True for a string that matches a Python keyword even
valid identifier:
You can test whether a string matches a Python keyword using a function called iskeywo
called keyword. One possible way to do this is shown below:
>>> from keyword import iskeyword
>>> iskeyword('and')
True
If you really want to ensure that a string would serve as a valid Python identifier, you sh
that .isidentifier() is True and that iskeyword() is False.
See Python Modules and Packages—An Introduction to read more about Python modu
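One possible way to combine the .isidentifier() and iskeyword() checks is a small helper. The function name is_usable_identifier is hypothetical and not part of the standard library; this is a sketch of the idea rather than a canonical implementation.

```python
from keyword import iskeyword


def is_usable_identifier(name):
    """Return True if name is a valid identifier and not a reserved keyword.

    The function name is illustrative, not part of the standard library.
    """
    return name.isidentifier() and not iskeyword(name)


assert 'and'.isidentifier()            # a keyword still passes .isidentifier()
assert not is_usable_identifier('and')
assert is_usable_identifier('foo32')
assert not is_usable_identifier('32foo')
```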
s.isalpha()
>>> 'a\tb'.isprintable()
False
>>> 'a b'.isprintable()
True
>>> ''.isprintable()
True
>>> 'a\nb'.isprintable()
False
Note: This is one of only two .isxxxx() methods that return True if s is an empty string. The other
is .isascii().
s.isupper() returns True if s is nonempty and all the alphabetic characters it contains are u
alphabetic characters are ignored:
>>> 'ABC'.isupper()
True
>>> 'ABC1$D'.isupper()
True
>>> 'Abc1$D'.isupper()
False
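Most classification methods return False for an empty string, with .isprintable() and .isascii() as the exceptions. A quick way to check this is to call each method on '' by name. The sketch assumes Python 3.7 or later, where .isascii() exists; the list of method names is typed out by hand for illustration.

```python
# Names of the str classification methods to probe on the empty string.
methods = ['isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit',
           'isidentifier', 'islower', 'isnumeric', 'isprintable',
           'isspace', 'istitle', 'isupper']

# Call each method on '' and keep the names that answer True.
true_for_empty = [name for name in methods if getattr('', name)()]
print(true_for_empty)
```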
s.isspace()
Determines whether the target string consists of whitespace characters.
s.isspace() returns True if s is nonempty and all characters are whitespace characters, and
The most commonly encountered whitespace characters are space ' ', tab '\t', and newli
>>> ' \t \n '.isspace()
True
>>> ' a '.isspace()
False
('\f' and '\r' are the escape sequences for the ASCII Form Feed and Carriage Return char
for the Unicode Four-Per-Em Space.)
s.istitle()
String Formatting
Methods in this group modify or enhance the format of a string.
s.center(<width>[, <fill>])
Centers a string in a field.
s.center(<width>) returns a string consisting of s centered in a field of width <width>. By de
space character:
>>> 'foo'.center(10)
'   foo    '
>>> 'foo'.center(2)
'foo'
s.rjust(<width>[, <fill>])
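The padding methods .center() and .rjust() can be compared side by side. This sketch uses a field width of 9 so that the centered padding is symmetric; the widths and the '-' fill character are arbitrary choices for illustration.

```python
s = 'foo'

# With no <fill> argument the padding is made of spaces.
assert s.center(9) == '   foo   '
assert s.rjust(9) == '      foo'

# A one-character <fill> string replaces the space.
assert s.center(9, '-') == '---foo---'
assert s.rjust(9, '-') == '------foo'

# A width no larger than len(s) leaves the string as it is.
assert s.center(2) == s
assert s.rjust(2) == s
```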
s.lstrip() returns a copy of s with any whitespace characters removed from the left end:
>>> ' foo bar baz '.lstrip()
'foo bar baz '
>>> '\t\nfoo\t\nbar\t\nbaz'.lstrip()
'foo\t\nbar\t\nbaz'
If the optional <chars> argument is specified, it is a string that specifies the set of characte
s.rstrip([<chars>])
>>> ' foo bar baz '.rstrip()
' foo bar baz'
>>> 'foo\t\nbar\t\nbaz\t\n'.rstrip()
'foo\t\nbar\t\nbaz'
As with .lstrip() and .rstrip(), the optional <chars> argument specifies the set of charact
>>> 'www.realpython.com'.strip('w.moc')
'realpython'
Note: When the return value of a string method is another string, as is often the case, m
by chaining the calls:
>>> ' foo bar baz\t\t\t'.lstrip().rstrip()
'foo bar baz'
>>> ' foo bar baz\t\t\t'.strip()
'foo bar baz'
>>> 'www.realpython.com'.lstrip('w.moc').rstrip('w.moc')
'realpython'
>>> 'www.realpython.com'.strip('w.moc')
'realpython'
s.zfill(<width>)
Pads a string on the left with zeros.
s.zfill(<width>) returns a copy of s left-padded with '0' characters to the specified <width
>>> '42'.zfill(5)
'00042'
If s contains a leading sign, it remains at the left edge of the result string after zeros are in
>>> '+42'.zfill(8)
'+0000042'
>>> '-42'.zfill(8)
'-0000042'
>>> '-42'.zfill(3)
'-42'
s.join(<iterable>)
Concatenates strings from an iterable.
s.join(<iterable>) returns the string that results from concatenating the objects in <iterab
Note that .join() is invoked on s, the separator string. <iterable> must be a sequence of s
Some sample code should help clarify. In the following example, the separator s is the strin
values:
>>> ', '.join(['foo', 'bar', 'baz', 'qux'])
'foo, bar, baz, qux'
The result is a single string consisting of the list objects separated by commas.
In the next example, <iterable> is specified as a single string value. When a string value is
list of the string’s individual characters:
>>> list('corge')
['c', 'o', 'r', 'g', 'e']
>>> ':'.join('corge')
'c:o:r:g:e'
Thus, the result of ':'.join('corge') is a string consisting of each character in 'corge' sepa
This example fails because one of the objects in <iterable> is not a string:
>>> '---'.join(['foo', 23, 'bar'])
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    '---'.join(['foo', 23, 'bar'])
TypeError: sequence item 1: expected str instance, int found
That can be remedied, though:
As you will soon see, many composite objects in Python can be construed as iterables, and
strings from them.
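One way to remedy a .join() call that fails on a non-string item is to convert every item with str() before joining. The sketch below is one option among several; the names items and joined are illustrative.

```python
items = ['foo', 23, 'bar']

# .join() accepts only strings, so convert each item with str() first.
joined = '---'.join(str(item) for item in items)
assert joined == 'foo---23---bar'

# map() is an equivalent spelling of the same conversion.
assert '---'.join(map(str, items)) == joined
```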
If <sep> is not found in s, the returned tuple contains s followed by two empty strings:
>>> 'foo.bar'.partition('@@')
('foo.bar', '', '')
Remember: Lists and tuples are covered in the next tutorial.
s.rpartition(<sep>)
Without arguments, s.rsplit() splits s into substrings delimited by any sequence of white
list:
>>> 'foo bar baz qux'.rsplit()
['foo', 'bar', 'baz', 'qux']
>>> 'foo\n\tbar baz\r\fqux'.rsplit()
['foo', 'bar', 'baz', 'qux']
>>> 'www.realpython.com'.rsplit(sep='.', maxsplit=1)
['www.realpython', 'com']
The default value for <maxsplit> is -1, which means all possible splits should be performed
entirely:
>>> 'www.realpython.com'.rsplit(sep='.', maxsplit=-1)
['www', 'realpython', 'com']
>>> 'www.realpython.com'.rsplit(sep='.')
['www', 'realpython', 'com']
s.splitlines([<keepends>])
Breaks a string at line boundaries.
s.splitlines() splits s up into lines and returns them in a list. Any of the following charact
to constitute a line boundary:
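A few of the characters that .splitlines() treats as line boundaries can be seen in a short sketch. The sample string mixes newline, carriage return plus newline, a lone carriage return, form feed, and the Unicode line separator; it is an illustrative selection, not the full list.

```python
# Each separator below is one that str.splitlines() recognizes.
text = 'foo\nbar\r\nbaz\rqux\fquux\u2028corge'

lines = text.splitlines()
assert lines == ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']

# By contrast, .split('\n') reacts to the newline character only.
assert text.split('\n') == ['foo', 'bar\r', 'baz\rqux\fquux\u2028corge']
```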
If the optional <keepends> argument is specified and is truthy, then the line boundaries are
>>> 'foo\nbar\nbaz\nqux'.splitlines(True)
['foo\n', 'bar\n', 'baz\n', 'qux']
>>> 'foo\nbar\nbaz\nqux'.splitlines(1)
['foo\n', 'bar\n', 'baz\n', 'qux']
Defining a Literal bytes Object
A bytes literal is defined in the same way as a string literal with the addition of a 'b' prefix
>>> b = b'foo bar baz'
>>> b
b'foo bar baz'
>>> type(b)
<class 'bytes'>
As with strings, you can use any of the single, double, or triple quoting mechanisms:
>>> b'Contains embedded "double" quotes'
b'Contains embedded "double" quotes'
>>> b = rb'foo\xddbar'
>>> b
b'foo\\xddbar'
>>> b[3]
92
>>> chr(92)
'\\'
bytes(<s>, <encoding>) converts string <s> to a bytes object, using str.encode() according t
>>> b = bytes('foo.bar', 'utf8')
>>> b
b'foo.bar'
>>> type(b)
<class 'bytes'>
Technical Note: In this form of the bytes() function, the <encoding> argument is require
which characters are translated to integer values. A value of "utf8" indicates Unicode Tr
an encoding that can handle every possible Unicode character. UTF-8 can also be indica
or "UTF-8" for <encoding>.
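The relationship between bytes(<s>, <encoding>) and str.encode() can be sketched as follows. The names text and encoding_required are illustrative, and the euro sign is just one convenient example of a character that needs more than one byte in UTF-8.

```python
text = 'foo.bar'

# bytes(<s>, <encoding>) and str.encode() produce the same bytes object.
b = bytes(text, 'utf8')
assert b == text.encode('utf8') == b'foo.bar'

# 'utf8', 'utf-8' and 'UTF-8' are aliases for the same codec.
assert bytes(text, 'utf-8') == bytes(text, 'UTF-8') == b

# Leaving out the encoding for a str argument raises TypeError.
try:
    bytes(text)
except TypeError:
    encoding_required = True
else:
    encoding_required = False
assert encoding_required

# Characters outside ASCII may encode to more than one byte.
assert len('€') == 1 and len(bytes('€', 'utf8')) == 3
```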
bytes(<iterable>)
Creates a bytes object from an iterable.
bytes(<iterable>) defines a bytes object from the sequence of integers generated by <iter
that generates a sequence of integers n in the range 0 ≤ n ≤ 255:
>>> len(b)
5
>>> min(b)
97
>>> max(b)
101
>>> b.endswith(b'qux')
True
>>> b.find(b'baz')
12
The in and not in operators:
>>> b = b'abcde'
>>> b'cd' in b
True
>>> b'foo' not in b
True
The concatenation (+) and replication (*) operators:
>>> b = b'abcde'
>>> b + b'fghi'
b'abcdefghi'
>>> b * 3
b'abcdeabcdeabcde'
Notice, however, that when these operators and methods are invoked on a bytes object, th
be bytes objects as well:
>>> b = b'foo.bar'
>>> b + '.baz'
Traceback (most recent call last):
  File "<pyshell#72>", line 1, in <module>
    b + '.baz'
TypeError: can't concat bytes to str
>>> b + b'.baz'
b'foo.bar.baz'
>>> b.split(sep='.')
Traceback (most recent call last):
  File "<pyshell#74>", line 1, in <module>
    b.split(sep='.')
TypeError: a bytes-like object is required, not 'str'
>>> b.split(sep=b'.')
[b'foo', b'bar']
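When a bytes object has to be combined with a str, one of the two must be converted first. The sketch below shows both directions; the names suffix and mixing_failed are illustrative, and UTF-8 is assumed as the encoding.

```python
b = b'foo.bar'

# Mixing bytes with str is rejected ...
try:
    b + '.baz'
except TypeError:
    mixing_failed = True
else:
    mixing_failed = False
assert mixing_failed

# ... so either encode the str operand to bytes,
suffix = '.baz'
assert b + suffix.encode('utf8') == b'foo.bar.baz'

# or decode the bytes object and work with str throughout.
assert b.decode('utf8') + suffix == 'foo.bar.baz'

# Method arguments follow the same rule.
assert b.split(sep=b'.') == [b'foo', b'bar']
```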
>>> b[2:3]
b'c'
You can convert a bytes object into a list of integers with the built-in list() function:
>>> list(b)
[97, 98, 99, 100, 101]
Hexadecimal numbers are often used to specify binary data because two hexadecimal digi
The bytes class supports two additional methods that facilitate conversion to and from a s
bytes.fromhex(<s>)
Returns a bytes object constructed from a string of hexadecimal values.
bytes.fromhex(<s>) returns the bytes object that results from converting each pair of hexad
corresponding byte value. The hexadecimal digit pairs in <s> may optionally be separated
>>> b = bytes.fromhex(' aa 68 4682cc ')
>>> b
b'\xaahF\x82\xcc'
>>> list(b)
[170, 104, 70, 130, 204]
bytearray Objects
Python supports another binary sequence type called the bytearray. bytearray objects are v
differences:
There is no dedicated syntax built into Python for defining a bytearray literal, like the
a bytes object. A bytearray object is always created using the bytearray() built-in func
>>> ba = bytearray('foo.bar.baz', 'UTF-8')
>>> ba
bytearray(b'foo.bar.baz')
>>> bytearray(6)
bytearray(b'\x00\x00\x00\x00\x00\x00')
>>> bytearray([100, 102, 104, 106, 108])
bytearray(b'dfhjl')
bytearray objects are mutable. You can modify the contents of a bytearray object usin
>>> ba = bytearray('foo.bar.baz', 'UTF-8')
>>> ba
bytearray(b'foo.bar.baz')
>>> ba[5] = 0xee
>>> ba
bytearray(b'foo.b\xeer.baz')
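The mutability of bytearray objects, compared with bytes, can be sketched like this. The slice assignment and the name bytes_is_immutable are additions for illustration; the index assignment repeats the example above.

```python
ba = bytearray('foo.bar.baz', 'UTF-8')

# Assigning to an index changes the bytearray in place.
ba[5] = 0xee
assert ba == bytearray(b'foo.b\xeer.baz')

# Slice assignment works as well and may change the length.
ba[8:11] = b'qux!'
assert ba == bytearray(b'foo.b\xeer.qux!')

# The same assignment on an immutable bytes object raises TypeError.
b = bytes(ba)
try:
    b[5] = 0x61
except TypeError:
    bytes_is_immutable = True
else:
    bytes_is_immutable = False
assert bytes_is_immutable
```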
b.hex() returns the result of converting bytes object b into a string of hexadecimal digit pa
of .fromhex():
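A round trip between .fromhex() and .hex() might look like the following sketch. It reuses the hexadecimal string from the .fromhex() example; the name hex_digits is illustrative.

```python
b = bytes.fromhex(' aa 68 4682cc ')

# .hex() renders each byte as two lowercase hexadecimal digits.
hex_digits = b.hex()
assert hex_digits == 'aa684682cc'

# Feeding the digits back to .fromhex() restores the original bytes.
assert bytes.fromhex(hex_digits) == b == b'\xaahF\x82\xcc'
```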