Vectorized String Operations
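Pandas provides a comprehensive set of vectorized string operations, available through the .str accessor of a Series.
They apply a string method to every element at once, which is especially useful when cleaning real-world data.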
len
In [1]: #In Pandas, the len() function returns the length (number of characters) in
#each string of a Series.
Syntax : Series.str.len()
Out[1]: 0 8
1 13
2 11
3 5
4 5
dtype: int64
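The cell that produced the output above is not shown, so here is a minimal sketch of the same call. The `fruits` Series is hypothetical sample data, not the original input; it includes a missing value to show that `str.len()` passes it through instead of failing.

```python
import pandas as pd

# Hypothetical sample data; the original input Series is not shown.
fruits = pd.Series(["apple", "banana", None, "kiwi"])

# str.len() counts the characters of each element and keeps missing values missing.
lengths = fruits.str.len()
print(lengths)
```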
translate
In [3]: #The translate() function replaces specific characters in strings based on a translation table.
#It’s useful for quick character substitutions.
Out[2]: 0 @ppl3
1 b@n@n@
2 gr@p3
3 d@t3
dtype: object
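A sketch of a call that yields the output above. The input Series and the translation table are inferred from that output (a becomes @, e becomes 3); the original cell may have been written differently.

```python
import pandas as pd

# Input inferred from the output shown above (an assumption).
fruits = pd.Series(["apple", "banana", "grape", "date"])

# str.maketrans builds the translation table that translate() expects.
table = str.maketrans({"a": "@", "e": "3"})

replaced = fruits.str.translate(table)
print(replaced)
```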
ljust
In [5]: #In Pandas, the ljust() function is used to left-justify strings in a Series or DataFrame column.
#It adds spaces (or a specified character) to the right side of the string until
#it reaches the desired length.
Example:
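The original example cell is missing here, so the following is a minimal sketch. It assumes the same `cities` Series that appears in later cells and pads with dots so the added characters are visible.

```python
import pandas as pd

# Assumed to match the cities Series used in later cells.
cities = pd.Series(["Delhi", "Hyderabad", "California", "Tokyo", "Paris"])

# Left-justify to a width of 15; the fill character is appended on the right.
formatted_cities_left = cities.str.ljust(15, ".")
print(formatted_cities_left)
```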
rjust
In [7]: # In Pandas, the rjust() function is used to right-justify strings in a Series or DataFrame column.
# It adds spaces (or a specified character) to the left side of the string until
# it reaches the desired length.
Example:
In [4]: # Right-justify city names to 15 characters, padding with dots on the left
formatted_cities_right = cities.str.rjust(15, '.')
formatted_cities_right
center
In [9]: # The center() function centers the text in the specified width.
# It adds spaces (or a specified character) to both sides of the string equally
Example:
In [5]: # Center city names to 15 characters with dots for proper alignment
formatted_cities_center = cities.str.center(15, '.')
formatted_cities_center
zfill
In [11]: #In Pandas, the zfill() function is used to pad strings with zeros on the left side
#until they reach the desired length.
#It is mostly used for numbers formatted as strings but works with any string data.
Syntax: Series.str.zfill(width)
Example:
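The example cell is missing, so this is a small sketch with hypothetical ID strings. It sticks to unsigned values because the handling of a leading sign has differed from Python's own `str.zfill` in some pandas versions.

```python
import pandas as pd

# Hypothetical numeric IDs stored as strings.
ids = pd.Series(["7", "42", "12345"])

# Pad on the left with zeros up to a width of 4; longer strings are left unchanged.
padded_ids = ids.str.zfill(4)
print(padded_ids)
```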
strip
In [13]: #In Pandas, the strip() function is used to remove leading and trailing spaces
#or specific characters from strings in a Series or DataFrame column.
#It is commonly used for cleaning data where extra spaces or unwanted characters may cause issues
Example:
In [33]: # Strip dots and dashes from both ends of city names
cities_chars = pd.Series(['..Delhi..', '--Hyderabad--', '..California', 'Tokyo--', '--Paris'])
cleaned_cities_chars = cities_chars.str.strip('.-')
cleaned_cities_chars
Out[33]: 0 Delhi
1 Hyderabad
2 California
3 Tokyo
4 Paris
dtype: object
rstrip
In [16]: #In Pandas, the rstrip() function is used to remove trailing spaces or
#specific characters from the right side of strings in a Series or DataFrame column.
#It is helpful when you have unwanted characters or spaces at the end of strings.
Syntax:Series.str.rstrip([to_strip])
Example:
Out[8]: 0 Delhi
1 Hyderabad
2 California
3 Tokyo
4 Paris
dtype: object
Out[18]: 0 Delhi
1 Hyderabad
2 California
3 Tokyo
4 Paris
dtype: object
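The cells behind the two outputs above are not shown. As a sketch, the following uses a hypothetical `cities_messy` Series to contrast the default call (trailing whitespace only) with an explicit `to_strip` argument; characters on the left side are never touched.

```python
import pandas as pd

# Hypothetical data with unwanted characters on both sides.
cities_messy = pd.Series(["Delhi  ", "Hyderabad--", "..California..", "Tokyo"])

# Default: remove trailing whitespace only.
no_trailing_space = cities_messy.str.rstrip()

# With to_strip: remove any trailing dots or dashes, leaving the left side as it is.
no_trailing_chars = cities_messy.str.rstrip(".-")

print(no_trailing_space)
print(no_trailing_chars)
```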
upper
In [19]: #In Pandas, the upper() function is used to convert all characters in strings
#to uppercase in a Series or DataFrame column.
#It’s useful for standardizing text data, especially when you need consistent formatting.
Syntax: Series.str.upper()
Example:
Out[9]: 0 DELHI
1 HYDERABAD
2 CALIFORNIA
3 TOKYO
4 PARIS
dtype: object
lower
In [21]: #In Pandas, the lower() function is used to convert all characters in strings to
#lowercase in a Series or DataFrame column.
#It’s helpful for standardizing text data when case consistency is required.
Syntax : Series.str.lower()
Example:
Out[10]: 0 delhi
1 hyderabad
2 california
3 tokyo
4 paris
dtype: object
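The upper and lower outputs can be reproduced with the sketch below, which assumes the `cities` Series from later cells. The last step, with a hypothetical `user_input` value, shows a common reason for normalizing case: comparing text without regard to capitalization.

```python
import pandas as pd

# Assumed to match the cities Series used in later cells.
cities = pd.Series(["Delhi", "Hyderabad", "California", "Tokyo", "Paris"])

upper_cities = cities.str.upper()
lower_cities = cities.str.lower()

# Case-insensitive lookup: lowercase both sides before comparing.
user_input = "TOKYO"  # hypothetical search term
matches = cities[cities.str.lower() == user_input.lower()]

print(upper_cities)
print(lower_cities)
print(matches)
```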
find
In [23]: #In Pandas, the find() function is used to locate the position of the
#first occurrence of a substring within each string in a Series.
#If the substring is not found, it returns -1.
Example:
Out[11]: 0 -1
1 5
2 1
3 -1
4 1
dtype: int64
Out[25]: 0 -1
1 5
2 9
3 -1
4 -1
dtype: int64
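A sketch that reproduces both outputs above, assuming the `cities` Series from later cells. The first call matches the first output directly. The second output is consistent with starting the search at position 2, but that argument is an inference, since the original cell is not shown.

```python
import pandas as pd

# Assumed to match the cities Series used in later cells.
cities = pd.Series(["Delhi", "Hyderabad", "California", "Tokyo", "Paris"])

# Position of the first 'a' in each name, or -1 when there is none.
first_a = cities.str.find("a")

# Same search, but skipping the first two characters (start is an assumed argument).
first_a_from_2 = cities.str.find("a", start=2)

print(first_a)
print(first_a_from_2)
```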
index
In [26]: #In Pandas, the index() function is used to find the position of the
#first occurrence of a substring within each string in a Series.
#index() will raise an error if the substring is not found.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[13], line 7
4 cities = pd.Series(['Delhi', 'Hyderabad', 'California', 'Tokyo', 'Paris'])
6 # Find the position of the substring 'a' in each city name
----> 7 position_of_a = cities.str.index('a')
9 position_of_a
Out[14]: 0 -1
1 5
2 1
3 -1
4 1
dtype: int64
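One way to use `index()` without hitting the error is sketched below. It reuses the `cities` Series from the traceback; filtering with `str.contains` first is an illustrative choice, not something the original notebook shows.

```python
import pandas as pd

cities = pd.Series(["Delhi", "Hyderabad", "California", "Tokyo", "Paris"])

# 'Delhi' and 'Tokyo' contain no 'a', so the unfiltered call raises ValueError.
try:
    cities.str.index("a")
    raised = False
except ValueError:
    raised = True

# Keep only the names that contain 'a', then index() is safe to call.
has_a = cities.str.contains("a", regex=False)
position_of_a = cities[has_a].str.index("a")

print(raised)
print(position_of_a)
```

When a sentinel value such as -1 is acceptable for missing matches, `find()` avoids the filtering step altogether.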
rindex
In [ ]: #In Pandas, the rindex() function is used to find the position of the
#last occurrence of a substring within each string in a Series.
#Unlike rfind(), if the substring is not found, rindex() will raise a ValueError.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[15], line 7
4 cities = pd.Series(['Delhi', 'Hyderabad', 'California', 'Tokyo', 'Paris'])
6 # Find the position of the LAST occurrence of the substring 'a' in each city name
----> 7 last_position_of_a = cities.str.rindex('a')
9 last_position_of_a
Out[18]: 0 -1
1 -1
2 -1
3 -1
4 -1
dtype: int64
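To make the contrast between `rfind()` and `rindex()` concrete, here is a small sketch on the same `cities` Series. The filtering step before `rindex()` is an illustrative choice for avoiding the ValueError.

```python
import pandas as pd

cities = pd.Series(["Delhi", "Hyderabad", "California", "Tokyo", "Paris"])

# rfind() reports the last 'a' in each name and uses -1 when there is none.
last_a_rfind = cities.str.rfind("a")

# rindex() gives the same positions but raises ValueError on a miss,
# so it is applied only to names that contain 'a'.
has_a = cities.str.contains("a", regex=False)
last_a_rindex = cities[has_a].str.rindex("a")

print(last_a_rfind)
print(last_a_rindex)
```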
capitalize
In [ ]: #In Pandas, the capitalize() function is used to convert the first character
#of each string in a Series to uppercase and the rest to lowercase.
#It’s useful for standardizing text formatting, especially when dealing with
#names, cities, or titles.
Syntax : Series.str.capitalize()
Out[17]: 0 Delhi
1 Hyderabad
2 California
3 Tokyo
4 Paris
dtype: object
swapcase
In [ ]: #In Pandas, the swapcase() function is used to swap the case of each character in the
#strings of a Series.
#It converts uppercase letters to lowercase and lowercase letters to uppercase.
Syntax : Series.str.swapcase()
Out[16]: 0 dELHI
1 hyderabad
2 CalIfORNia
3 TOKYO
4 paris
dtype: object
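The capitalize and swapcase outputs above are both consistent with one mixed-case input, reconstructed below. That input is an inference from the outputs, not the original cell.

```python
import pandas as pd

# Mixed-case input inferred from the outputs shown above (an assumption).
mixed_cities = pd.Series(["Delhi", "HYDERABAD", "cALiFornIA", "tokyo", "PARIS"])

# First character uppercase, everything else lowercase.
capitalized = mixed_cities.str.capitalize()

# Every letter flipped to the opposite case.
swapped = mixed_cities.str.swapcase()

print(capitalized)
print(swapped)
```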
startswith
In [ ]: #In Pandas, the startswith() function is used to check if each string in a
#Series starts with a specific substring.
#It returns True if the string starts with the given substring, otherwise False.
Out[19]: 0 True
1 False
2 False
3 False
4 False
dtype: bool
endswith
In [ ]: #In Pandas, the endswith() function is used to check if each string in a
#Series ends with a specific substring.
#It returns True if the string ends with the given substring, otherwise False.
Out[20]: 0 False
1 False
2 False
3 False
4 True
dtype: bool
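A sketch that yields the two boolean outputs above. The substrings 'D' and 's' are assumptions chosen to match those outputs; the original arguments are not shown. The resulting masks can be used directly to filter the Series.

```python
import pandas as pd

cities = pd.Series(["Delhi", "Hyderabad", "California", "Tokyo", "Paris"])

# Assumed arguments: only 'Delhi' starts with 'D', only 'Paris' ends with 's'.
starts_with_d = cities.str.startswith("D")
ends_with_s = cities.str.endswith("s")

# Boolean masks select the matching rows.
selected = cities[starts_with_d | ends_with_s]

print(starts_with_d)
print(ends_with_s)
print(selected)
```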
isalnum
In [ ]: #In Pandas, the isalnum() function checks if all characters in each string of a Series
#are alphanumeric (i.e., only letters and numbers, no spaces or special characters).
#It returns True if all characters are alphanumeric, otherwise False.
Syntax : Series.str.isalnum()
Out[21]: 0 True
1 True
2 False
3 True
4 True
dtype: bool
isalpha
In [ ]: #In Pandas, the isalpha() function checks if all characters in each string of a
#Series are alphabetic (only letters, no numbers or special characters).
#It returns True if all characters are alphabetic, otherwise False.
Syntax : Series.str.isalpha()
Out[22]: 0 True
1 False
2 True
3 True
4 False
dtype: bool
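Since the input cells are not shown, the following sketch uses a hypothetical `labels` Series to show where isalnum() and isalpha() disagree: digits pass the first check but not the second, while spaces and punctuation fail both.

```python
import pandas as pd

# Hypothetical values chosen to separate the two checks.
labels = pd.Series(["Delhi", "Delhi2024", "New York", "Tokyo!"])

is_alnum = labels.str.isalnum()
is_alpha = labels.str.isalpha()

print(pd.DataFrame({"label": labels, "isalnum": is_alnum, "isalpha": is_alpha}))
```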
isdigit
In [ ]: #In Pandas, the isdigit() function checks if all characters in each string of a Series
#are digits. Besides 0-9, this includes characters such as superscript digits.
#It returns True if all characters are digits, otherwise False.
Syntax : Series.str.isdigit()
Out[23]: 0 True
1 True
2 False
3 True
4 False
dtype: bool
isspace
In [ ]: #In Pandas, the isspace() function checks if all characters in each string of a
#Series are whitespace (spaces, tabs, etc.).
#It returns True if the string is entirely whitespace, otherwise False.
Syntax : Series.str.isspace()
Out[24]: 0 True
1 False
2 True
3 False
4 False
dtype: bool
istitle
In [ ]: #In Pandas, the istitle() function checks if each word in the string starts with
#an uppercase letter followed by lowercase letters.
#It returns True if the string is in title case, otherwise False.
Syntax : Series.str.istitle()
In [25]: # Series with title case and non-title case city names
cities = pd.Series(['Delhi', 'hyderabad', 'California', 'TOKYO', 'Paris France'])
# Check if city names are in title case
is_title = cities.str.istitle()
is_title
Out[25]: 0 True
1 False
2 True
3 False
4 True
dtype: bool
isupper
In [ ]: #In Pandas, the isupper() function checks if all alphabetic characters in each
#string of a Series are uppercase.
#It returns True if all letters are uppercase, otherwise False.
Syntax : Series.str.isupper()
Out[26]: 0 True
1 False
2 True
3 True
4 False
dtype: bool
isnumeric
In [ ]: #In Pandas, the isnumeric() function checks if all characters in each string
#of a Series are numeric.
#It includes digits and numeric characters like fractions and superscripts.
#It returns True if all characters are numeric, otherwise False.
Syntax : Series.str.isnumeric()
Out[27]: 0 True
1 True
2 True
3 False
4 True
5 False
dtype: bool
isdecimal
In [ ]: #In Pandas, the isdecimal() function checks if all characters in each string
#of a Series are decimal digits (0-9).
#It does not consider fractions or superscripts as decimal digits.
#It returns True if all characters are decimal digits, otherwise False.
Syntax : Series.str.isdecimal()
Out[28]: 0 True
1 True
2 False
3 False
4 True
5 False
dtype: bool
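The three numeric checks are easy to confuse, so here is a side-by-side sketch on hypothetical samples. The Series is created with `dtype=object` as a deliberate assumption, so that the checks follow Python's own `str` methods regardless of which string backend pandas would otherwise pick.

```python
import pandas as pd

# Hypothetical samples: plain digits, a superscript, a fraction, a decimal point.
samples = pd.Series(["123", "²", "½", "12.5"], dtype=object)

comparison = pd.DataFrame({
    "value": samples,
    "isdecimal": samples.str.isdecimal(),  # strictest: decimal digits only
    "isdigit": samples.str.isdigit(),      # also accepts superscript digits
    "isnumeric": samples.str.isnumeric(),  # also accepts fractions
})
print(comparison)
```

None of the three accepts "12.5", because the decimal point is not a numeric character.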
split
In [ ]: #In Pandas, the split() function splits each string in a Series into a list of substrings
#based on a specified separator.
#By default, it splits on whitespace.
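A minimal sketch of the default behaviour, using hypothetical multi-word city names. It also shows the `expand=True` option, which spreads the pieces across DataFrame columns instead of returning lists.

```python
import pandas as pd

# Hypothetical multi-word names.
cities = pd.Series(["New York", "San Francisco", "Los Angeles", "Delhi"])

# Default: split on whitespace; each element becomes a list of words.
split_cities = cities.str.split()

# expand=True returns a DataFrame; rows with fewer words are padded with missing values.
split_columns = cities.str.split(expand=True)

print(split_cities)
print(split_columns)
```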
rsplit
In [ ]: #The rsplit() function works like split(), but it splits from the right (end of the string).
#The difference shows only when n limits the number of splits: the leftmost part
#stays intact and the splits happen at the rightmost separators.
In [30]: # Right-split city names, only splitting once from the right
rsplit_cities = cities.str.rsplit(n=1)
rsplit_cities
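With two-word names, split(n=1) and rsplit(n=1) give the same result, so the sketch below uses a hypothetical `places` Series with a three-word name to make the direction visible.

```python
import pandas as pd

# Hypothetical names; the three-word one shows the difference.
places = pd.Series(["Rio de Janeiro", "New York", "Delhi"])

# One split from the left: the first word is separated.
left_split = places.str.split(n=1)

# One split from the right: the last word is separated.
right_split = places.str.rsplit(n=1)

print(left_split)
print(right_split)
```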
partition
In [ ]: #The partition() function splits each string in a Series into 3 parts:
#Before the separator
#The separator itself
#After the separator
#It only splits at the first occurrence of the separator.
<class 'pandas.core.frame.DataFrame'>
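A sketch of partition() on hypothetical city names, splitting at a space. With the default `expand=True` the result is a DataFrame with three columns, which matches the type printed above. When the separator is missing, the whole string lands in the first column and the other two are empty.

```python
import pandas as pd

# Hypothetical names, some without a space.
cities = pd.Series(["New York", "San Francisco", "Los Angeles", "Delhi", "Paris"])

# Columns: 0 = before the separator, 1 = the separator, 2 = after the separator.
parts = cities.str.partition(" ")

print(type(parts))
print(parts)
```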
rpartition
In [ ]: #The rpartition() function is like partition(), but it splits at the last occurrence
#of the separator.
Out[32]:
     0  1          2
0  New          York
1  San     Francisco
2  Los       Angeles
3              Delhi
4              Paris
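The cell for this output is not shown. As a sketch, the following applies rpartition() to a hypothetical `places` Series; the three-word name shows that the split happens at the last space, and the name without a space shows that the whole string ends up in the last column.

```python
import pandas as pd

# Hypothetical names; 'Rio de Janeiro' contains two separators.
places = pd.Series(["Rio de Janeiro", "New York", "Delhi"])

# Split at the LAST space: columns are before, separator, after.
last_parts = places.str.rpartition(" ")

print(last_parts)
```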