
Vectorized String Operations

The document provides an overview of vectorized string operations in Pandas, detailing various methods such as len(), translate(), ljust(), rjust(), center(), zfill(), strip(), rstrip(), upper(), lower(), find(), and index(). Each method is accompanied by syntax and examples demonstrating its application on a Series of city names. These operations are essential for handling and manipulating string data efficiently in data science tasks.

Uploaded by

kruthiksai34882
Copyright
© All Rights Reserved

2/24/25, 11:57 AM a Data Science - Jupyter Notebook

Vectorized String Operations


Pandas mirrors most of Python's built-in string methods with vectorized equivalents, exposed through the .str accessor of a Series.

These methods apply an operation to every element of the Series at once, which simplifies handling and manipulating the string data found in real-world datasets.
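One practical difference from plain Python string methods is how missing values are treated. The sketch below assumes a small hypothetical Series named names that contains one None entry; it is only an illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical data with one missing entry
names = pd.Series(['delhi', None, 'paris'])

# The vectorized method leaves the missing value missing instead of raising
upper_names = names.str.upper()

# A plain list comprehension fails on the same data,
# because the missing entry has no .upper() method
try:
    [s.upper() for s in names]
    comprehension_failed = False
except AttributeError:
    comprehension_failed = True

print(upper_names)
print(comprehension_failed)
```

The .str accessor is therefore usually the more convenient choice for columns that may contain gaps.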

List of Pandas str methods that mirror Python string methods:

len
In [1]:  #In Pandas, the len() function returns the length (number of characters) in
#each string of a Series.

Syntax : Series.str.len()

Example: Find Length of City Names

In [1]:  import pandas as pd



# Series of city names
cities = pd.Series(['New York', 'San Francisco', 'Los Angeles', 'Delhi', 'Paris'])

# Find length of each city name
city_name_lengths = cities.str.len()

city_name_lengths

Out[1]: 0 8
1 13
2 11
3 5
4 5
dtype: int64
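Because str.len() returns an integer Series, its result can be used directly as a filter. A minimal sketch, reusing the same city names; the threshold of 5 characters is an arbitrary choice for illustration.

```python
import pandas as pd

cities = pd.Series(['New York', 'San Francisco', 'Los Angeles', 'Delhi', 'Paris'])

# Boolean mask: True where the name has more than 5 characters
is_long = cities.str.len() > 5

# Keep only the rows where the mask is True
long_names = cities[is_long]
print(long_names)
```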

translate
In [3]:  #The translate() function replaces specific characters in strings based on a translation table.
#It’s useful for quick character substitutions.

Example: Replace 'a' with '@' and 'e' with '3'

localhost:8888/notebooks/a Data Science.ipynb#len 1/19



In [2]:  import pandas as pd



# Series of simple words
words = pd.Series(['apple', 'banana', 'grape', 'date'])

# Create translation table
translation_table = str.maketrans({'a': '@', 'e': '3'})

# Apply translation
translated_words = words.str.translate(translation_table)

translated_words

Out[2]: 0 @ppl3
1 b@n@n@
2 gr@p3
3 d@t3
dtype: object
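A translation table can also delete characters rather than replace them. The sketch below uses the three-argument form of str.maketrans, where every character in the third argument is mapped to None; the codes Series is hypothetical sample data.

```python
import pandas as pd

# Hypothetical product codes containing separators we want to drop
codes = pd.Series(['A-1.0', 'B-2.5', 'C-3'])

# Third argument of str.maketrans lists the characters to delete
delete_table = str.maketrans('', '', '.-')

cleaned_codes = codes.str.translate(delete_table)
print(cleaned_codes)
```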

ljust
In [5]:  #In Pandas, the ljust() function is used to left-justify strings in a Series or DataFrame column.
#It adds spaces (or a specified character) to the right side of the string until
#it reaches the desired length.

Syntax: Series.str.ljust(width, fillchar=' ')

Example:

In [3]:  # Left-justify city names to 15 characters for proper alignment


formatted_cities = cities.str.ljust(15,'*')
formatted_cities

Out[3]: 0 New York*******
1 San Francisco**
2 Los Angeles****
3 Delhi**********
4 Paris**********
dtype: object

rjust
In [7]:  # In Pandas, the ljust() function is used to left-justify strings in a Series or DataFrame column
# It adds spaces (or a specified character) to the right side of the string until
#it reaches the desired length

Syntax: Series.str.rjust(width, fillchar=' ')

Example:


In [4]:  # Right-justify city names to 15 characters with dots for proper alignment
formatted_cities_left = cities.str.rjust(15, '.')

formatted_cities_left

Out[4]: 0 .......New York


1 ..San Francisco
2 ....Los Angeles
3 ..........Delhi
4 ..........Paris
dtype: object

center
In [9]:  # The center() function centers the text in the specified width.
# It adds spaces (or a specified character) to both sides of the string,
# splitting the padding as evenly as possible.

Syntax: Series.str.center(width, fillchar=' ')

Example:

In [5]:  # Center city names to 15 characters with dots for proper alignment
formatted_cities_center = cities.str.center(15, '.')

formatted_cities_center

Out[5]: 0 ....New York...
1 .San Francisco.
2 ..Los Angeles..
3 .....Delhi.....
4 .....Paris.....
dtype: object
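The justification methods are often combined to build fixed-width text output. The following is a sketch under assumed data: the visits counts are invented for illustration, and the column widths (8 and 5) are arbitrary.

```python
import pandas as pd

cities = pd.Series(['Delhi', 'Tokyo', 'Paris'])
visits = pd.Series([120, 45, 8])  # hypothetical counts

# Left-justify the names, right-justify the numbers,
# then concatenate the two Series element-wise
report = cities.str.ljust(8, '.') + visits.astype(str).str.rjust(5, '.')
print(report)
```

Every line of the result has the same length (13 characters), so the numbers line up in a column.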


zfill
In [11]:  #In Pandas, the zfill() function is used to pad strings with zeros on the left side
#until they reach the desired length.
#It is mostly used for numbers formatted as strings but works with any string data.

Syntax: Series.str.zfill(width)

Example:

In [6]:  # Zero-fill city names to 15 characters for alignment


formatted_cities_zfill = cities.str.zfill(15)

formatted_cities_zfill

Out[6]: 0 0000000New York
1 00San Francisco
2 0000Los Angeles
3 0000000000Delhi
4 0000000000Paris
dtype: object
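A more typical use of zfill() is padding numeric identifiers stored as strings. A minimal sketch with hypothetical ID values; note that a string already longer than the requested width is left unchanged.

```python
import pandas as pd

# Hypothetical identifiers stored as strings
ids = pd.Series(['7', '42', '123', '1234567'])

# Pad to 5 characters with leading zeros
padded_ids = ids.str.zfill(5)
print(padded_ids)
```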


strip
In [13]:  #In Pandas, the strip() function is used to remove leading and trailing spaces
#or specific characters from strings in a Series or DataFrame column.
#It is commonly used for cleaning data where extra spaces or unwanted characters may cause issues

Syntax: Series.str.strip([to_strip])

Example:

In [7]:  # Strip leading and trailing spaces from city names


cleaned_cities = cities.str.strip()

cleaned_cities

Out[7]: 0 New York


1 San Francisco
2 Los Angeles
3 Delhi
4 Paris
dtype: object

Example 2: Removing Specific Characters

In [33]:  # Strip dots and dashes from both ends of city names
cities_chars = pd.Series(['..Delhi..', '--Hyderabad--', '..California', 'Tokyo--', '--Paris'])

cleaned_cities_chars = cities_chars.str.strip('.-')

cleaned_cities_chars

Out[33]: 0 Delhi
1 Hyderabad
2 California
3 Tokyo
4 Paris
dtype: object

rstrip
In [16]:  #In Pandas, the rstrip() function is used to remove trailing spaces or
#specific characters from the right side of strings in a Series or DataFrame column.
#It is helpful when you have unwanted characters or spaces at the end of strings.

Syntax: Series.str.rstrip([to_strip])

Example:


In [8]:  import pandas as pd



# Series with trailing spaces
cities = pd.Series(['Delhi ', 'Hyderabad ', 'California ', 'Tokyo ', 'Paris '])

# Remove spaces from the right side
cleaned_cities_right = cities.str.rstrip()

cleaned_cities_right

Out[8]: 0 Delhi
1 Hyderabad
2 California
3 Tokyo
4 Paris
dtype: object

Example 2: Removing Specific Characters

In [18]:  # Series with trailing dots and dashes


cities_with_chars = pd.Series(['Delhi..', 'Hyderabad--', 'California..', 'Tokyo--', 'Paris--'])

# Remove dots and dashes from the right side
cleaned_cities_right_chars = cities_with_chars.str.rstrip('.-')

cleaned_cities_right_chars

Out[18]: 0 Delhi
1 Hyderabad
2 California
3 Tokyo
4 Paris
dtype: object
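strip() and rstrip() have a left-side counterpart, lstrip(). The sketch below puts the three side by side on hypothetical sample data so the difference is visible in one table.

```python
import pandas as pd

# Hypothetical names wrapped in dashes on both sides
raw = pd.Series(['--Delhi--', '--Tokyo--'])

comparison = pd.DataFrame({
    'strip': raw.str.strip('-'),    # removes from both ends
    'lstrip': raw.str.lstrip('-'),  # removes from the left only
    'rstrip': raw.str.rstrip('-'),  # removes from the right only
})
print(comparison)
```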

upper
In [19]:  #In Pandas, the upper() function is used to convert all characters in strings
#to uppercase in a Series or DataFrame column.
#It’s useful for standardizing text data, especially when you need consistent formatting.

Syntax: Series.str.upper()

Example:

In [9]:  # Convert all city names to uppercase


uppercase_cities = cities.str.upper()

uppercase_cities

Out[9]: 0 DELHI
1 HYDERABAD
2 CALIFORNIA
3 TOKYO
4 PARIS
dtype: object

lower


In [21]:  #In Pandas, the lower() function is used to convert all characters in strings to
#lowercase in a Series or DataFrame column.
#It’s helpful for standardizing text data when case consistency is required.

Syntax : Series.str.lower()

Example:

In [10]:  import pandas as pd



# Series of city names with mixed cases
cities = pd.Series(['Delhi', 'Hyderabad', 'California', 'Tokyo', 'Paris'])

# Convert all city names to lowercase
lowercase_cities = cities.str.lower()

lowercase_cities

Out[10]: 0 delhi
1 hyderabad
2 california
3 tokyo
4 paris
dtype: object
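Case conversion is commonly chained with strip() to normalize text before comparing it. A minimal sketch, assuming a hypothetical Series raw_cities with inconsistent casing and stray spaces:

```python
import pandas as pd

# Hypothetical messy input: same city written in different ways
raw_cities = pd.Series([' Delhi', 'DELHI ', 'delhi', 'Tokyo'])

# .str methods return a Series, so they can be chained
normalized = raw_cities.str.strip().str.lower()

# Comparison now matches regardless of the original casing or spacing
is_delhi = normalized == 'delhi'
print(is_delhi)
```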

find
In [23]:  #In Pandas, the find() function is used to locate the position of the
#first occurrence of a substring within each string in a Series.
#If the substring is not found, it returns -1.

Syntax : Series.str.find(sub, start=0, end=None)

Example:

In [11]:  import pandas as pd



# Series of city names
cities = pd.Series(['Delhi', 'Hyderabad', 'California', 'Tokyo', 'Paris'])

# Find the position of the substring 'a' in each city name
position_of_a = cities.str.find('a')

position_of_a

Out[11]: 0 -1
1 5
2 1
3 -1
4 1
dtype: int64

Example 2: Finding a Substring with a Start Position


In [25]:  # Find the position of 'a' starting from index 4


position_of_a_from_4 = cities.str.find('a', start=4)

position_of_a_from_4

Out[25]: 0 -1
1 5
2 9
3 -1
4 -1
dtype: int64
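Since find() signals "not found" with -1, its result can be turned into a filter. The sketch below keeps only the city names that contain the letter 'a'; the variable names are chosen for illustration.

```python
import pandas as pd

cities = pd.Series(['Delhi', 'Hyderabad', 'California', 'Tokyo', 'Paris'])

# Position of the first 'a', or -1 when there is none
first_a = cities.str.find('a')

# Keep the rows where the substring was found
cities_with_a = cities[first_a != -1]
print(cities_with_a)
```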

index
In [26]:  #In Pandas, the index() function is used to find the position of the
#first occurrence of a substring within each string in a Series.
#index() will raise an error if the substring is not found.

Syntax : Series.str.index(sub, start=0, end=None)

Example 1: Finding the Position of a Substring


In [13]:  import pandas as pd



# Series of city names
cities = pd.Series(['Delhi', 'Hyderabad', 'California', 'Tokyo', 'Paris'])

# Find the position of the substring 'a' in each city name
position_of_a = cities.str.index('a')

position_of_a

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[13], line 7
4 cities = pd.Series(['Delhi', 'Hyderabad', 'California', 'Tokyo', 'Paris'])
6 # Find the position of the substring 'a' in each city name
----> 7 position_of_a = cities.str.index('a')
9 position_of_a

File ~\anaconda3\lib\site-packages\pandas\core\strings\accessor.py:129, in forbid_nonstring_types.<locals>._forbid_nonstring_types.<locals>.wrapper(self, *args, **kwargs)
124 msg = (
125 f"Cannot use .str.{func_name} with values of "
126 f"inferred dtype '{self._inferred_dtype}'."
127 )
128 raise TypeError(msg)
--> 129 return func(self, *args, **kwargs)

File ~\anaconda3\lib\site-packages\pandas\core\strings\accessor.py:2849, in StringMethods.index(self, sub, start, end)
2846 msg = f"expected a string object, not {type(sub).__name__}"
2847 raise TypeError(msg)
-> 2849 result = self._data.array._str_index(sub, start=start, end=end)
2850 return self._wrap_result(result, returns_string=False)

File ~\anaconda3\lib\site-packages\pandas\core\strings\object_array.py:264, in ObjectStringArrayMixin._str_index(self, sub, start, end)
262 else:
263 f = lambda x: x.index(sub, start, end)
--> 264 return self._str_map(f, dtype="int64")

File ~\anaconda3\lib\site-packages\pandas\core\strings\object_array.py:71, in ObjectStringArrayMixin._str_map(self, f, na_value, dtype, convert)
69 map_convert = convert and not np.all(mask)
70 try:
---> 71 result = lib.map_infer_mask(arr, f, mask.view(np.uint8), map_convert)
72 except (TypeError, AttributeError) as err:
73 # Reraise the exception if callable `f` got wrong number of args.
74 # The user may want to be warned by this, instead of getting NaN
75 p_err = (
76 r"((takes)|(missing)) (?(2)from \d+ to )?\d+ "
77 r"(?(3)required )positional arguments?"
78 )

File ~\anaconda3\lib\site-packages\pandas\_libs\lib.pyx:2876, in pandas._libs.lib.map_infer_mask()

File ~\anaconda3\lib\site-packages\pandas\core\strings\object_array.py:263, in ObjectStringArrayMixin._str_index.<locals>.<lambda>(x)
261 f = lambda x: x.index(sub, start, end)
262 else:
--> 263 f = lambda x: x.index(sub, start, end)
264 return self._str_map(f, dtype="int64")

ValueError: substring not found


Example 2: Using index() Safely

In [14]:  # Apply a function to handle errors gracefully


def safe_index(s, sub):
try:
return s.index(sub)
except ValueError:
return -1

# Apply the safe function to each element
position_of_a_safe = cities.apply(lambda x: safe_index(x, 'a'))

position_of_a_safe

Out[14]: 0 -1
1 5
2 1
3 -1
4 1
dtype: int64
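Another way to avoid the ValueError is to filter first and call index() only on the strings that are known to contain the substring. This is a sketch of that alternative, not the notebook's own approach; it uses Series.str.contains with regex=False for a literal match.

```python
import pandas as pd

cities = pd.Series(['Delhi', 'Hyderabad', 'California', 'Tokyo', 'Paris'])

# True where the literal substring 'a' occurs
contains_a = cities.str.contains('a', regex=False)

# index() is now safe, because every remaining string contains 'a'
positions = cities[contains_a].str.index('a')
print(positions)
```

The result keeps the original row labels (1, 2 and 4), so it can be aligned back to the full Series if needed.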

rindex
In [ ]:  #In Pandas, the rindex() function is used to find the position of the
#last occurrence of a substring within each string in a Series.
#Unlike rfind(), if the substring is not found, rindex() will raise a ValueError.

Syntax : Series.str.rindex(sub, start=0, end=None)

Example: Finding the Last Occurrence of a Substring


In [15]:  import pandas as pd



# Series of city names
cities = pd.Series(['Delhi', 'Hyderabad', 'California', 'Tokyo', 'Paris'])

# Find the position of the LAST occurrence of the substring 'a' in each city name
last_position_of_a = cities.str.rindex('a')

last_position_of_a

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[15], line 7
4 cities = pd.Series(['Delhi', 'Hyderabad', 'California', 'Tokyo', 'Paris'])
6 # Find the position of the LAST occurrence of the substring 'a' in each city name
----> 7 last_position_of_a = cities.str.rindex('a')
9 last_position_of_a

File ~\anaconda3\lib\site-packages\pandas\core\strings\accessor.py:129, in forbid_nonstring_types.<locals>._forbid_nonstring_types.<locals>.wrapper(self, *args, **kwargs)
124 msg = (
125 f"Cannot use .str.{func_name} with values of "
126 f"inferred dtype '{self._inferred_dtype}'."
127 )
128 raise TypeError(msg)
--> 129 return func(self, *args, **kwargs)

File ~\anaconda3\lib\site-packages\pandas\core\strings\accessor.py:2867, in StringMethods.rindex(self, sub, start, end)
2864 msg = f"expected a string object, not {type(sub).__name__}"
2865 raise TypeError(msg)
-> 2867 result = self._data.array._str_rindex(sub, start=start, end=end)
2868 return self._wrap_result(result, returns_string=False)

File ~\anaconda3\lib\site-packages\pandas\core\strings\object_array.py:271, in ObjectStringArrayMixin._str_rindex(self, sub, start, end)
269 else:
270 f = lambda x: x.rindex(sub, start, end)
--> 271 return self._str_map(f, dtype="int64")

File ~\anaconda3\lib\site-packages\pandas\core\strings\object_array.py:71, in ObjectStringArrayMixin._str_map(self, f, na_value, dtype, convert)
69 map_convert = convert and not np.all(mask)
70 try:
---> 71 result = lib.map_infer_mask(arr, f, mask.view(np.uint8), map_convert)
72 except (TypeError, AttributeError) as err:
73 # Reraise the exception if callable `f` got wrong number of args.
74 # The user may want to be warned by this, instead of getting NaN
75 p_err = (
76 r"((takes)|(missing)) (?(2)from \d+ to )?\d+ "
77 r"(?(3)required )positional arguments?"
78 )

File ~\anaconda3\lib\site-packages\pandas\_libs\lib.pyx:2876, in pandas._libs.lib.map_infer_mask()

File ~\anaconda3\lib\site-packages\pandas\core\strings\object_array.py:270, in ObjectStringArrayMixin._str_rindex.<locals>.<lambda>(x)
268 f = lambda x: x.rindex(sub, start, end)
269 else:
--> 270 f = lambda x: x.rindex(sub, start, end)
271 return self._str_map(f, dtype="int64")

ValueError: substring not found


Solution: Handling Errors with rindex()

In [18]:  # Custom function to safely find the last index


def safe_rindex(string, substring):
try:
return string.rindex(substring)
except ValueError:
return -1 # Return -1 if substring is not found

# Apply the function to the Series
last_position_of_a_safe = cities.apply(lambda x: safe_rindex(x, 'a'))

last_position_of_a_safe

Out[18]: 0 -1
1 -1
2 -1
3 -1
4 -1
dtype: int64
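Just as find() is the non-raising counterpart of index(), rfind() is the non-raising counterpart of rindex(): it returns -1 when the substring is absent. A minimal sketch on the same city names, which avoids the custom helper entirely:

```python
import pandas as pd

cities = pd.Series(['Delhi', 'Hyderabad', 'California', 'Tokyo', 'Paris'])

# Position of the LAST 'a' in each name, or -1 when there is none
last_a = cities.str.rfind('a')
print(last_a)
```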

capitalize
In [ ]:  #In Pandas, the capitalize() function is used to convert the first character
#of each string in a Series to uppercase and the rest to lowercase.
#It’s useful for standardizing text formatting, especially when dealing with
#names, cities, or titles.

Syntax : Series.str.capitalize()

Example: Capitalizing City Names

In [17]:  import pandas as pd



# Series of city names with mixed cases
cities = pd.Series(['delhi', 'HYDERABAD', 'cALiFornIA', 'tokyo', 'PARIS'])

# Capitalize the first letter of each city name
capitalized_cities = cities.str.capitalize()

capitalized_cities

Out[17]: 0 Delhi
1 Hyderabad
2 California
3 Tokyo
4 Paris
dtype: object
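For multi-word names, capitalize() only uppercases the first character of the whole string. The sketch below contrasts it with Series.str.title(), which capitalizes every word; the sample names are hypothetical.

```python
import pandas as pd

# Hypothetical multi-word names in inconsistent case
cities = pd.Series(['new york', 'SAN FRANCISCO', 'delhi'])

capitalized = cities.str.capitalize()  # first character of the string only
titled = cities.str.title()            # first character of every word

print(pd.DataFrame({'capitalize': capitalized, 'title': titled}))
```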

swapcase
In [ ]:  #In Pandas, the swapcase() function is used to swap the case of each character in the
#strings of a Series.
#It converts uppercase letters to lowercase and lowercase letters to uppercase.

Syntax : Series.str.swapcase()

Example: Swapping Case of City Names


In [16]:  import pandas as pd



# Series of city names with mixed cases
cities = pd.Series(['Delhi', 'HYDERABAD', 'cALiFornIA', 'tokyo', 'PARIS'])

# Swap case for each city name
swapped_cities = cities.str.swapcase()

swapped_cities

Out[16]: 0 dELHI
1 hyderabad
2 CalIfORNia
3 TOKYO
4 paris
dtype: object

startswith
In [ ]:  #In Pandas, the startswith() function is used to check if each string in a
#Series starts with a specific substring.
#It returns True if the string starts with the given substring, otherwise False.

Syntax : Series.str.startswith(pat, na=None)

Example: Check if City Names Start with 'D'

In [19]:  import pandas as pd



# Series of city names
cities = pd.Series(['Delhi', 'Hyderabad', 'California', 'Tokyo', 'Paris'])

# Check if city names start with 'D'
starts_with_d = cities.str.startswith('D')

starts_with_d

Out[19]: 0 True
1 False
2 False
3 False
4 False
dtype: bool

endswith
In [ ]:  #In Pandas, the endswith() function is used to check if each string in a
#Series ends with a specific substring.
#It returns True if the string ends with the given substring, otherwise False.

Syntax : Series.str.endswith(pat, na=None)

Example: Check if City Names End with 's'


In [20]:  # Check if city names end with 's'


ends_with_s = cities.str.endswith('s')

ends_with_s

Out[20]: 0 False
1 False
2 False
3 False
4 True
dtype: bool
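The na parameter of startswith() and endswith() controls what is returned for missing values. The sketch below assumes an object-dtype Series with one None entry (dtype=object is set explicitly, since defaults can differ for other string dtypes); passing na=False yields a clean boolean Series that can be used as a mask.

```python
import pandas as pd

# Hypothetical data with a missing entry; object dtype is forced on purpose
cities = pd.Series(['Delhi', None, 'Tokyo'], dtype=object)

# Default: the missing entry stays missing in the result
default_result = cities.str.startswith('D')

# na=False: the missing entry is reported as False
filled_result = cities.str.startswith('D', na=False)

print(filled_result)
print(cities[filled_result])
```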

isalnum
In [ ]:  #In Pandas, the isalnum() function checks if all characters in each string of a Series
#are alphanumeric (i.e., only letters and numbers, no spaces or special characters).
#It returns True if all characters are alphanumeric, otherwise False.

Syntax : Series.str.isalnum()

Example: Check if City Names Are Alphanumeric

In [21]:  # Series with special characters


cities_with_special = pd.Series(['Delhi', 'Hyderabad123', 'California!', 'Tokyo', 'Paris'])

# Check if city names are alphanumeric
is_alphanumeric = cities_with_special.str.isalnum()

is_alphanumeric

Out[21]: 0 True
1 True
2 False
3 True
4 True
dtype: bool

isalpha
In [ ]:  #In Pandas, the isalpha() function checks if all characters in each string of a
#Series are alphabetic (only letters, no numbers or special characters).
#It returns True if all characters are alphabetic, otherwise False.

Syntax : Series.str.isalpha()

Example: Check if City Names Are Alphabetic


In [22]:  import pandas as pd



# Series with alphabetic and non-alphabetic city names
cities = pd.Series(['Delhi', 'Hyderabad123', 'California', 'Tokyo', 'Paris!'])

# Check if city names are alphabetic
is_alpha = cities.str.isalpha()

is_alpha

Out[22]: 0 True
1 False
2 True
3 True
4 False
dtype: bool

isdigit
In [ ]:  #In Pandas, the isdigit() function checks if all characters in each string of a Series
#are digits (0-9).
#It returns True if all characters are digits, otherwise False.

Syntax : Series.str.isdigit()

Example: Check if Series Contains Only Digits

In [23]:  # Series with numbers and mixed characters


numbers = pd.Series(['123', '4567', '89a', '007', ''])

# Check if strings are digits
is_digit = numbers.str.isdigit()

is_digit

Out[23]: 0 True
1 True
2 False
3 True
4 False
dtype: bool

isspace
In [ ]:  #In Pandas, the isspace() function checks if all characters in each string of a
#Series are whitespace (spaces, tabs, etc.).
#It returns True if the string is entirely whitespace, otherwise False.

Syntax : Series.str.isspace()

Example: Check if Series Contains Only Spaces


In [24]:  # Series with spaces and other characters


texts = pd.Series([' ', 'Delhi', ' ', '', 'Tokyo '])

# Check if strings are only spaces
is_space = texts.str.isspace()

is_space

Out[24]: 0 True
1 False
2 True
3 False
4 False
dtype: bool

istitle
In [ ]:  #In Pandas, the istitle() function checks if each word in the string starts with
#an uppercase letter followed by lowercase letters.
#It returns True if the string is in title case, otherwise False.

Syntax : Series.str.istitle()

Example: Check if City Names Are in Title Case

In [25]:  # Series with title case and non-title case city names
cities = pd.Series(['Delhi', 'hyderabad', 'California', 'TOKYO', 'Paris France'])

# Check if city names are in title case
is_title = cities.str.istitle()

is_title

Out[25]: 0 True
1 False
2 True
3 False
4 True
dtype: bool

isupper
In [ ]:  #In Pandas, the isupper() function checks if all alphabetic characters in each
#string of a Series are uppercase.
#It returns True if all letters are uppercase, otherwise False.

Syntax : Series.str.isupper()

Example: Check if City Names Are in Uppercase


In [26]:  import pandas as pd



# Series with mixed case city names
cities = pd.Series(['DELHI', 'Hyderabad', 'CALIFORNIA', 'TOKYO', 'paris'])

# Check if city names are in uppercase
is_upper = cities.str.isupper()

is_upper

Out[26]: 0 True
1 False
2 True
3 True
4 False
dtype: bool

isnumeric
In [ ]:  #In Pandas, the isnumeric() function checks if all characters in each string
#of a Series are numeric.
#It includes digits and numeric characters like fractions and superscripts.
#It returns True if all characters are numeric, otherwise False.

Syntax : Series.str.isnumeric()

Example: Check if Series Contains Numeric Characters

In [27]:  # Series with numeric and non-numeric values


numbers = pd.Series(['123', '4567', '½', '89a', '007', ''])

# Check if strings are numeric
is_numeric = numbers.str.isnumeric()

is_numeric

Out[27]: 0 True
1 True
2 True
3 False
4 True
5 False
dtype: bool

isdecimal
In [ ]:  #In Pandas, the isdecimal() function checks if all characters in each string
#of a Series are decimal digits (0-9).
#It does not consider fractions or superscripts as decimal digits.
#It returns True if all characters are decimal digits, otherwise False.

Syntax : Series.str.isdecimal()

Example: Check if Series Contains Decimal Digits


In [28]:  # Series with decimal digits and other numeric characters


numbers = pd.Series(['123', '4567', '½', '89a', '007', ''])

# Check if strings are decimal digits
is_decimal = numbers.str.isdecimal()

is_decimal

Out[28]: 0 True
1 True
2 False
3 False
4 True
5 False
dtype: bool
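The three numeric checks differ only on unusual characters, so a side-by-side comparison helps. This sketch uses hypothetical samples (a plain digit, a superscript two, a one-half fraction, and a mixed string) and forces object dtype so the checks follow Python's own str semantics.

```python
import pandas as pd

# '\u00b2' is the superscript two, '\u00bd' is the one-half fraction
samples = pd.Series(['5', '\u00b2', '\u00bd', '5a'], dtype=object)

comparison = pd.DataFrame({
    'value': samples,
    'isdecimal': samples.str.isdecimal(),  # strictest: decimal digits only
    'isdigit': samples.str.isdigit(),      # also accepts superscript digits
    'isnumeric': samples.str.isnumeric(),  # also accepts fractions
})
print(comparison)
```

Each method accepts everything the previous one accepts, so isdecimal() is the safest check before converting strings to integers.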

split
In [ ]:  #In Pandas, the split() function splits each string in a Series into a list of substrings
#based on a specified separator.
#By default, it splits on whitespace.

Syntax : Series.str.split(pat=None, n=-1, expand=False)

Example: Split City Names on Whitespace

In [29]:  import pandas as pd



# Series with multi-word city names
cities = pd.Series(['New York', 'San Francisco', 'Los Angeles', 'Delhi', 'Paris'])

# Split city names on spaces
split_cities = cities.str.split()

split_cities

Out[29]: 0 [New, York]
1 [San, Francisco]
2 [Los, Angeles]
3 [Delhi]
4 [Paris]
dtype: object
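The expand parameter from the syntax line changes the return type: with expand=True the pieces are spread across DataFrame columns instead of being collected in lists. A minimal sketch on a shortened version of the city list:

```python
import pandas as pd

cities = pd.Series(['New York', 'San Francisco', 'Delhi'])

# One column per piece; rows with fewer pieces are filled with missing values
parts = cities.str.split(expand=True)
print(parts)
```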

rsplit
In [ ]:  #The rsplit() function works like split(), but it splits from the right (end of the string).
#Useful when you want to keep the first part intact and split the last parts

Syntax : Series.str.rsplit(pat=None, n=-1, expand=False)

Example: Right-Split City Names on Whitespace


In [30]:  # Right-split city names, only splitting once from the right
rsplit_cities = cities.str.rsplit(n=1)

rsplit_cities

Out[30]: 0 [New, York]
1 [San, Francisco]
2 [Los, Angeles]
3 [Delhi]
4 [Paris]
dtype: object
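With two-word names, split() and rsplit() give identical results, so the direction only becomes visible when n limits the number of splits on longer strings. The sketch below uses hypothetical three-word names to show the difference.

```python
import pandas as pd

# Hypothetical three-word names
cities = pd.Series(['Rio de Janeiro', 'New York City'])

# One split, counted from the left
left_first = cities.str.split(n=1)

# One split, counted from the right
right_first = cities.str.rsplit(n=1)

print(left_first)
print(right_first)
```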

partition
In [ ]:  #The partition() function splits each string in a Series into 3 parts:
#Before the separator
#The separator itself
#After the separator
#It only splits at the first occurrence of the separator.

Syntax : Series.str.partition(sep=' ', expand=True)

Example: Partition City Names on First Space

In [31]:  # Partition city names on the first spac


partitioned_cities = cities.str.partition(' ')

partitioned_cities
print(type(partitioned_cities))

<class 'pandas.core.frame.DataFrame'>
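To see what the partitioned DataFrame contains, the sketch below inspects individual rows. It uses a shortened city list, and the column names head, sep and tail are hypothetical labels added for readability (pandas itself numbers the columns 0, 1, 2).

```python
import pandas as pd

cities = pd.Series(['New York', 'San Francisco', 'Delhi'])

# Three columns: text before the separator, the separator, text after it
parts = cities.str.partition(' ')

# Hypothetical, more descriptive column names
parts.columns = ['head', 'sep', 'tail']
print(parts)
```

When the separator is not found, as for 'Delhi', the whole string lands in the first column and the other two columns hold empty strings.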

rpartition
In [ ]:  #The rpartition() function is like partition(), but it splits at the last occurrence
#of the separator.

Syntax : Series.str.rpartition(sep=' ', expand=True)

Example: Right-Partition City Names on Last Space

In [32]:  # Right-partition city names on the last space


rpartitioned_cities = cities.str.rpartition(' ')

rpartitioned_cities

Out[32]:
     0  1          2
0  New          York
1  San     Francisco
2  Los       Angeles
3              Delhi
4              Paris
