Slicing Column Values in Pandas
Last Updated: 11 Jul, 2024
Slicing column values is a core operation in data manipulation and analysis. Pandas, a powerful Python library, provides several ways to select whole columns from a DataFrame and to extract parts of the values stored in them. This article covers these techniques, with their syntax, examples, and typical applications.
Introduction to Pandas DataFrame
A Pandas DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns). It is similar to a spreadsheet or SQL table and is one of the most commonly used data structures in data analysis.
To get started, let's create a simple DataFrame:
Python
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
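Before slicing, it helps to know what plain bracket selection returns. The sketch below rebuilds the same sample DataFrame and shows that a single column label gives a Series, while a list of labels gives a DataFrame. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# A single label in [] returns a Series (one column)
name_series = df['Name']

# A list of labels returns a DataFrame (a subset of columns)
subset = df[['Name', 'City']]

print(type(name_series).__name__)  # Series
print(type(subset).__name__)       # DataFrame
print(subset.shape)                # (3, 2)
```

Most of the string methods in this article operate on a Series, which is why the examples below select one column with df['...'] before slicing its values.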
Slicing Column Values using Indexing
1. Positional Indexing with iloc
The iloc indexer selects data by integer position. As with Python list slicing, the end of a positional slice is excluded, so :2 covers positions 0 and 1.
Python
# Slicing the first two rows of the 'Name' column
names = df.iloc[:2, 0]
print(names)
Output:
0 Alice
1 Bob
Name: Name, dtype: object
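iloc accepts the same slice forms as Python sequences on both axes. Here is a short sketch, reusing the sample data from above, with a column slice, negative positions, and a step.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Column positions can be sliced too: all rows, columns 0 and 1 ('Name', 'Age')
first_two_cols = df.iloc[:, 0:2]

# Negative positions count from the end: last row, last column
last_city = df.iloc[-1, -1]

# A step of 2 selects every other row (0 and 2); a list picks columns 0 and 2
every_other = df.iloc[::2, [0, 2]]

print(first_two_cols.columns.tolist())  # ['Name', 'Age']
print(last_city)                        # Chicago
print(every_other)
```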
2. Label-based Indexing with loc
The loc indexer selects data by row and column labels. Unlike iloc, a label slice includes its end point, so :1 below covers the rows labeled 0 and 1.
Python
# Slicing the 'Name' column for the first two rows
names = df.loc[:1, 'Name']
print(names)
Output:
0 Alice
1 Bob
Name: Name, dtype: object
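The end-point rule is easiest to see when the row labels are not integers. The sketch below assumes a variant of the sample data indexed by the made-up labels 'a', 'b', 'c'. It also shows that loc accepts a boolean mask for the rows.

```python
import pandas as pd

# Same data, but with string row labels instead of 0, 1, 2
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['a', 'b', 'c'],
)

# Label slice: the end label 'b' IS included
by_label = df.loc['a':'b', 'Name']

# Position slice: the end position 2 is NOT included
by_position = df.iloc[0:2, 0]

print(by_label.tolist())     # ['Alice', 'Bob']
print(by_position.tolist())  # ['Alice', 'Bob']

# Boolean masks also work with loc: names of people older than 28
older = df.loc[df['Age'] > 28, 'Name']
print(older.tolist())        # ['Bob', 'Charlie']
```

Both slices return the same two rows, but the label slice stops at 'b' while the positional slice has to name the position after the last row it wants.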
Slicing Column Values using String Methods
1. Accessing Substrings
You can access substrings of column values using the str accessor.
Python
# Extracting the first three characters of each name
df['Name_Short'] = df['Name'].str[:3]
print(df)
Output:
      Name  Age         City Name_Short
0    Alice   25     New York        Ali
1      Bob   30  Los Angeles        Bob
2  Charlie   35      Chicago        Cha
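Slicing through .str follows Python's string-slicing rules, including negative starts and steps, and str.slice(start, stop, step) is the method form of the same operation. A small sketch; the Series and its missing value are sample data added for illustration.

```python
import pandas as pd

names = pd.Series(['Alice', 'Bob', 'Charlie', None])

# Last two characters (negative start, as in Python slicing)
last_two = names.str[-2:]

# str.slice(start, stop, step): every second character from the start
every_other_char = names.str.slice(0, None, 2)

# Missing values are passed through as missing instead of raising an error
print(last_two.tolist())
print(every_other_char.tolist())
```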
2. Using Regular Expressions
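For pattern-based slicing, str.extract returns the part of each string matched by a capture group in a regular expression. With a single group and expand=False it returns a Series, and rows with no match get NaN.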
Python
# Extract the first run of digits from the 'City' column
# (none of these city names contains digits, so every result is NaN)
df['City_Digits'] = df['City'].str.extract(r'(\d+)', expand=False)
print(df)
Output:
      Name  Age         City Name_Short City_Digits
0    Alice   25     New York        Ali         NaN
1      Bob   30  Los Angeles        Bob         NaN
2  Charlie   35      Chicago        Cha         NaN
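Because the City column contains no digits, the example above only produces NaN. To see extract return actual matches, here is a sketch on a few made-up address strings. It also shows that named capture groups produce a DataFrame with one column per group.

```python
import pandas as pd

# Made-up sample strings, for illustration only
addresses = pd.Series(['221B Baker Street', '742 Evergreen Terrace', 'No number here'])

# One capture group + expand=False -> a Series; rows without a match become NaN
house_numbers = addresses.str.extract(r'(\d+)', expand=False)

# Named groups -> a DataFrame whose columns are the group names
parts = addresses.str.extract(
    r'(?P<number>\d+)(?P<suffix>[A-Z]?)\s+(?P<street>.+)'
)

print(house_numbers.tolist())
print(parts)
```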
Slicing Column Values in Pandas: Advanced Techniques
1. Slicing with apply and lambda
The apply function combined with a lambda function provides a flexible way to slice column values.
Python
# Extracting the first letter of each city name
df['City_First_Letter'] = df['City'].apply(lambda x: x[0])
print(df)
Output:
      Name  Age         City Name_Short City_Digits City_First_Letter
0    Alice   25     New York        Ali         NaN                 N
1      Bob   30  Los Angeles        Bob         NaN                 L
2  Charlie   35      Chicago        Cha         NaN                 C
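One caveat with apply: the lambda receives every element, including missing values, so x[0] fails on None or NaN. The sketch below compares a guarded lambda with the vectorized .str[0], using sample data that includes a missing city.

```python
import pandas as pd

cities = pd.Series(['New York', 'Los Angeles', None])

# apply calls the lambda once per element; without the isinstance guard,
# x[0] would raise TypeError on the missing value
first_letter_apply = cities.apply(lambda x: x[0] if isinstance(x, str) else None)

# The vectorized .str accessor leaves missing values missing on its own
first_letter_str = cities.str[0]

print(first_letter_apply.tolist())
print(first_letter_str.tolist())
```

For plain character slicing, the .str form is shorter and handles missing data for you. apply remains useful when the per-value logic is more involved than a slice.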
2. Using str.split for Complex Slicing
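The str.split method splits each string on a specified delimiter and returns a Series of lists. Chaining .str[i] then picks the i-th element of each list. Strings that do not contain the delimiter (such as 'Bob' below) come back as a single-element list, so .str[0] returns the whole string.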
Python
# Splitting the 'Name' column by the letter 'l' and taking the first part
df['Name_Split'] = df['Name'].str.split('l').str[0]
print(df)
Output:
      Name  Age         City Name_Short City_Digits City_First_Letter  \
0    Alice   25     New York        Ali         NaN                 N
1      Bob   30  Los Angeles        Bob         NaN                 L
2  Charlie   35      Chicago        Cha         NaN                 C

  Name_Split
0          A
1        Bob
2       Char
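str.split has two options that often replace a follow-up slice. expand=True returns the pieces as separate columns, and n limits how many splits are made. Here is a sketch on the City values from the sample data.

```python
import pandas as pd

cities = pd.Series(['New York', 'Los Angeles', 'Chicago'])

# expand=True spreads the pieces into columns 0, 1, ...; a string with
# fewer pieces gets a missing value in the extra column
words = cities.str.split(' ', expand=True)

# n=1 makes at most one split; .str[-1] takes the last piece of each list
last_word = cities.str.split(' ', n=1).str[-1]

print(words)
print(last_word.tolist())  # ['York', 'Angeles', 'Chicago']
```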
Practical Examples: Slicing Columns in a Real-World Dataset
Example 1: Analyzing Titanic Passenger Data
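Let's consider a dataset of Titanic passengers. The outputs below assume the Kaggle-style layout of this data (PassengerId, Survived, Pclass, Name, Sex, Age, ..., Fare, Cabin, Embarked); if the copy you load has different columns or a different column order, the labels and positional slices will differ accordingly.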
Python
import pandas as pd
# Load the Titanic dataset
url = 'https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv'
df = pd.read_csv(url)
# Display the first few rows of the dataset
print(df.head())
Output:
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C
2 3 1 3 ... 7.9250 NaN S
3 4 1 1 ... 53.1000 C123 S
4 5 0 3 ... 8.0500 NaN S
1. Slicing Specific Columns:
Python
# Slice columns 'Name', 'Age', and 'Sex'
df_sliced = df.loc[:, ['Name', 'Age', 'Sex']]
print(df_sliced.head())
Output:
Name Age Sex
0 Braund, Mr. Owen Harris 22.0 male
1 Cumings, Mrs. John Bradley (Florence Briggs Th... 38.0 female
2 Heikkinen, Miss. Laina 26.0 female
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 35.0 female
4 Allen, Mr. William Henry 35.0 male
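loc also accepts a range of column labels, and, as with row label slices, the range includes both end points. Since loading the real file requires network access, the sketch below uses a tiny made-up DataFrame that only borrows the Titanic column names; its rows are placeholders, not real passenger records.

```python
import pandas as pd

# Hypothetical stand-in: real Titanic column names, placeholder values
df = pd.DataFrame({
    'PassengerId': [1, 2],
    'Survived': [0, 1],
    'Pclass': [3, 1],
    'Name': ['Passenger A', 'Passenger B'],
    'Sex': ['male', 'female'],
    'Age': [22.0, 38.0],
})

# Label range: every column from 'Survived' through 'Name', inclusive
middle = df.loc[:, 'Survived':'Name']

print(middle.columns.tolist())  # ['Survived', 'Pclass', 'Name']
```

A label range depends on the column order in the file, so it is best suited to datasets whose layout is known in advance.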
2. Slicing Columns by Position:
Python
# Slice the columns at positions 1, 2 and 3 (iloc excludes the end position 4)
df_sliced = df.iloc[:, 1:4]
print(df_sliced.head())
Output:
Survived Pclass Name
0 0 3 Braund, Mr. Owen Harris
1 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th...
2 1 3 Heikkinen, Miss. Laina
3 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel)
4 0 3 Allen, Mr. William Henry
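When you know a column's name but want to slice by position, Index.get_loc converts the label to its integer position. The sketch below uses a similar made-up stand-in (placeholder names; the two fares are copied from the head() output above).

```python
import pandas as pd

# Hypothetical stand-in with a subset of the Titanic columns
df = pd.DataFrame({
    'PassengerId': [1, 2],
    'Survived': [0, 1],
    'Pclass': [3, 1],
    'Name': ['Passenger A', 'Passenger B'],
    'Fare': [7.25, 71.2833],
})

# Look up a column's position from its label, then slice by position
start = df.columns.get_loc('Survived')   # 1
by_pos = df.iloc[:, start:start + 3]     # positions 1, 2, 3

# Negative positions: the last two columns
last_two = df.iloc[:, -2:]

print(by_pos.columns.tolist())    # ['Survived', 'Pclass', 'Name']
print(last_two.columns.tolist())  # ['Name', 'Fare']
```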
Example 2: Slicing Substrings in a Product Codes Dataset
Consider a dataset with product codes:
Python
import pandas as pd
# Create a DataFrame with product codes
data = {
'ProductCode': ['A12345', 'B67890', 'C54321', 'D98765'],
'Price': [100, 150, 200, 250]
}
df = pd.DataFrame(data)
print(df)
Output:
  ProductCode  Price
0      A12345    100
1      B67890    150
2      C54321    200
3      D98765    250
1. Extracting Product Category:
Python
# Slice the first character to get the product category
df['Category'] = df['ProductCode'].str.slice(0, 1)
print(df)
Output:
  ProductCode  Price Category
0      A12345    100        A
1      B67890    150        B
2      C54321    200        C
3      D98765    250        D
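Besides str.slice, the str accessor offers str.slice_replace, which replaces a positional range with another string, for example to mask part of a code. Here is a short sketch on the same product codes.

```python
import pandas as pd

codes = pd.Series(['A12345', 'B67890', 'C54321', 'D98765'])

# Replace positions 1-3 (stop=4 is exclusive) with a mask, keeping the rest
masked = codes.str.slice_replace(start=1, stop=4, repl='***')

# Last two characters via a negative start
suffix = codes.str.slice(start=-2)

print(masked.tolist())  # ['A***45', 'B***90', 'C***21', 'D***65']
print(suffix.tolist())  # ['45', '90', '21', '65']
```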
2. Extracting Product Number:
Python
# Slice the numeric part of the product code
df['ProductNumber'] = df['ProductCode'].str.slice(1)
print(df)
Output:
  ProductCode  Price Category ProductNumber
0      A12345    100        A         12345
1      B67890    150        B         67890
2      C54321    200        C         54321
3      D98765    250        D         98765
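The two slices above assume every code is exactly one letter followed by digits. A regex-based alternative extracts both parts in one pass and makes that assumption explicit. The sketch below also converts the number to int, which slicing alone does not do. Codes that do not match the pattern would get NaN in both columns, and astype(int) would then fail, so such rows would need handling first.

```python
import pandas as pd

df = pd.DataFrame({
    'ProductCode': ['A12345', 'B67890', 'C54321', 'D98765'],
    'Price': [100, 150, 200, 250],
})

# One regex with two named groups: a leading letter, then the digits
parts = df['ProductCode'].str.extract(r'^(?P<Category>[A-Z])(?P<ProductNumber>\d+)$')

# The extracted digits are strings; convert them if you need arithmetic
parts['ProductNumber'] = parts['ProductNumber'].astype(int)

# Attach the new columns, aligned on the row index
df = df.join(parts)
print(df)
```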
Conclusion
Slicing column values in Pandas is a fundamental skill for data manipulation and analysis. Whether you need to slice entire columns or extract substrings from column values, Pandas provides versatile methods to accomplish these tasks. By mastering these techniques, you can efficiently preprocess and analyze your data, making your data analysis workflows more effective and streamlined.