How to convert categorical string data into numeric in Python?
Last Updated :
06 Apr, 2023
Datasets often contain both numerical and categorical features. Categorical features are usually stored as strings, which are easy for people to read, but most machine learning algorithms expect numeric input. Categorical data must therefore be converted into numerical data before further processing.
There are many ways to convert categorical data into numerical data. In this article, we'll discuss the two most commonly used methods:
- Dummy Variable Encoding
- Label Encoding
Both methods use the same dataset, which the examples read from data.csv.
Method 1: Dummy Variable Encoding
We will use the pandas.get_dummies function to convert the categorical string data into numeric data.
Syntax:
pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
Parameters :
- data : Pandas Series, or DataFrame
- prefix : str, list of str, or dict of str, default None. String to prepend to the DataFrame column names.
- prefix_sep : str, default '_'. Separator/delimiter to use between the prefix and the category name.
- dummy_na : bool, default False. Add a column to indicate NaNs; if False, NaNs are ignored.
- columns : list-like, default None. Column names in the DataFrame to be encoded.
- sparse : bool, default False. Whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False).
- drop_first : bool, default False. Whether to get k-1 dummies out of k categorical levels by removing the first level.
- dtype : dtype, default bool in pandas 2.0 and later (np.uint8 in earlier versions). Data type for the new columns.
Returns : DataFrame
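The parameters above can be tried on a small in-memory Series. This is a minimal sketch: the values and the name 'Purchased' are made up for illustration and are not taken from the article's dataset.

```python
import pandas as pd

# Hypothetical toy data (assumption); one value is missing on purpose.
s = pd.Series(["Yes", "No", "Yes", None], name="Purchased")

# Default behaviour: one indicator column per category.
# dtype=int gives 0/1 columns regardless of the pandas version.
basic = pd.get_dummies(s, dtype=int)

# prefix and prefix_sep control the new column names,
# dummy_na adds an extra column that flags missing values,
# drop_first keeps k-1 of the k indicator columns.
custom = pd.get_dummies(
    s,
    prefix="Purchased",
    prefix_sep="_",
    dummy_na=True,
    drop_first=True,
    dtype=int,
)

print(basic)
print(custom)
```

With drop_first=True the first category (here 'No') is dropped, so a row of zeros in the remaining columns stands for that category.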
Stepwise Implementation
Step 1: Importing Libraries
Python3
# importing pandas as pd
import pandas as pd
Step 2: Importing Data
Python3
# importing data using .read_csv() function
df = pd.read_csv('data.csv')
# printing DataFrame
df
Output:

Step 3: Converting Categorical Data Columns to Numerical.
We will convert the column 'Purchased' from categorical to numerical data type.
Python3
# using .get_dummies function to convert
# the categorical datatype to numerical
# and storing the returned dataFrame
# in a new variable df1
df1 = pd.get_dummies(df['Purchased'])
# concatenating df and df1 column-wise and
# storing the result back in df
df = pd.concat([df, df1], axis=1)
# removing the original 'Purchased' column,
# which the dummy columns now replace
df = df.drop(columns='Purchased')
# printing df
df
Output:

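The concat-and-drop steps above can also be collapsed into a single call by passing the whole DataFrame together with the columns parameter. The sketch below uses a small hypothetical DataFrame in place of data.csv; the 'Age' column and all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for data.csv (column names and values are assumptions).
df = pd.DataFrame({
    "Age": [44, 27, 30, 38],
    "Purchased": ["No", "Yes", "No", "Yes"],
})

# columns= encodes only the listed columns, removes the originals,
# and prefixes each new column with the source column name.
encoded = pd.get_dummies(df, columns=["Purchased"], dtype=int)

print(encoded)
```

Unlike calling get_dummies on the single column, this form names the new columns 'Purchased_No' and 'Purchased_Yes', which helps when several columns are encoded at once.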
Method 2: Label Encoding
We will use LabelEncoder from the sklearn library to convert categorical data to numerical data, calling its fit_transform() method in the process.
Syntax :
fit_transform(y)
Parameters :
- y : array-like of shape (n_samples,). Target values.
Returns : array-like of shape (n_samples,). Encoded labels.
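As a quick illustration of fit_transform(), the sketch below encodes a short hypothetical list of labels (not the article's dataset) and then reads back the learned classes and the original strings.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical label values (assumption), standing in for a column such as 'Purchased'.
labels = ["No", "Yes", "No", "No", "Yes"]

le = LabelEncoder()

# fit_transform learns the sorted unique classes and returns one integer per label.
encoded = le.fit_transform(labels)

# classes_ holds the learned classes; the position of each class is its integer code.
classes = list(le.classes_)

# inverse_transform maps the integer codes back to the original strings.
decoded = list(le.inverse_transform(encoded))

print(encoded)
print(classes)
print(decoded)
```

Because the classes are sorted before codes are assigned, 'No' becomes 0 and 'Yes' becomes 1 here. The integers carry no ordinal meaning beyond that sort order.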
Stepwise Implementation
Step 1: Importing Libraries
Python3
# importing pandas as pd
import pandas as pd
Step 2 : Importing Data
Python3
# importing data using .read_csv() function
df = pd.read_csv('data.csv')
# printing DataFrame
df
Output:

Step 3 : Converting Categorical Data Columns to Numerical.
We will convert the column 'Purchased' from categorical to numerical data type.
Python3
# importing LabelEncoder from the
# preprocessing module of sklearn
from sklearn.preprocessing import LabelEncoder
# creating an instance of LabelEncoder
le = LabelEncoder()
# Using .fit_transform function to fit label
# encoder and return encoded label
label = le.fit_transform(df['Purchased'])
# printing label
label
Output:
array([0, 1, 0, 0, 1, 1, 0, 1, 0, 1])
Time Complexity: roughly O(n log n), because fitting the encoder involves finding and sorting the unique values in the input data. Here, n is the number of elements in the df['Purchased'] array.
Auxiliary Space: O(k) for the stored classes, where k is the number of unique labels in the df['Purchased'] array, plus O(n) for the returned array of encoded labels.
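The sorting and unique-value step mentioned above can be sketched with NumPy alone. This is a simplified model of label encoding, not scikit-learn's actual implementation, and the input values are hypothetical.

```python
import numpy as np

# Hypothetical label values (assumption).
values = np.array(["No", "Yes", "No", "No", "Yes"])

# np.unique sorts the input to find the distinct classes;
# return_inverse=True also returns, for every element,
# the index of its class in that sorted array.
classes, codes = np.unique(values, return_inverse=True)

print(classes)
print(codes)
```

The classes array has k entries and the codes array has n entries, which matches the space figures given above.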
Step 4: Appending The Label Array to our DataFrame
Python3
# removing the column 'Purchased' from df
# as it is of no use now.
df.drop("Purchased", axis=1, inplace=True)
# Appending the array to our dataFrame
# with column name 'Purchased'
df["Purchased"] = label
# printing Dataframe
df
Output:
