How to convert categorical string data into numeric in Python?
Last Updated: 06 Apr, 2023
Datasets often contain both numerical and categorical features. Categorical features are typically stored as strings and are easy for people to read. Most machine learning algorithms, however, cannot work with string categories directly, so the categorical data must be converted into numerical data before further processing.
There are many ways to convert categorical data into numerical data. This article covers two of the most commonly used methods:
- Dummy Variable Encoding
- Label Encoding
Both methods use the same dataset, which the examples load from data.csv.
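If the dataset is not at hand, a small stand-in DataFrame is enough to follow along. The sketch below is an assumption for illustration only: apart from the 'Purchased' column name, every column name and value is hypothetical and is not taken from the original data.csv.

```python
import pandas as pd

# Hypothetical stand-in for data.csv. Only the 'Purchased' column name
# comes from the article; all other names and values are made up.
df = pd.DataFrame({
    "Age": [44, 27, 30, 38, 40, 35, 52, 48, 50, 37],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# String columns appear with a non-numeric dtype, which is
# why they need to be encoded before modelling.
print(df.dtypes)
print(df["Purchased"].unique())
```

With a frame like this, `df` can be used in place of the result of `pd.read_csv('data.csv')` in the steps that follow.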
Method 1: Dummy Variable Encoding
We will use the pandas.get_dummies function, which converts a categorical string column into one numeric indicator column per category.
Syntax:
pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
Parameters :
- data : Pandas Series, or DataFrame
- prefix : str, list of str, or dict of str, default None. String to prepend to the names of the new dummy columns.
- prefix_sep : str, default ‘_’. If appending prefix, separator/delimiter to use.
- dummy_na : bool, default False. Add a column to indicate NaNs, if False NaNs are ignored.
- columns : list-like, default None. Column names in the DataFrame to be encoded.
- sparse : bool, default False. Whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False).
- drop_first : bool, default False. Whether to get k-1 dummies out of k categorical levels by removing the first level.
- dtype : dtype, default bool (np.uint8 in pandas versions before 2.0). It specifies the data type for new columns.
Returns : DataFrame
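To see what several of these parameters do, here is a minimal sketch on a hypothetical four-row Series (the values are made up for illustration). It passes dtype=int so the result is 0/1 regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical sample column, including one missing value.
s = pd.Series(["Yes", "No", None, "Yes"], name="Purchased")

# Default behaviour: one column per category; the missing row is all zeros.
basic = pd.get_dummies(s, dtype=int)

# prefix and prefix_sep control the names of the new columns.
named = pd.get_dummies(s, prefix="Purchased", prefix_sep="_", dtype=int)

# dummy_na=True adds an extra indicator column for missing values.
with_na = pd.get_dummies(s, dummy_na=True, dtype=int)

# drop_first=True keeps k-1 columns by dropping the first level ("No").
reduced = pd.get_dummies(s, drop_first=True, dtype=int)

print(basic, named, with_na, reduced, sep="\n\n")
```

Dropping the first level is often useful for linear models, where the full set of k indicator columns is redundant with an intercept term.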
Stepwise Implementation
Step 1: Importing Libraries
Python3
# importing pandas as pd
import pandas as pd
Step 2: Importing Data
Python3
# importing data using .read_csv() function
df = pd.read_csv('data.csv')
# printing DataFrame
df
Output:

Step 3: Converting Categorical Data Columns to Numerical.
We will convert the column 'Purchased' from categorical to numerical data type.
Python3
# using .get_dummies function to convert
# the categorical datatype to numerical
# and storing the returned DataFrame
# in a new variable df1; dtype=int gives
# 0/1 columns (recent pandas versions
# return True/False by default)
df1 = pd.get_dummies(df['Purchased'], dtype=int)
# using pd.concat to concatenate the DataFrames
# df and df1 column-wise and storing the
# concatenated DataFrame in df.
df = pd.concat([df, df1], axis=1)
# removing the column 'Purchased' from df
# as it is of no use now.
df = df.drop('Purchased', axis=1)
# printing df
df
Output:

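As an alternative sketch, get_dummies can also be called on the whole DataFrame with the columns parameter, which encodes the listed columns and drops the originals in a single call. The data below is a hypothetical stand-in, not the article's dataset.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({
    "Age": [44, 27, 30],
    "Purchased": ["No", "Yes", "No"],
})

# columns= names the columns to encode; the original 'Purchased' column
# is replaced by 'Purchased_No' and 'Purchased_Yes', and 'Age' is kept as-is.
encoded = pd.get_dummies(df, columns=["Purchased"], dtype=int)
print(encoded)
```

This form prefixes the new columns with the source column name, which helps when several categorical columns are encoded at once.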
Method 2: Label Encoding
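We will use the LabelEncoder class from the sklearn.preprocessing module to convert categorical data to numerical data, calling its fit_transform() method in the process. Note that scikit-learn documents LabelEncoder as intended for target labels (y) rather than input features.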
Syntax :
fit_transform(y)
Parameters :
- y : array-like of shape (n_samples,). Target values.
Returns : array-like of shape (n_samples,). Encoded labels.
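A minimal sketch of fit_transform on a hypothetical list of labels (the values are made up) shows how codes are assigned: the unique labels are sorted and each one is mapped to its position in that sorted order. The fitted encoder keeps the mapping in classes_, and inverse_transform recovers the original strings.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels, for illustration only.
labels = ["No", "Yes", "No", "Maybe"]

le = LabelEncoder()
codes = le.fit_transform(labels)

# classes_ holds the sorted unique labels; a label's code is its index here.
print(le.classes_)

# Integer code for each input label.
print(codes)

# inverse_transform maps the codes back to the original strings.
print(le.inverse_transform(codes))
```

Because the codes follow sorted order, they carry no meaning beyond identity; a model may still read them as ordered numbers, which is one reason dummy variables are often preferred for input features.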
Stepwise Implementation
Step 1: Importing Libraries
Python3
# importing pandas as pd
import pandas as pd
Step 2: Importing Data
Python3
# importing data using .read_csv() function
df = pd.read_csv('data.csv')
# printing DataFrame
df
Output:

Step 3 : Converting Categorical Data Columns to Numerical.
We will convert the column 'Purchased' from categorical to numerical data type.
Python3
# Importing LabelEncoder from Sklearn
# library from preprocessing Module.
from sklearn.preprocessing import LabelEncoder
# Creating an instance of LabelEncoder.
le = LabelEncoder()
# Using .fit_transform function to fit label
# encoder and return encoded label
label = le.fit_transform(df['Purchased'])
# printing label
label
Output:
array([0, 1, 0, 0, 1, 1, 0, 1, 0, 1])
Time Complexity: O(n log n) in the worst case, because fitting involves finding the unique values in the input data and sorting them. Here, n is the number of elements in the df['Purchased'] array.
Auxiliary Space: O(k) for the stored unique labels, where k is the number of unique labels in the df['Purchased'] array, plus O(n) for the returned array of encoded labels.
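The sort-then-index idea behind these bounds can be sketched with NumPy. This is an illustration of the technique on hypothetical values, not necessarily how scikit-learn implements LabelEncoder internally.

```python
import numpy as np

# Hypothetical labels, for illustration only.
values = np.array(["No", "Yes", "No", "No", "Yes"])

# np.unique sorts the input to find the k distinct values;
# return_inverse=True also returns, for each of the n elements,
# its index into that sorted array of distinct values.
classes, codes = np.unique(values, return_inverse=True)

print(classes)  # the k unique labels, sorted
print(codes)    # one integer code per input element
```

Here `classes` plays the role of the stored unique labels (size k) and `codes` the role of the encoded output (size n).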
Step 4: Appending The Label Array to our DataFrame
Python3
# removing the column 'Purchased' from df
# as it is of no use now.
df.drop("Purchased", axis=1, inplace=True)
# Appending the array to our dataFrame
# with column name 'Purchased'
df["Purchased"] = label
# printing Dataframe
df
Output:
