How to convert categorical data to binary data in Python?
Last Updated :
17 Jan, 2022
Categorical data is data that corresponds to a categorical variable. A categorical variable is a variable that takes one of a fixed, limited set of possible values. Examples include gender, blood group, and whether or not a person is a resident of a country.
Characteristics of Categorical Data :
- It is widely used in statistics.
- Numerical operations such as addition and subtraction are not meaningful on this type of data.
- All values of categorical data belong to categories.
- It is usually stored in an array-like data structure.
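As a small illustration of these characteristics, the sketch below stores categorical values in a pandas Series with the `category` dtype. The blood group values are hypothetical sample data, not taken from this article.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration
blood_group = pd.Series(["A", "B", "O", "A", "AB"], dtype="category")

# The fixed, limited set of allowed values (the categories)
print(list(blood_group.cat.categories))

# Integer codes pandas uses internally to label each value
print(list(blood_group.cat.codes))
```

The codes are only labels for the categories, so adding or subtracting them has no real meaning.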
Example :
Categorical Data
Binary data is data that takes one of two possible states or values, i.e. 0 and 1. It is used in many fields: in computer science it is known as a bit (binary digit), in digital electronics and mathematics as a truth value, and in statistics as a binary variable.
Characteristics :
- The values 0 and 1 are also referred to as true and false, success and failure, yes and no, etc.
- Binary data is discrete data and is also used in statistics.
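The correspondence between true/false and 1/0 can be sketched in pandas as follows. The `passed` values are hypothetical and serve only to illustrate the idea.

```python
import pandas as pd

# Hypothetical yes/no outcomes stored as booleans
passed = pd.Series([True, False, True, True])

# True becomes 1 and False becomes 0
as_bits = passed.astype(int)
print(as_bits.tolist())

# Because the values are 0/1, the sum counts the successes
print(int(as_bits.sum()))
```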
Example :
Binary Data
Conversion of Categorical Data into Binary Data
Our task is to convert categorical data into binary data in Python.
Step-by-step Approach:
Step 1) To convert categorical data into binary data, we use a function available in the Pandas library, so Pandas is imported first.
Python3
# import required module
import pandas as pd
Step 2) Next, a list is created and the data is entered as shown below.
Python3
# import required modules
import pandas as pd
# assign data
data = [["Jagroop", "Male"], ["Praveen", "Male"],
["Harjot", "Female"], ["Pooja", "Female"],
["Mohit", "Male"]]
Step 3) Next, a DataFrame is created using pd.DataFrame(). We also add print(data_frame) to display the categorical data:
Python3
# import required modules
import pandas as pd
# assign data
data = [["Jagroop", "Male"], ["Praveen", "Male"],
["Harjot", "Female"], ["Pooja", "Female"],
["Mohit", "Male"]]
# display categorical output
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])
print(data_frame)
Output: the categorical data, a DataFrame with Name and Gender columns.
Step 4) Up to step 3 we have categorical data. Now we convert it into binary data using the built-in Pandas function get_dummies(), as shown below.
We apply get_dummies() to the Gender column only, because that is the only column we want to convert from categorical to binary data.
Python3
# import required modules
import pandas as pd
# assign data
data = [["Jagroop", "Male"], ["Praveen", "Male"],
["Harjot", "Female"], ["Pooja", "Female"],
["Mohit", "Male"]]
# display categorical output
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])
print(data_frame)
# converting to binary data
# (dtype=int gives 0/1; pandas 2.0+ returns True/False by default)
df_one = pd.get_dummies(data_frame["Gender"], dtype=int)
print(df_one)
Output of step 4: one indicator column for each category (Female and Male).
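As an alternative to selecting the column first, get_dummies() also accepts a whole DataFrame together with a `columns` argument. The sketch below uses a shortened copy of the sample data and is only one possible way to do this. Passing `dtype=int` is a choice made here to get 0/1 values.

```python
import pandas as pd

data_frame = pd.DataFrame(
    [["Jagroop", "Male"], ["Praveen", "Male"], ["Harjot", "Female"]],
    columns=["Name", "Gender"],
)

# Encode only the Gender column; Name is passed through unchanged.
# dtype=int requests 0/1 values instead of booleans.
encoded = pd.get_dummies(data_frame, columns=["Gender"], dtype=int)
print(encoded)
```

The new columns are prefixed with the original column name, giving Gender_Female and Gender_Male.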
Here we get binary output for the Gender column only. There are two ways to use it:
- Add the above output to the DataFrame -> remove the Gender column -> remove the Female column (if we want Male = 1 and Female = 0) -> rename Male to Gender -> show the result of the conversion.
- Add the above output to the DataFrame -> remove the Gender column -> remove the Male column (if we want Male = 0 and Female = 1) -> rename Female to Gender -> show the result of the conversion.
The program below uses the second option (Male = 0 and Female = 1):
Python3
# import required modules
import pandas as pd
# assign data
data = [["Jagroop", "Male"], ["Praveen", "Male"],
["Harjot", "Female"], ["Pooja", "Female"],
["Mohit", "Male"]]
# display categorical output
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])
# print(data_frame)
# converting to binary data
# (dtype=int gives 0/1; pandas 2.0+ returns True/False by default)
df_one = pd.get_dummies(data_frame["Gender"], dtype=int)
# print(df_one)
# display result
df_two = pd.concat((df_one, data_frame), axis=1)
df_two = df_two.drop(["Gender"], axis=1)
df_two = df_two.drop(["Male"], axis=1)
result = df_two.rename(columns={"Female": "Gender"})
print(result)
Output: a DataFrame with a binary Gender column (Female = 1, Male = 0) and the Name column.
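For comparison, keeping the Male indicator column and dropping Female instead yields Male = 1 and Female = 0. The sketch below follows the same steps as the program above. The name `result_male` is introduced here only for illustration.

```python
import pandas as pd

data = [["Jagroop", "Male"], ["Praveen", "Male"],
        ["Harjot", "Female"], ["Pooja", "Female"],
        ["Mohit", "Male"]]
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])

# One 0/1 indicator column per category
dummies = pd.get_dummies(data_frame["Gender"], dtype=int)

# Keep the Male indicator this time, so Male = 1 and Female = 0
result_male = pd.concat((dummies, data_frame), axis=1)
result_male = result_male.drop(["Gender", "Female"], axis=1)
result_male = result_male.rename(columns={"Male": "Gender"})
print(result_male)
```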
Below is the complete program based on the above approach:
Python3
# Pandas is imported in order to use various inbuilt
# Functions available in Pandas framework
import pandas as pd
# Data is initialized here
data = [["Jagroop", "Male"], ["Praveen", "Male"],
["Harjot", "Female"], ["Pooja", "Female"],
["Mohit", "Male"]]
# Data frame is created under column name Name and Gender
data_frame = pd.DataFrame(data, columns=["Name", "Gender"])
# Data of Gender is converted into Binary Data
# (dtype=int gives 0/1; pandas 2.0+ returns True/False by default)
df_one = pd.get_dummies(data_frame["Gender"], dtype=int)
# Binary Data is Concatenated into Dataframe
df_two = pd.concat((df_one, data_frame), axis=1)
# Gender column is dropped
df_two = df_two.drop(["Gender"], axis=1)
# We want Male = 0 and Female = 1, so we drop the Male column here
df_two = df_two.drop(["Male"], axis=1)
# Rename the Column
result = df_two.rename(columns={"Female": "Gender"})
# Print the Result
print(result)
Output: a DataFrame with a binary Gender column (Female = 1, Male = 0) and the Name column.
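When a column has only two categories, a more compact route is to map each category to 0 or 1 directly with Series.map(). This is a minimal sketch and not the approach used in this article. The `gender_codes` dictionary and the shortened sample data are assumptions made for illustration.

```python
import pandas as pd

data_frame = pd.DataFrame(
    [["Jagroop", "Male"], ["Harjot", "Female"], ["Mohit", "Male"]],
    columns=["Name", "Gender"],
)

# Explicit mapping; any value missing from the dict would become NaN
gender_codes = {"Male": 0, "Female": 1}
result = data_frame.assign(Gender=data_frame["Gender"].map(gender_codes))
print(result)
```

This keeps the original column order and makes the chosen encoding explicit, but it only suits columns whose full set of categories is known in advance.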