Pandas - get_dummies() method

Last Updated : 03 Dec, 2024

In Pandas, the get_dummies() function converts categorical variables into dummy/indicator variables (known as one-hot encoding). This method is especially useful when preparing data for machine learning algorithms that require numeric input.

Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

The function returns a DataFrame in which each unique category in the original data becomes a separate column. With the default dtype=None, pandas 2.0 and later store the values as booleans: True where the category is present in a row and False where it is absent.
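As a rough illustration of a few parameters from the syntax line, the sketch below uses a small made-up Series (the sizes values are hypothetical, not from this article). It shows how prefix and prefix_sep shape the column names and how drop_first=True omits the first category.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
sizes = pd.Series(['S', 'M', 'L', 'M'])

# prefix and prefix_sep control how the new columns are named
named = pd.get_dummies(sizes, prefix='size', prefix_sep='-')
print(named.columns.tolist())    # ['size-L', 'size-M', 'size-S']

# drop_first=True leaves out the first category (here 'L')
reduced = pd.get_dummies(sizes, prefix='size', drop_first=True)
print(reduced.columns.tolist())  # ['size_M', 'size_S']
```

Categories are sorted before the columns are built, which is why 'L' comes first and is the one dropped.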

Encoding a Pandas DataFrame

Let's look at an example of how to use the get_dummies() method to perform one-hot encoding.

Python

import pandas as pd

data = {
    'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red'],
    'Size': ['Small', 'Large', 'Medium', 'Small', 'Large']
}

df = pd.DataFrame(data)
print('Original DataFrame')
print(df)  # in a Jupyter notebook, display(df) renders a formatted table instead

# Perform one-hot encoding on every string column
df_encoded = pd.get_dummies(df)
print('\nDataFrame after performing One-hot Encoding')
print(df_encoded)

Output:

[Figure: DataFrame after performing One-Hot Encoding]

In the output, each unique category in the Color and Size columns has been transformed into a separate binary (True or False) column. The new columns indicate whether the respective category is present in each row.
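The column naming described above can be checked directly. The following sketch rebuilds the same DataFrame, prints the generated column names, and then uses the columns parameter from the syntax line to encode only Color while leaving Size untouched. The variable names all_encoded and color_only are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red'],
    'Size': ['Small', 'Large', 'Medium', 'Small', 'Large'],
})

# Default: every string column is encoded and named <column>_<category>
all_encoded = pd.get_dummies(df)
print(all_encoded.columns.tolist())

# columns=['Color'] encodes only Color; Size is kept as it was
color_only = pd.get_dummies(df, columns=['Color'])
print(color_only.columns.tolist())
```

Restricting the encoding with columns can be handy when some categorical columns need different treatment, for example ordinal encoding for Size.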

To get the output as 0 and 1 instead of True and False, set the dtype parameter to int or float.

Python

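# Perform one-hot encoding with integer output (reuses df from the previous example)
df_encoded = pd.get_dummies(df, dtype=int)
print('\nDataFrame after performing One-hot Encoding')
print(df_encoded)  # in a Jupyter notebook, display(df_encoded) also works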

Output:

[Figure: Pandas DataFrame after performing One-Hot Encoding (0s and 1s)]
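Since the output above is only shown as an image, a quick sanity check may help. This sketch, using the same data, confirms that every cell is 0 or 1 and that each row contains exactly one 1 per encoded column. The names encoded_int and encoded_float are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red'],
    'Size': ['Small', 'Large', 'Medium', 'Small', 'Large'],
})

encoded_int = pd.get_dummies(df, dtype=int)
encoded_float = pd.get_dummies(df, dtype=float)

# Every cell is either 0 or 1
print(encoded_int.isin([0, 1]).all().all())

# One 1 for Color plus one 1 for Size, so each row sums to 2
print(encoded_int.sum(axis=1).tolist())

# dtype=float stores the same indicators as 0.0 and 1.0
print(encoded_float.dtypes.unique())
```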

Encoding a Pandas Series

Python

import pandas as pd

# Series with days of the week
days = pd.Series(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Monday'])
print(pd.get_dummies(days, dtype='int'))

Output:

   Friday  Monday  Thursday  Tuesday  Wednesday
0       0       1         0        0          0
1       0       0         0        1          0
2       0       0         0        0          1
3       ...

In this example, each unique day of the week is transformed into a dummy variable, where a 1 indicates the presence of that day.
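Two details of this output are worth checking in code: the columns appear in sorted order rather than in order of appearance, and because each row holds a single 1, the original label can be recovered from the position of that 1. The sketch below uses idxmax for the recovery. That is one possible approach, not necessarily the only one.

```python
import pandas as pd

days = pd.Series(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Monday'])
dummies = pd.get_dummies(days, dtype=int)

# Columns are sorted alphabetically, not by first appearance
print(dummies.columns.tolist())

# idxmax(axis=1) returns, for each row, the column that holds the 1
recovered = dummies.idxmax(axis=1)
print(recovered.tolist() == days.tolist())
```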

Converting NaN Values into a Dummy Variable

By default, get_dummies() ignores missing values, so a row containing NaN gets 0 in every column. Setting dummy_na=True adds a separate column that marks whether the original value was missing.

Python

import pandas as pd
import numpy as np

# List with color categories and NaN
colors = ['Red', 'Blue', 'Green', np.nan, 'Red', 'Blue']
print(pd.get_dummies(colors, dummy_na=True, dtype='int'))

Output:

   Blue  Green  Red  NaN
0     0      0    1    0
1     1      0    0    0
2     0      1    0    0
3     0      0    0    1
4     0      0    1    0
5     1      0    0    0

The dummy_na=True parameter adds a column for missing values (NaN), indicating where the NaN values were originally present.
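To see what the parameter changes, the following sketch encodes the same values with and without dummy_na and compares the row that originally held NaN. The data is wrapped in a Series here, and the names without_na and with_na are illustrative.

```python
import numpy as np
import pandas as pd

colors = pd.Series(['Red', 'Blue', 'Green', np.nan, 'Red', 'Blue'])

without_na = pd.get_dummies(colors, dtype=int)
with_na = pd.get_dummies(colors, dummy_na=True, dtype=int)

# Row 3 held NaN: by default it is all zeros
print(without_na.iloc[3].tolist())  # [0, 0, 0]

# With dummy_na=True, the extra last column flags the missing value
print(with_na.iloc[3].tolist())     # [0, 0, 0, 1]
```

Without the extra column, a missing value is indistinguishable from "none of the known categories", which may or may not matter for a given model.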

romy421kumari
