Pandas - get_dummies() method

Last Updated: 03 Dec, 2024

In Pandas, the get_dummies() function converts categorical variables into dummy/indicator variables, a technique known as one-hot encoding. It is especially useful when preparing data for machine learning algorithms that require numeric input.

Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

The function returns a DataFrame in which each unique category in the original data becomes a separate column. By default (bool dtype in pandas 2.0 and later), the values are True where the category is present and False where it is absent.

Encoding a Pandas DataFrame

Let's look at an example of how to use get_dummies() to perform one-hot encoding.

```python
import pandas as pd

data = {
    'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red'],
    'Size': ['Small', 'Large', 'Medium', 'Small', 'Large']
}

df = pd.DataFrame(data)
print('Original DataFrame')
print(df)  # in a Jupyter notebook, display(df) renders a formatted table instead

# Perform one-hot encoding on every string column
df_encoded = pd.get_dummies(df)
print('\n DataFrame after performing One-hot Encoding')
print(df_encoded)
```

Output:

[Figure: Sample DataFrame]

[Figure: DataFrame after performing One-Hot Encoding]

In the output, each unique category in the Color and Size columns has been transformed into a separate binary (True or False) column, named with the original column name as a prefix (for example, Color_Red). Each new column indicates whether that category is present in a given row.

To get the output as 0 and 1 instead of True and False, set the data type (dtype) to int or float (float gives 0.0 and 1.0).
```python
# Perform one-hot encoding with integer (0/1) indicator columns
df_encoded = pd.get_dummies(df, dtype=int)
print('\n DataFrame after performing One-hot Encoding')
print(df_encoded)
```

Output:

[Figure: Pandas DataFrame after performing One-Hot Encoding (0s and 1s)]

Encoding a Pandas Series

```python
import pandas as pd

# Series with days of the week
days = pd.Series(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Monday'])
print(pd.get_dummies(days, dtype='int'))
```

Output:

```
   Friday  Monday  Thursday  Tuesday  Wednesday
0       0       1         0        0          0
1       0       0         0        1          0
2       0       0         0        0          1
3 ...
```

In this example, each unique day of the week is transformed into a dummy variable, where a 1 indicates the presence of that day.

Converting NaN Values into a Dummy Variable

The dummy_na=True option can be used when dealing with missing values. It creates a separate column indicating whether the value is missing.

```python
import pandas as pd
import numpy as np

# List with color categories and one missing value (NaN)
colors = ['Red', 'Blue', 'Green', np.nan, 'Red', 'Blue']
print(pd.get_dummies(colors, dummy_na=True, dtype='int'))
```

Output:

```
   Blue  Green  Red  NaN
0     0      0    1    0
1     1      0    0    0
2     0      1    0    0
3     0      0    0    1
4     0      0    1    0
5     1      0    0    0
```

The NaN column contains 1 in the row where the original value was missing (row 3 here). Without dummy_na=True, that row would contain 0 in every column, making a missing value indistinguishable from a category that was never observed.

Author: romy421kumari
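The syntax also lists columns, prefix, prefix_sep and drop_first, which the examples in this article do not exercise. The following is a minimal sketch of how they might be combined. The DataFrame is the one used earlier with a hypothetical numeric Price column added for illustration, and the prefix 'C' and separator '-' are arbitrary choices.

```python
import pandas as pd

# Illustrative data: the Price column is an assumption, not part of the article's example
df = pd.DataFrame({
    'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red'],
    'Size': ['Small', 'Large', 'Medium', 'Small', 'Large'],
    'Price': [10, 25, 15, 12, 30],
})

# Encode only the Color column; Size and Price pass through unchanged.
# prefix and prefix_sep control how the new column names are built.
encoded = pd.get_dummies(df, columns=['Color'], prefix='C', prefix_sep='-', dtype=int)
print(encoded.columns.tolist())
# ['Size', 'Price', 'C-Blue', 'C-Green', 'C-Red']

# drop_first=True drops the first (alphabetically sorted) category of each
# encoded column, leaving k-1 indicator columns for k categories.
reduced = pd.get_dummies(df, columns=['Color', 'Size'], drop_first=True, dtype=int)
print(reduced.columns.tolist())
# ['Price', 'Color_Green', 'Color_Red', 'Size_Medium', 'Size_Small']
```

Dropping the first category is a common choice for linear models, since the dropped column can be inferred from the remaining ones and would otherwise be redundant.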