How to Convert to Best Data Types Automatically in Pandas?
Last Updated: 03 Dec, 2024
Let's learn how to automatically convert columns to the best data types in a Pandas DataFrame using the convert_dtypes() method.
Convert Data Type of a Pandas Series using convert_dtypes() Function
To convert the data type of a pandas series, simply use the following syntax:
Syntax: series_name.convert_dtypes()
Let's consider the following example:
Python
import pandas as pd
# Creating a sample Series
s = pd.Series(['Geeks', 'for', 'Geeks'])
# Before using convert_dtypes()
print("Original Series:")
print(s)
# Automatically converting data types
print("\nAfter convert_dtypes:")
print(s.convert_dtypes())
Output:
Original Series:
0 Geeks
1 for
2 Geeks
dtype: object
After convert_dtypes:
0 Geeks
1 for
2 Geeks
dtype: string
Here, the generic object data type is converted to pandas' dedicated string type, which marks the column explicitly as text and represents missing values with pd.NA.
convert_dtypes() is a pandas method, available since version 1.0.0, that converts DataFrame and Series columns to the best possible data types that support pd.NA, pandas' missing-value marker. Its main benefit is more accurate typing: for example, integer and boolean columns that contain missing values keep an integer or boolean type instead of falling back to float64 or object.
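As a minimal sketch of what this means in practice (the sample values below are made up for illustration), a float64 Series that holds only whole numbers plus a missing value is converted to the nullable Int64 type:

```python
import numpy as np
import pandas as pd

# Whole numbers stored as float64 because of the missing value
s = pd.Series([1.0, 2.0, np.nan])
print(s.dtype)  # float64

converted = s.convert_dtypes()
print(converted.dtype)   # Int64 (nullable integer)
print(converted.isna())  # the missing entry is still recognised as missing
```

The missing entry is stored as pd.NA after the conversion, so the column no longer needs a floating-point type just to hold a gap.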
Convert Data Types in a Pandas DataFrame
You can apply convert_dtypes() to a Pandas DataFrame and inspect the resulting types through the dtypes attribute, using the following syntax:
Syntax: dataframe_name.convert_dtypes().dtypes
Let's consider the following example:
Python
import pandas as pd
import numpy as np
# Creating a sample DataFrame
df = pd.DataFrame({
    "Roll_No.": [1, 2, 3],
    "Name": ["Raj", "Ritu", "Rohan"],
    "Result": ["Pass", "Fail", np.nan],
    "Promoted": [True, False, np.nan],
    "Marks": [90.33, 30.6, np.nan]
})
# Before using convert_dtypes()
print("Original DataFrame:")
# print() works in any environment; display() is only available in Jupyter/IPython
print(df)
# Checking the data types before conversion
print("\nData Types Before Conversion:")
print(df.dtypes)
# Automatically converting data types
print("\nData Types After Conversion:")
print(df.convert_dtypes().dtypes)
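Output:
(Figure: Converted Data Types Using the convert_dtypes() Function)
After conversion, each column has a more specific data type:
- The Name and Result columns are converted to the string type.
- The Promoted column is converted to the nullable boolean type.
- The Roll_No. column is converted from int64 to the nullable Int64 type. The integer width is preserved, not reduced to int32.
- The Marks column is converted from float64 to the nullable Float64 type (pandas 1.2 and later).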
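To check the resulting types on your own pandas version, one option is to place the data types before and after conversion side by side. This is only a sketch; the names converted and comparison are illustrative and not part of the pandas API.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Roll_No.": [1, 2, 3],
    "Name": ["Raj", "Ritu", "Rohan"],
    "Result": ["Pass", "Fail", np.nan],
    "Promoted": [True, False, np.nan],
    "Marks": [90.33, 30.6, np.nan],
})

# convert_dtypes() returns a new DataFrame; df itself is left unchanged
converted = df.convert_dtypes()

# One row per column: original dtype next to the converted dtype
comparison = pd.DataFrame({"before": df.dtypes, "after": converted.dtypes})
print(comparison)
```

Because convert_dtypes() returns a copy, assign the result back (for example df = df.convert_dtypes()) if you want to keep the converted types.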
Creating a DataFrame with Explicit Data Types
You can also create a DataFrame with specified data types and use convert_dtypes() to further optimize the columns.
Python
import pandas as pd
import numpy as np
# Creating a DataFrame with explicit data types for each column
df = pd.DataFrame({
    "Column_1": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
    "Column_2": pd.Series(["Apple", "Ball", "Cat"], dtype=np.dtype("object")),
    "Column_3": pd.Series([True, False, np.nan], dtype=np.dtype("object")),
    "Column_4": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
    "Column_5": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float"))
})
# Before using convert_dtypes()
print("Original DataFrame:")
print(df)
# Checking the data types before conversion
print("\nData Types Before Conversion:")
print(df.dtypes)
# Automatically converting data types
print("\nData Types After Conversion:")
print(df.convert_dtypes().dtypes)
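Output:
With pandas 1.2 through 2.x, the data types change as follows: Column_1 from int32 to Int32, Column_2 from object to string, Column_3 from object to boolean, Column_4 from float64 to Int64 (its non-missing values are all whole numbers), and Column_5 from float64 to Float64.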
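If one of the default conversions is not wanted, convert_dtypes() accepts boolean flags such as convert_integer to switch it off. The sketch below reuses only Column_4 from the example above and compares the default behaviour with convert_integer=False; the variable names default and no_int are illustrative.

```python
import numpy as np
import pandas as pd

# Column_4 from the example above: whole numbers stored as float
df = pd.DataFrame({
    "Column_4": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
})

# Default: whole-number floats are converted to a nullable integer type
default = df.convert_dtypes()

# With convert_integer=False the column stays a floating-point type
no_int = df.convert_dtypes(convert_integer=False)

print(default.dtypes)
print(no_int.dtypes)
```

This can be useful when a column happens to contain only whole numbers right now but is expected to hold fractional values later.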