
Convert to Best Data Types Automatically in Pandas
Pandas is a popular data manipulation library in Python, used for cleaning and transforming data. It provides various functionalities for converting data types, such as the astype() method. However, manually converting data types can be time-consuming and prone to errors.
To address this, Pandas 1.0 introduced the convert_dtypes() method, which automatically converts each column to the best-suited data type that supports pd.NA, based on the values in the column. This reduces the need for manual type conversion. Note that convert_dtypes() does not parse strings into numbers; for that, use pd.to_numeric(), as in the first example.
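As a minimal sketch of convert_dtypes() in action, the snippet below builds a small DataFrame with missing values and compares the data types before and after conversion. The column names and values are hypothetical and chosen only for illustration; on recent Pandas versions the columns are expected to become nullable integer, string, and boolean types.

```python
import pandas as pd

# Hypothetical sample data; None marks a missing value.
df = pd.DataFrame({
    "id": [1, 2, None],
    "name": ["a", "b", None],
    "flag": [True, False, None],
})

# Data types chosen by the DataFrame constructor
print(df.dtypes)

# Let Pandas pick nullable data types that support pd.NA
converted = df.convert_dtypes()
print(converted.dtypes)

# Numeric strings are left as strings, not parsed into numbers.
numeric_strings = pd.Series(["1", "2", "3"]).convert_dtypes()
print(numeric_strings.dtype)
```

Because the last Series stays a string type, pd.to_numeric() remains the tool to reach for when the values are numbers stored as text.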
Converting the Datatype of a Pandas Series
Consider the code shown below, which converts a Pandas Series of numbers stored as strings to a numeric data type using pd.to_numeric().
Example
import pandas as pd

# Create a Series of numbers stored as strings
data = pd.Series(['1', '2', '3.1', '4.0', '5'])

# Print the data type of the Series
print("Original data types:")
print(data.dtypes)

# Convert the Series to a numeric data type;
# values that cannot be parsed become NaN
data = pd.to_numeric(data, errors='coerce')

# Print the data type of the Series after conversion
print("\nNew data types:")
print(data.dtypes)

# Print the updated Series
print("\nUpdated Series:")
print(data)
Explanation
Import the Pandas library using the import statement.
Create a Pandas Series named data that holds integers and decimals stored as strings.
Print the original data types of the Series using the dtypes attribute.
Use the pd.to_numeric() function to convert the Series to a numeric data type. Because some of the values contain decimals, the result is float64.
Pass the errors parameter with the value 'coerce' to force any invalid values to be converted to NaN.
Print the new data types of the Series using the dtypes attribute.
Print the updated Series.
To run the above code, save it as main.py and run the command shown below.
Command
python3 main.py
Output
Original data types:
object

New data types:
float64

Updated Series:
0    1.0
1    2.0
2    3.1
3    4.0
4    5.0
dtype: float64
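The Series above contains only valid numbers, so errors='coerce' never takes effect. The sketch below uses a hypothetical Series with one unparseable entry to show the coercion, and a second Series of whole numbers to show that pd.to_numeric() picks an integer type when it can.

```python
import pandas as pd

# Hypothetical input: "abc" cannot be parsed as a number
raw = pd.Series(["1", "2", "abc", "4.5"])

# Invalid entries become NaN instead of raising an error
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced)

# When every value is a whole number, the result is an integer type
ints = pd.to_numeric(pd.Series(["1", "2", "3"]))
print(ints.dtype)
```

Without errors='coerce', the first call would raise a ValueError on "abc", which is often preferable when invalid values should not pass silently.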
Converting the Datatype of a Pandas DataFrame
Consider the code shown below, in which we convert individual columns of a DataFrame using the astype() method.
Example
import pandas as pd

# create a sample dataframe with mixed data types
data = {'name': ['John', 'Marry', 'Peter', 'Jane', 'Paul'],
        'age': [25, 30, 40, 35, 27],
        'gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'income': ['$500', '$1000', '$1200', '$800', '$600']}
df = pd.DataFrame(data)

# print the original data types of the dataframe
print("Original data types:\n", df.dtypes)

# convert 'age' column to float
df['age'] = df['age'].astype(float)

# convert 'income' column to integer by removing the dollar sign;
# regex=False treats '$' as a literal character, not a regex anchor
df['income'] = df['income'].str.replace('$', '', regex=False).astype(int)

# print the new data types of the dataframe
print("\nNew data types:\n", df.dtypes)
print("\nDataFrame after conversion:\n", df)
Explanation
First, we import the Pandas library.
We create a sample DataFrame whose columns hold strings (object data type) and integers (int64 data type).
We print the original data types of the DataFrame using the dtypes attribute.
We convert the 'age' column to float using the astype() method, which converts the column data type to the specified type.
We convert the 'income' column to integer by removing the dollar sign using the str.replace() method and then converting the string to integer using the astype() method.
We print the new data types of the DataFrame using the dtypes attribute to confirm the data type conversion.
Finally, we print the entire DataFrame to see the converted values.
Note: The astype() method of a Series converts that single Series to the specified data type, while the astype() method of a DataFrame can convert several columns at once when it is given a dictionary that maps column names to data types.
Output
Original data types:
 name      object
age        int64
gender    object
income    object
dtype: object

New data types:
 name       object
age       float64
gender     object
income      int64
dtype: object

DataFrame after conversion:
     name   age  gender  income
0   John  25.0    Male     500
1  Marry  30.0  Female    1000
2  Peter  40.0    Male    1200
3   Jane  35.0  Female     800
4   Paul  27.0    Male     600
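As the note on astype() mentions, a DataFrame can convert several columns in a single call. The following is a minimal sketch of that approach, assuming a hypothetical DataFrame whose income values have already had the dollar sign removed.

```python
import pandas as pd

# Hypothetical data; income is stored as digit-only strings
df = pd.DataFrame({
    "age": [25, 30, 40],
    "gender": ["Male", "Female", "Male"],
    "income": ["500", "1000", "1200"],
})

# Map each column name to its target data type
converted = df.astype({
    "age": "float64",
    "gender": "category",
    "income": "int64",
})

print(converted.dtypes)
```

Columns that are not listed in the dictionary keep their existing data type.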
Conclusion
In conclusion, converting data types is an essential task in data analysis and manipulation. Pandas provides us with various methods to convert data types, such as specifying the data type while loading the data, using the astype() method to convert a Series or DataFrame, using pd.to_numeric() to parse numbers stored as strings, and using the infer_objects() or convert_dtypes() methods to let Pandas choose a better data type for each column.
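Two of the options listed above do not appear in the earlier examples, so here is a brief sketch of each. The CSV text and column names are hypothetical; io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# An object column that actually holds integers
df = pd.DataFrame({"a": [1, 2, 3]}, dtype="object")

# infer_objects() converts object columns to a more specific type where possible
inferred = df.infer_objects()
print(inferred.dtypes)

# Specify data types while loading, so "001" keeps its leading zeros
csv_text = "id,price\n001,9.5\n002,12.0\n"
loaded = pd.read_csv(io.StringIO(csv_text), dtype={"id": str, "price": "float64"})
print(loaded)
```

Setting the data type at load time avoids a second conversion pass and prevents identifiers such as "001" from being read as the integer 1.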
It is essential to choose the appropriate data type for each column to optimise memory usage and improve data analysis performance.
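To make the memory point concrete, the sketch below measures a hypothetical DataFrame before and after switching a repetitive string column to the category type and downcasting an integer column. The data is invented for illustration, and the exact savings depend on the data and the Pandas version.

```python
import pandas as pd

# Hypothetical data with many repeated values
df = pd.DataFrame({
    "gender": ["Male", "Female"] * 5000,
    "age": [25, 30] * 5000,
})

# Total memory in bytes, including the string contents
before = df.memory_usage(deep=True).sum()

# Store repeated strings as categories and shrink the integer column
optimised = df.astype({"gender": "category"})
optimised["age"] = pd.to_numeric(optimised["age"], downcast="integer")

after = optimised.memory_usage(deep=True).sum()
print(before, after)
print(optimised.dtypes)
```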