Conversion Functions in Pandas DataFrame



Pandas is one of the most powerful Python libraries for high-performance data manipulation and analysis. It lets us work with tabular data, such as spreadsheets, CSV files, and SQL tables, using the DataFrame object.

A DataFrame is a 2-dimensional labeled data structure that represents data in rows and columns. Each column may hold a different data type.

DataFrame:
   Integers  Floats Strings      Dates
0       1.0   1.300       p 2023-05-07
1       2.0     NaN       y 2023-05-14
2       5.0   4.600       t 2023-05-21
3       3.0   1.020       h 2023-05-28
4       6.0   0.300       o 2023-06-04
5       NaN   0.001       n 2023-06-11

The DataFrame shown above has 6 rows and 4 columns, and each column holds a different data type.

Conversion functions are used to convert the data types of the elements in a DataFrame object. In this article, we will discuss different type-conversion functions in Pandas DataFrame.

Input Output Scenarios

Let's look at an input-output scenario to understand how typecasting is done with the conversion functions.

Assume we have a DataFrame with a few columns of different data types. In the output, we will see a DataFrame with updated column data types.

Input DataFrame:
   ints strs  ints2  floats
0     1    x   10.0     NaN
1     2    y    NaN   100.5
2     3  NaN   20.0   200.0

Data types of each column:
ints        int64
strs       object
ints2     float64
floats    float64

Output DataFrame:
   ints  strs  ints2  floats
0     1     x     10    <NA>
1     2     y   <NA>   100.5
2     3  <NA>     20   200.0

Data types of the resulting DataFrame:
ints        Int64
strs       string
ints2       Int64
floats    Float64

The DataFrame.convert_dtypes() function

The pandas DataFrame.convert_dtypes() function converts the columns of a DataFrame to the best possible data types, using dtypes that support pd.NA. It returns a new DataFrame object with the updated dtypes.

Syntax

DataFrame.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True)

Parameters

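Every parameter is a boolean and defaults to True. infer_objects controls whether object dtypes should be converted to the best possible types. convert_string, convert_integer, convert_boolean, and convert_floating control whether columns should be converted to the string, integer, boolean, and floating extension dtypes, respectively.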
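As a rough illustration of how these flags behave, the sketch below turns off convert_integer on a small, made-up DataFrame and compares the resulting dtypes with those of the default call. The column names ints and floats are assumptions for this example only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({"ints": [1, 2, 3], "floats": [np.nan, 100.5, 200.0]})

# Default call: every convert_* flag is True
default = df.convert_dtypes()

# Same call, but integer columns keep their original NumPy dtype
no_int = df.convert_dtypes(convert_integer=False)

print(default.dtypes)
print(no_int.dtypes)
```

With the defaults, the ints column should come back as the nullable Int64 dtype. With convert_integer=False, it should stay int64.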

Example

In this example, we will convert the datatype of the DataFrame columns using the .convert_dtypes() method.

import pandas as pd
import numpy as np

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": ["x", "y", "z"],
                   "c": [True, False, np.nan],
                   "d": ["h", "i", np.nan],
                   "e": [10, np.nan, 20],
                   "f": [np.nan, 100.5, 200]})

print("Input DataFrame:")
print(df)
print('Data Types of the each column is: ')
print(df.dtypes)

# Convert the data type of columns
result = df.convert_dtypes()

print("Output DataFrame:")
print(result)
print('Data Types of the resultant DataFrame is: ')
print(result.dtypes)

Output

Input DataFrame:
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0

Data Types of the each column is: 
a      int64
b     object
c     object
d     object
e    float64
f    float64
dtype: object

Output DataFrame:
   a  b      c     d     e      f
0  1  x   True     h    10   <NA>
1  2  y  False     i  <NA>  100.5
2  3  z   <NA>  <NA>    20  200.0

Data Types of the resultant DataFrame is: 
a      Int64
b     string
c    boolean
d     string
e      Int64
f    Float64
dtype: object

Initially, we check the data types of the DataFrame columns using the dtypes attribute. Then the convert_dtypes() method converts column "a" to Int64, "b" and "d" to string, "c" to boolean, "e" to Int64, and "f" to Float64. Missing values in the converted columns are shown as <NA>.
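Because convert_dtypes() returns a new DataFrame, the original object keeps its dtypes. Here is a minimal sketch of that behaviour, using a made-up single-column DataFrame whose column name e is borrowed from the example above:

```python
import numpy as np
import pandas as pd

# Hypothetical data: whole numbers stored as float64 because of the NaN
df = pd.DataFrame({"e": [10, np.nan, 20]})

result = df.convert_dtypes()

# The original DataFrame is left untouched
print(df["e"].dtype)      # float64
print(result["e"].dtype)  # Int64

# The missing value is now represented by pd.NA instead of NaN
print(result["e"].isna().tolist())
```

If the converted dtypes should replace the old ones, assign the result back to the same variable.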

The DataFrame.astype() function

The pandas DataFrame.astype() function casts a pandas object to a specified dtype. The syntax is as follows:

DataFrame.astype(dtype, copy, errors)

Parameters

  • dtype: A data type, or a dict {col: dtype, ...}, where col is a column label and dtype is a numpy.dtype or Python type. Pass a single data type to cast the whole DataFrame, or a dict to cast one or more columns to specific data types.

  • copy: The default value is True, which returns a copy. With copy=False, the result may share data with the original object, so later changes to the values can propagate between them.

  • errors: The default value is 'raise', which raises an exception when the conversion fails. With 'ignore', exceptions are suppressed and the original object is returned on error.
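To make the errors parameter concrete, here is a small sketch that assumes a float column containing NaN. The column name Floats mirrors the sample data in this article; everything else is illustrative. Casting to the NumPy int64 dtype fails under the default errors='raise', while the nullable Int64 dtype accepts the missing value.

```python
import numpy as np
import pandas as pd

# Illustrative data: whole-number floats with one missing value
df = pd.DataFrame({"Floats": [1.0, np.nan, 3.0]})

try:
    # NaN has no int64 representation, so the default errors='raise' applies
    df.astype({"Floats": "int64"})
    raised = False
except ValueError as exc:
    raised = True
    print("Conversion failed:", exc)

# The nullable extension dtype can hold the missing value as <NA>
nullable = df.astype({"Floats": "Int64"})
print(nullable.dtypes)
```

Wrapping the call in try/except, as above, is one way to handle a failed cast explicitly.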

Example

In this example, we will convert the data type of all columns to an object type using the astype() function.

import pandas as pd

df = pd.DataFrame({'Integers': [1, 2, 5, 3, 6, 0],
                   'Floats': [1.3, None, 4.6, 1.02, 0.3, 0.001],
                   'Strings': ['p', 'y', 't', 'h', 'o', 'n'],
                   'Dates': pd.date_range('2023-05-04', periods=6, freq='W')})

print("Input DataFrame:")
print(df)
print('Data Types of each column is: ')
print(df.dtypes)

# Convert the data type of columns
result = df.astype('object')

print("Output DataFrame:")
print(result)
print('Data Types of the resultant DataFrame is: ')
print(result.dtypes)

Output

Input DataFrame:
   Integers  Floats Strings      Dates
0         1   1.300       p 2023-05-07
1         2     NaN       y 2023-05-14
2         5   4.600       t 2023-05-21
3         3   1.020       h 2023-05-28
4         6   0.300       o 2023-06-04
5         0   0.001       n 2023-06-11

Data Types of each column is: 
Integers             int64
Floats             float64
Strings             object
Dates       datetime64[ns]
dtype: object

Output DataFrame:
  Integers Floats Strings                Dates
0        1    1.3       p  2023-05-07 00:00:00
1        2    NaN       y  2023-05-14 00:00:00
2        5    4.6       t  2023-05-21 00:00:00
3        3   1.02       h  2023-05-28 00:00:00
4        6    0.3       o  2023-06-04 00:00:00
5        0  0.001       n  2023-06-11 00:00:00

Data Types of the resultant DataFrame is: 
Integers    object
Floats      object
Strings     object
Dates       object
dtype: object

All the columns are converted to the object dtype.

Example

Let's take another example to convert the dtype of a few columns by using a dictionary.

import pandas as pd

df = pd.DataFrame({'Integers': [1, 2, 5, 3, 6, 0],
                   'Floats': [1.3, None, 4.6, 1.02, 0.3, 0.001],
                   'Strings': ['p', 'y', 't', 'h', 'o', 'n'],
                   'Dates': pd.date_range('2023-05-04', periods=6, freq='W')})

print("Input DataFrame:")
print(df)
print('Data Types of each column is: ')
print(df.dtypes)

# Convert the data type of columns
result = df.astype({'Floats': 'object', 'Strings': 'category'})

print("Output DataFrame:")
print(result)
print('Data Types of the resultant DataFrame is: ')
print(result.dtypes)

Output

Input DataFrame:
   Integers  Floats Strings      Dates
0         1   1.300       p 2023-05-07
1         2     NaN       y 2023-05-14
2         5   4.600       t 2023-05-21
3         3   1.020       h 2023-05-28
4         6   0.300       o 2023-06-04
5         0   0.001       n 2023-06-11

Data Types of each column is: 
Integers             int64
Floats             float64
Strings             object
Dates       datetime64[ns]
dtype: object

Output DataFrame:
   Integers Floats Strings      Dates
0         1    1.3       p 2023-05-07
1         2    NaN       y 2023-05-14
2         5    4.6       t 2023-05-21
3         3   1.02       h 2023-05-28
4         6    0.3       o 2023-06-04
5         0  0.001       n 2023-06-11

Data Types of the resultant DataFrame is: 
Integers             int64
Floats              object
Strings           category
Dates       datetime64[ns]
dtype: object

The Floats and Strings columns are converted to the object and category dtypes, respectively. The remaining columns keep their original dtypes.
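One way to confirm which columns a dictionary-based astype() call actually changed is to compare the dtypes before and after. The sketch below is illustrative and reuses a trimmed-down version of the sample data; the changed variable is a name introduced here.

```python
import pandas as pd

# Trimmed-down version of the sample data from the example above
df = pd.DataFrame({'Integers': [1, 2, 5],
                   'Floats': [1.3, None, 4.6],
                   'Strings': ['p', 'y', 't']})

result = df.astype({'Floats': 'object', 'Strings': 'category'})

# Collect the columns whose dtype differs between input and output
changed = [col for col in df.columns
           if result[col].dtype != df[col].dtype]
print(changed)
```

Only the columns named in the dictionary should appear in the list.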

Updated on: 2023-05-30T14:48:53+05:30
