Data Types in Pandas by Jaume Boguñá


DATA TYPE CONVERSIONS

PYTHON for DATA SCIENCE

Jaume Boguñá
Dive into Python
Data Types in Pandas
Numeric Types
int64      64-bit integer
int32      32-bit integer
float64    64-bit floating point
float32    32-bit floating point
Boolean Type
bool Boolean values (True, False)
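
The width in a numeric dtype name is the number of bits stored per value. A minimal sketch of what that means for memory, using made-up values and variable names that are not part of the original slides:

```python
import pandas as pd

# Illustrative data; the variable names are assumptions, not from the slides
pulse = pd.Series([109, 117, 110], dtype='int64')
pulse32 = pulse.astype('int32')

bytes_int64 = pulse.nbytes     # 3 values * 8 bytes each
bytes_int32 = pulse32.nbytes   # 3 values * 4 bytes each
print(bytes_int64, bytes_int32)

# Comparisons produce a bool Series
flags = pulse > 110
print(flags.dtype)
```

A narrower dtype only helps when every value fits in its range, so it is worth checking the data before downcasting.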

Data Types in Pandas
Object Type
object     Used for text or mixed data types (e.g., strings, lists, dictionaries). It is the most general type.
Datetime Types
datetime64[ns]     64-bit date and time, with nanosecond precision.
timedelta64[ns]    Differences between datetime values, measured in nanoseconds.
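
As a small illustration of how the two types relate, subtracting one datetime column from another yields a timedelta column. The dates and variable names below are invented for the example, and the exact time unit shown in the dtype may depend on the installed pandas version:

```python
import pandas as pd

# Hypothetical start and end dates, parsed into datetime columns
start = pd.to_datetime(pd.Series(['2024-01-01', '2024-03-01']))
end = pd.to_datetime(pd.Series(['2024-01-02', '2024-03-15']))

# Datetime minus datetime gives a timedelta column
elapsed = end - start

print(start.dtype)
print(elapsed.dtype)
print(elapsed.dt.days.tolist())
```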

Data Types in Pandas
Categorical Type
category   Categorical data, used for repetitive values to save memory and improve performance.
Other Types
complex128 Complex numbers (rarely used)
string     Explicit string dtype (an alternative to object for text data)
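
The memory claim for category can be checked directly. The sketch below uses an invented column of city names (not from the slides) and compares the plain text column with its categorical version; it also shows the explicit string dtype:

```python
import pandas as pd

# Hypothetical text column with only three distinct values, repeated many times
cities = pd.Series(['Barcelona', 'Madrid', 'Valencia'] * 10_000)

as_category = cities.astype('category')   # stores small integer codes plus 3 labels
as_string = cities.astype('string')       # explicit string dtype

text_bytes = cities.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(text_bytes, category_bytes)

print(as_category.cat.categories.tolist())
print(as_string.dtype)
```

The saving grows with the amount of repetition; a column of mostly unique strings gains little from category.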

Data Type Conversions in Pandas

Data type conversions are essential when working with different types of
data such as integers, floats, strings, and datetime objects.

Let’s explore two important data type conversion methods in Pandas:

1. astype()

2. pd.to_datetime()

1. astype()
Converts one or more pandas columns to a specified data type.

Parameters

DataFrame.astype(
    dtype,
    copy=None,
    errors='raise'
)
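
With the default errors='raise', a value that cannot be converted stops the cast with an exception. A minimal sketch, assuming a hypothetical column that contains one non-numeric entry (the 'n/a' value and the filtering step are illustrative choices, not from the slides):

```python
import pandas as pd

# Hypothetical column with one value that is not a number
durations = pd.Series(['50', '40', 'n/a'])

try:
    durations.astype('int64')          # errors='raise' is the default
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print("Cast failed:", exc)

# One possible workaround: keep only the all-digit strings before casting
clean = durations[durations.str.isdigit()].astype('int64')
print(clean.tolist())
```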

1. astype()
Creating the DataFrame and Checking dtypes

import pandas as pd

data = {
    "Duration": ['50', '40', '45'],
    "Pulse": [109, 117, 110],
    "Calories": [409.1, 479.5, 340.8]
}

df = pd.DataFrame(data)

df.dtypes
Duration object
Pulse int64
Calories float64
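
Duration is reported as object because its values are strings. A short sketch of casting just that one column, so the other columns keep their dtypes and numeric operations on Duration become possible (the mean is only an illustrative follow-up, not part of the slides):

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": ['50', '40', '45'],
    "Pulse": [109, 117, 110],
    "Calories": [409.1, 479.5, 340.8],
})

# Cast only the text column; Pulse and Calories are left untouched
df["Duration"] = df["Duration"].astype("int64")

print(df.dtypes)
mean_duration = df["Duration"].mean()
print(mean_duration)
```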
1. astype()
Casting a pandas object to a specified dtype

df
Duration Pulse Calories
0 50 109 409.1
1 40 117 479.5
2 45 110 340.8

# Cast every column to int64 (floats are truncated, not rounded)
newdf = df.astype('int64')
newdf
Duration Pulse Calories
0 50 109 409
1 40 117 479
2 45 110 340
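
Note that 479.5 became 479 and 340.8 became 340: casting a float to an integer drops the fractional part. If rounding is what you want, one option is to round before casting. A minimal sketch using the Calories values from the example (the variable names are illustrative):

```python
import pandas as pd

calories = pd.Series([409.1, 479.5, 340.8])

truncated = calories.astype('int64')          # fractional part is dropped
rounded = calories.round().astype('int64')    # round first, then cast

print(truncated.tolist())
print(rounded.tolist())
```

Series.round() rounds exact halves to the nearest even number, which is why 479.5 ends up as 480 here.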
1. astype()
Casting a pandas object to a specified dtype

df
Duration Pulse Calories
0 50 109 409.1
1 40 117 479.5
2 45 110 340.8

# Changing Pulse and Calories dtypes to str
newdf = df.astype({'Pulse': 'str', 'Calories': 'str'})
newdf.dtypes
Duration object
Pulse object
Calories object
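
The dictionary form also allows a different target type per column. A brief sketch on the same data, where the choice of int64 for Duration and float32 for Calories is only an example:

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": ['50', '40', '45'],
    "Pulse": [109, 117, 110],
    "Calories": [409.1, 479.5, 340.8],
})

# Columns not named in the mapping (Pulse) keep their current dtype
mixed = df.astype({'Duration': 'int64', 'Calories': 'float32'})
print(mixed.dtypes)
```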

2. pd.to_datetime()
Convert argument to datetime

Parameters
pd.to_datetime(arg,
errors='raise',
dayfirst=False,
yearfirst=False,
utc=False,
format=None,
exact=<no_default>,
unit=None,
infer_datetime_format=<no_default>,
origin='unix',
cache=True
)

2. pd.to_datetime()
Assembling a datetime from multiple columns

import pandas as pd

df = pd.DataFrame({'year': [2023, 2024],
                   'month': [11, 2],
                   'day': [9, 17]})

# The keys can be common abbreviations like ['year', 'month',
# 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same
newdf = pd.to_datetime(df)
newdf
0 2023-11-09
1 2024-02-17
dtype: datetime64[ns]
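
The same approach works when time-of-day columns are present. A small sketch that adds hypothetical hour and minute columns to the example (year, month and day remain required):

```python
import pandas as pd

# 'hour' and 'minute' values are invented for illustration
parts = pd.DataFrame({
    'year': [2023, 2024],
    'month': [11, 2],
    'day': [9, 17],
    'hour': [14, 8],
    'minute': [30, 5],
})

stamps = pd.to_datetime(parts)
print(stamps)
```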

2. pd.to_datetime()
Using a unix epoch time

import pandas as pd

# Unix timestamp in seconds
timestamp_in_seconds = 1672531200

# Convert to datetime
pd.to_datetime(timestamp_in_seconds, unit='s')
Timestamp('2023-01-01 00:00:00')
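
The unit argument has to match how the epoch value was recorded. A minimal sketch showing the same instant expressed in seconds and in milliseconds, plus a whole column of timestamps (the second timestamp is simply one day later, chosen for illustration):

```python
import pandas as pd

seconds = 1672531200

from_seconds = pd.to_datetime(seconds, unit='s')
from_millis = pd.to_datetime(seconds * 1000, unit='ms')   # same instant in milliseconds
print(from_seconds, from_millis)

# The conversion also applies element-wise to a Series
series_result = pd.to_datetime(pd.Series([1672531200, 1672617600]), unit='s')
print(series_result)
```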

2. pd.to_datetime()
Using dayfirst and yearfirst
import pandas as pd

# Ambiguous date format
date_strings = ['10/12/2023', '01/02/2023']

# Treat first number as day
day_first = pd.to_datetime(date_strings, dayfirst=True)
print("Day first:", day_first)
Day first: DatetimeIndex(['2023-12-10', '2023-02-01'])

# yearfirst=True prefers a year-first reading when one is possible;
# these strings end in a four-digit year, so they are parsed month-first
year_first = pd.to_datetime(date_strings, yearfirst=True)
print("Year first:", year_first)
Year first: DatetimeIndex(['2023-10-12', '2023-01-02'])
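
When the layout of the strings is known, stating it with format removes the ambiguity altogether. A short sketch on the same strings, assuming they are meant as day/month/year:

```python
import pandas as pd

date_strings = ['10/12/2023', '01/02/2023']

# Explicit layout: day, then month, then four-digit year
explicit = pd.to_datetime(date_strings, format='%d/%m/%Y')
inferred = pd.to_datetime(date_strings, dayfirst=True)

print(explicit)
print((explicit == inferred).all())
```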

2. pd.to_datetime()
Convert to UTC

import pandas as pd

# Unix timestamp
timestamp_in_seconds = [1609459200, 1609545600]

# Convert to UTC
utc_time = pd.to_datetime(timestamp_in_seconds,
                          unit='s', utc=True)
utc_time
DatetimeIndex(['2021-01-01 00:00:00+00:00', '2021-01-02 00:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)
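
utc=True is also useful when the inputs carry different UTC offsets, since everything is converted to one common timezone. A small sketch with two invented strings that share the same local clock time but different offsets:

```python
import pandas as pd

# Same wall-clock time, different offsets (illustrative values)
local_stamps = ['2021-01-01 00:00:00+01:00', '2021-01-01 00:00:00-05:00']

as_utc = pd.to_datetime(local_stamps, utc=True)
print(as_utc)
```

The first value moves back one hour and the second moves forward five, so both land on their true UTC instants.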

2. pd.to_datetime()
Using a Custom format

import pandas as pd

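# Date strings in year-day-month order
date_strings = ['2023-10-01', '2024-10-13']

# Define the format explicitly: %Y-%d-%m reads the middle field as the
# day and the last field as the month. '2024-10-13' would need month 13,
# so errors='coerce' turns it into NaT instead of raising an error.
pd.to_datetime(date_strings, format='%Y-%d-%m',
               errors='coerce')
DatetimeIndex(['2023-01-10', 'NaT'],
              dtype='datetime64[ns]', freq=None)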
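
To see what errors='coerce' changes, the same call can be compared with the default errors='raise'. A minimal sketch; the bad_rows helper list is an illustrative addition for locating the values that failed to parse:

```python
import pandas as pd

date_strings = ['2023-10-01', '2024-10-13']

# Default behaviour: the value that does not fit the format raises an error
try:
    pd.to_datetime(date_strings, format='%Y-%d-%m')
    raised = False
except ValueError as exc:
    raised = True
    print("Strict parse failed:", exc)

# With coercion the bad value becomes NaT and can be found afterwards
parsed = pd.to_datetime(date_strings, format='%Y-%d-%m', errors='coerce')
bad_rows = [s for s, missing in zip(date_strings, parsed.isna()) if missing]
print(bad_rows)
```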
2. pd.to_datetime()
Enabling Caching (cache) to speed up the conversion
import pandas as pd

# Repeated dates
repeated_dates = ['2024-10-03'] * 100000

# Convert with caching enabled (cache=True is already the default)
cached_dates = pd.to_datetime(repeated_dates, cache=True)
cached_dates
DatetimeIndex(['2024-10-03', '2024-10-03', '2024-10-03', '2024-10-03',
'2024-10-03', '2024-10-03', '2024-10-03', '2024-10-03',
'2024-10-03', '2024-10-03',
...
'2024-10-03', '2024-10-03', '2024-10-03', '2024-10-03',
'2024-10-03', '2024-10-03', '2024-10-03', '2024-10-03',
'2024-10-03', '2024-10-03'],
dtype='datetime64[ns]', length=100000, freq=None)
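
One way to gauge the effect of the cache is to time the conversion with it switched on and off. The sketch below is a rough measurement only: the timed helper is a hypothetical name, and the timings will vary by machine and pandas version, so no particular speed-up is claimed:

```python
import time
import pandas as pd

repeated_dates = ['2024-10-03'] * 100000

def timed(cache):
    """Convert the list once and return the result with the elapsed seconds."""
    start = time.perf_counter()
    result = pd.to_datetime(repeated_dates, cache=cache)
    return result, time.perf_counter() - start

cached, cached_seconds = timed(True)
uncached, uncached_seconds = timed(False)

print(f"cache=True:  {cached_seconds:.4f} s")
print(f"cache=False: {uncached_seconds:.4f} s")
print(cached.equals(uncached))   # both settings give the same values
```

The cache stores each unique string's parsed value, so it helps most when the input has many duplicates.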

Jaume Boguñá
Aerospace Engineer | Data Scientist
