Change The Data Type of Columns in Pandas - LinkedIn
Mohit Sharma
DevOps Engineer @SpaceServicesAustralia | (CKA) Kubernetes Certified | 3 X AWS Certified | Open Source Contributor
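1. to_numeric() - provides functionality to safely convert non-numeric types (e.g. strings) to
a suitable numeric type.
2. astype() - convert (almost) any type to (almost) any other type (even if it's not
necessarily sensible to do so). Also allows you to convert to categorical types (very
useful).
3. infer_objects() - a utility method to convert object columns holding Python objects to a
pandas type if possible.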
Read on for more detailed explanations and usage of each of these methods.
1. to_numeric()
The best way to convert one or more columns of a DataFrame to numeric values is to
use pandas.to_numeric().
This function will try to change non-numeric objects (such as strings) into integers or
floating-point numbers as appropriate.
Basic usage
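A minimal sketch of basic usage, assuming a small made-up Series of numeric strings (the values are illustrative, not taken from the article):

```python
import pandas as pd

# Illustrative data: numbers stored as strings in an object-dtype Series.
s = pd.Series(["8", "6", "7.5", "3", "0.9"], dtype=object)
print(s.dtype)  # object

# to_numeric() builds a new Series; s itself is left unchanged.
converted = pd.to_numeric(s)
print(converted.dtype)  # float64, because some values have a fractional part
```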
to_numeric() returns a new Series rather than modifying the original. Remember to assign
this output to a variable or column name to continue using it:
# convert Series
my_series = pd.to_numeric(my_series)
You can also use it to convert multiple columns of a DataFrame via the apply() method:
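One way this might look, assuming a hypothetical DataFrame whose columns "a" and "b" hold numeric strings while "name" holds ordinary text:

```python
import pandas as pd

# Hypothetical DataFrame: every column starts out with the object dtype.
df = pd.DataFrame(
    {"a": ["1", "2", "3"], "b": ["4.5", "5.5", "6.5"], "name": ["x", "y", "z"]},
    dtype=object,
)

# Convert only the selected columns and assign the result back to them.
df[["a", "b"]] = df[["a", "b"]].apply(pd.to_numeric)
print(df.dtypes)
```

Selecting the columns explicitly keeps the text column "name" out of the conversion.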
As long as your values can all be converted, that's probably all you need.
Error handling
to_numeric() also takes an errors keyword argument that allows you to force non-numeric
values to be NaN, or simply ignore columns containing these values.
Here's an example using a Series of strings s which has the object dtype:
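A sketch of such a Series; apart from the string 'pandas', which the text mentions, the values are assumptions chosen for illustration:

```python
import pandas as pd

# Numeric strings plus one value that cannot be parsed as a number.
s = pd.Series(["1", "2", "4.7", "pandas", "10"], dtype=object)
print(s.dtype)  # object
```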
The default behaviour is to raise if it can't convert a value. In this case, it can't cope with the
string 'pandas':
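The default behaviour can be sketched as follows, again assuming an illustrative Series that contains the string 'pandas':

```python
import pandas as pd

s = pd.Series(["1", "2", "4.7", "pandas", "10"], dtype=object)

message = None
try:
    pd.to_numeric(s)  # errors='raise' is the default
except ValueError as exc:
    # The exception text names the value that could not be parsed.
    message = str(exc)
    print(message)
```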
Rather than fail, we might want 'pandas' to be considered a missing/bad numeric value. We
can coerce invalid values to NaN as follows using the errors keyword argument:
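A short sketch of coercion on the same kind of illustrative Series:

```python
import pandas as pd

s = pd.Series(["1", "2", "4.7", "pandas", "10"], dtype=object)

# Values that cannot be parsed become NaN instead of raising an error.
coerced = pd.to_numeric(s, errors="coerce")
print(coerced)
print(coerced.isna().sum())  # 1 (only 'pandas' failed to parse)
```

Because NaN is a floating-point value, the result has a float dtype even where the remaining values are whole numbers.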
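The third option for errors is to ignore the operation if an invalid value is encountered and
return the input unchanged. Be aware that pandas 2.2 deprecated errors='ignore' for
to_numeric(), so on recent versions it is safer to catch the exception yourself.
This option is particularly useful when you want to convert your entire DataFrame but
don't know which of your columns can be converted reliably to a numeric type. In that case,
on a pandas version that still accepts it, just write: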
df.apply(pd.to_numeric, errors='ignore')
The function will be applied to each column of the DataFrame. Columns that can be
converted to a numeric type will be converted, while columns that cannot (e.g. they contain
non-digit strings or dates) will be left alone.
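On pandas versions where errors='ignore' is deprecated or unavailable, the same per-column behaviour can be sketched with an explicit try/except. The helper name to_numeric_if_possible and the sample data are hypothetical:

```python
import pandas as pd


def to_numeric_if_possible(column: pd.Series) -> pd.Series:
    """Return a numeric version of the column, or the column unchanged if parsing fails."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        return column


# Hypothetical data: "a" is fully numeric, "b" is text, "c" has one bad value.
df = pd.DataFrame(
    {"a": ["1", "2", "3"], "b": ["x", "y", "z"], "c": ["1.5", "2.5", "oops"]},
    dtype=object,
)

converted = df.apply(to_numeric_if_possible)
print(converted.dtypes)
```

Only column "a" changes dtype here; "b" and "c" come back untouched because at least one of their values cannot be parsed.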
Downcasting
By default, conversion with to_numeric() will give you either an int64 or float64 dtype (or
whatever integer width is native to your platform).
That's usually what you want, but what if you wanted to save some memory and use a more
compact dtype, like float32, or int8?
to_numeric() gives you the option to downcast to 'integer', 'signed', 'unsigned' or 'float'.
Here's an example for a simple series s of integer type:
Downcasting to 'integer' uses the smallest possible integer that can hold the values:
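A sketch of downcasting, assuming the small integer Series 1, 2, -7 that also appears later in this article:

```python
import pandas as pd

s = pd.Series([1, 2, -7])
print(s.dtype)  # int64

# Smallest signed integer dtype that can hold every value.
as_int = pd.to_numeric(s, downcast="integer")
print(as_int.dtype)  # int8

# Smallest float dtype (float32 is the minimum pandas will pick).
as_float = pd.to_numeric(s, downcast="float")
print(as_float.dtype)  # float32
```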
2. astype()
The astype() method enables you to be explicit about the dtype you want your DataFrame or
Series to have. It's very versatile in that you can try and go from one type to any other.
Basic usage
Just pick a type: you can use a NumPy dtype (e.g. np.int16), some Python types (e.g. bool),
or pandas-specific types (like the categorical dtype).
Call the method on the object you want to convert and astype() will try and convert it for
you:
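A minimal sketch covering the three kinds of type mentioned above; the DataFrame and its column names are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [0, 1, 0], "c": ["x", "y", "x"]})

# A dict maps each column to its target: a NumPy dtype, a Python type,
# and the pandas categorical dtype.
typed = df.astype({"a": np.int16, "b": bool, "c": "category"})
print(typed.dtypes)

# A single dtype can also be applied to one Series.
as_float = df["a"].astype(float)
print(as_float.dtype)  # float64
```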
Notice I said "try" - if astype() does not know how to convert a value in the Series or
DataFrame, it will raise an error. For example, if you have a NaN or inf value you'll get an
error trying to convert it to an integer.
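The failure case can be sketched like this, using two illustrative Series that contain NaN and inf respectively:

```python
import numpy as np
import pandas as pd

with_nan = pd.Series([1.0, np.nan, 3.0])
with_inf = pd.Series([1.0, np.inf, 3.0])

failures = []
for name, series in [("nan", with_nan), ("inf", with_inf)]:
    try:
        series.astype(int)
    except ValueError as exc:
        # Non-finite values have no integer representation, so pandas refuses.
        failures.append(name)
        print(f"{name}: {exc}")
```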
As of pandas 0.20.0, this error can be suppressed by passing errors='ignore'. Your original
object will be returned untouched.
Be careful
astype() is powerful, but it will sometimes convert values "incorrectly". For example:
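A sketch of the starting Series, assuming the values 1, 2 and -7 implied by the output shown further down:

```python
import pandas as pd

s = pd.Series([1, 2, -7])
print(s.dtype)  # int64
```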
These are small integers, so how about converting to an unsigned 8-bit type to save
memory?
>>> s.astype(np.uint8)
0 1
1 2
2 249
dtype: uint8
The conversion worked, but the -7 wrapped around to become 249 (i.e. 2^8 - 7)!
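One possible guard against this kind of wrap-around is to check the value range before casting. This is only a sketch: the helper fits_in is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def fits_in(values: pd.Series, dtype) -> bool:
    """Return True if every value lies within the integer range of dtype."""
    info = np.iinfo(dtype)
    return bool(values.min() >= info.min and values.max() <= info.max)


s = pd.Series([1, 2, -7])
print(fits_in(s, np.uint8))  # False: -7 is below the uint8 minimum of 0
print(fits_in(s, np.int8))   # True: all values fit in -128..127
```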
3. infer_objects()
Version 0.21.0 of pandas introduced the method infer_objects() for converting columns of a
DataFrame that have an object datatype to a more specific type (soft conversions).
For example, here's a DataFrame with two columns of an object type. One holds actual
integers and the other holds strings representing integers:
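Such a DataFrame might be built as follows; the specific values are assumptions for illustration:

```python
import pandas as pd

# Column "a" holds real integers, column "b" holds strings that look like integers.
# Passing dtype="object" forces both columns to the object dtype.
df = pd.DataFrame({"a": [7, 1, 5], "b": ["3", "2", "1"]}, dtype="object")
print(df.dtypes)
```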
Using infer_objects(), you can change the type of column 'a' to int64:
>>> df = df.infer_objects()
>>> df.dtypes
a int64
b object
dtype: object
Column 'b' has been left alone since its values were strings, not integers. If you wanted to try
and force the conversion of both columns to an integer type, you could
use df.astype(int) instead.
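A sketch of that forced conversion on an illustrative two-column DataFrame of the same shape:

```python
import pandas as pd

df = pd.DataFrame({"a": [7, 1, 5], "b": ["3", "2", "1"]}, dtype="object")

# astype(int) parses the strings in "b" as well as converting "a".
as_int = df.astype(int)
print(as_int.dtypes)
```

Unlike infer_objects(), this raises an error if any string cannot be parsed as an integer.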
Do follow or connect with me for articles on AWS and machine learning topics. If you would
like me to cover a particular topic, comment below to let me know. Thank you.
1. to_numeric() - provides functionality to safely convert non-numeric types (e.g. strings) to a suitable numeric type.
2. astype() - convert (almost) any type to (almost) any other type (even if it's not necessarily sensible to do so). Also allows you to convert to categorical types (very useful).
3. infer_objects() - a utility method to convert object columns holding Python objects to a pandas type if possible.