Python Pandas Tutorial PDF
In this tutorial, we will learn the various features of Python Pandas and how to use them
in practice.
Audience
This tutorial has been prepared for those who seek to learn the basics and various functions
of Pandas. It will be especially useful for people working on data cleansing and analysis.
After completing this tutorial, you will have a moderate level of expertise, from
which you can advance to higher levels.
Prerequisites
You should have a basic understanding of computer programming terminology. A basic
understanding of any programming language is a plus.
The Pandas library uses most of the functionality of NumPy. We suggest that you go
through our tutorial on NumPy before proceeding with this tutorial. You can access it from:
NumPy Tutorial.
All the content and graphics published in this e-book are the property of Tutorials Point (I)
Pvt. Ltd. The user of this e-book is prohibited to reuse, retain, copy, distribute or republish
any contents or a part of contents of this e-book in any manner without written consent
of the publisher.
We strive to update the contents of our website and tutorials as timely and as precisely as
possible, however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt.
Ltd. provides no guarantee regarding the accuracy, timeliness or completeness of our
website or its contents including this tutorial. If you discover any errors on our website or
in this tutorial, please notify us at [email protected].
Python Pandas
Table of Contents
About the Tutorial
Audience
Prerequisites
Series
DataFrame
Panel
4. Pandas – Series
pandas.Series
pandas.DataFrame
Column
Addition
pandas.Panel()
Renaming
Iterating a DataFrame
iteritems()
iterrows()
itertuples()
By Label
get_option(param)
set_option(param,value)
reset_option(param)
describe_option(param)
option_context()
.loc()
.iloc()
.ix()
Percent_change
Covariance
Correlation
Data Ranking
Histograms
1. Pandas – Introduction
In 2008, developer Wes McKinney started developing Pandas because he needed a
high-performance, flexible tool for data analysis.
Prior to Pandas, Python was mostly used for data munging and preparation. It contributed
very little to data analysis itself. Pandas solved this problem. Using Pandas, we can
accomplish five typical steps in the processing and analysis of data, regardless of the origin
of the data: load, prepare, manipulate, model, and analyze.
Python with Pandas is used in a wide range of academic and commercial domains,
including finance, economics, statistics, and analytics.
2. Pandas – Environment Setup
The standard Python distribution doesn't come bundled with the Pandas module. A lightweight
alternative is to install Pandas using the popular Python package installer, pip (pip install pandas).
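Once the installation has finished, a quick way to confirm that it worked is to import the library and print its version. The following is a minimal sketch; the variable names are chosen here for illustration only.

```python
# Confirm that Pandas, and the NumPy library it builds on, can be imported.
import numpy as np
import pandas as pd

pandas_version = pd.__version__
numpy_version = np.__version__

print("pandas", pandas_version)
print("numpy", numpy_version)
```

If either import fails with ModuleNotFoundError, the package is not installed in the Python environment that is running the script.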
If you install the Anaconda Python package, Pandas is installed by default. The options for
each platform are as follows:
Windows
Anaconda (from https://www.continuum.io) is a free Python distribution for the SciPy
stack. It is also available for Linux and Mac.
Python (x,y) is a free Python distribution with the SciPy stack and Spyder IDE for
Windows OS. (Downloadable from http://python-xy.github.io/)
Linux
The package managers of the respective Linux distributions can be used to install one or
more packages in the SciPy stack.
3. Pandas – Introduction to Data Structures
Pandas provides the following three data structures:
Series
DataFrame
Panel
These data structures are built on top of NumPy arrays, which means they are fast.
Building and handling arrays of two or more dimensions is a tedious task, because the burden
is placed on the user to consider the orientation of the data set when writing functions.
Using Pandas data structures reduces this mental effort.
For example, with tabular data (DataFrame) it is more semantically helpful to think of
the index (the rows) and the columns rather than axis 0 and axis 1.
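The difference between the two ways of thinking can be sketched as follows. The sample data is hypothetical and used only for illustration; the point is that the row labels live in `index`, the column labels live in `columns`, and the `axis` argument accepts these names as well as the numbers 0 and 1.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"Age": [32, 28], "Rating": [3.45, 4.6]},
    index=["Steve", "Lia"],
)

row_labels = list(df.index)       # labels along axis 0 (the rows)
column_labels = list(df.columns)  # labels along axis 1 (the columns)

# The same reduction, addressed by name instead of by axis number.
mean_per_column = df.mean(axis="index")   # equivalent to axis=0
mean_per_row = df.mean(axis="columns")    # equivalent to axis=1

print(mean_per_column)
print(mean_per_row)
```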
Mutability
All Pandas data structures are value mutable (their values can be changed), and all except
Series are size mutable. Series is size immutable.
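As a rough illustration of the two kinds of mutability, the sketch below changes a value inside a Series without changing its length, and then adds a column to a DataFrame, which changes its shape. The data and column names are hypothetical.

```python
import pandas as pd

# Value mutability: overwrite one element of a Series in place.
s = pd.Series([10, 23, 56])
s.iloc[0] = 99
length_after_update = len(s)  # the length is unchanged

# Size mutability: insert a new column into an existing DataFrame.
df = pd.DataFrame({"Age": [32, 28]})
shape_before = df.shape
df["Rating"] = [3.45, 4.6]
shape_after = df.shape

print(s.tolist())
print(shape_before, shape_after)
```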
Note: DataFrame is widely used and is one of the most important data structures. Panel is
used far less often and has been removed from recent pandas releases.
Series
Series is a one-dimensional array-like structure with homogeneous data. For example, the
following series is a collection of integers 10, 23, 56, …
10 23 56 17 52 61 73 90 26 72
Key Points
Homogeneous data
Size Immutable
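The series of integers shown above can be built with a short sketch like the one below. The variable names are chosen here for illustration. Because every element is an integer, the whole Series shares a single integer dtype.

```python
import pandas as pd

s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

length = len(s)
print(s.dtype)   # one integer dtype shared by all elements
print(length)
```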
DataFrame
DataFrame is a two-dimensional array with heterogeneous data. For example, consider a
table that holds the data of a sales team of an organization along with their overall
performance rating. The data is represented in rows and columns. Each column represents
an attribute and each row represents a person.
Column    Type
Name      String
Age       Integer
Gender    String
Rating    Float
Key Points
Heterogeneous data
Size Mutable
Data Mutable
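A DataFrame with the four columns listed above might be sketched as follows. The names and values are invented for illustration and are not taken from the tutorial's original table. Each column keeps its own data type, which is what makes the structure heterogeneous.

```python
import pandas as pd

# Hypothetical sales-team records; names and values are made up.
sales_team = pd.DataFrame(
    {
        "Name": ["Steve", "Lia", "Vin"],
        "Age": [32, 28, 45],
        "Gender": ["Male", "Female", "Male"],
        "Rating": [3.45, 4.6, 3.9],
    }
)

# One dtype per column: text, integer, text, float.
print(sales_team.dtypes)
print(sales_team.shape)  # (rows, columns)
```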
Panel
Panel is a three-dimensional data structure with heterogeneous data. A panel is hard to
represent graphically, but it can be illustrated as a container of DataFrames.
Key Points
Heterogeneous data
Size Mutable
Data Mutable
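The idea of a container of DataFrames can be sketched with a plain dictionary. This is a stand-in and not the Panel class itself, which is unavailable in recent pandas releases. The item names and values are hypothetical. Passing the dictionary to `pd.concat` stacks the frames and uses the dictionary keys as an outer index level, which gives a three-dimensional view of the data: item, row, and column.

```python
import pandas as pd

# Hypothetical item names and values, for illustration only.
frames = {
    "Item1": pd.DataFrame({"A": [1, 2], "B": [3.0, 4.0]}),
    "Item2": pd.DataFrame({"A": [5, 6], "B": [7.0, 8.0]}),
}

# The dict keys become the outer level of a two-level row index.
stacked = pd.concat(frames)

# Selecting by the outer key returns the rows of that item.
item1 = stacked.loc["Item1"]

print(stacked)
print(item1)
```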