Data Analytics Pandas

Pandas is an open-source Python library that provides powerful data structures and data analysis tools for manipulating and analyzing multidimensional data. It was created by Wes McKinney in 2008 to fill the need for high performance, flexible tools for working with structured data in Python. Pandas allows users to load, prepare, manipulate, model and analyze data using its core data structures - Series for 1D data, DataFrame for 2D labeled data, and Panel for 3D labeled data. Pandas is widely used in domains like finance, economics, statistics and analytics.

Uploaded by

Vivek Munjayasra

Python Pandas – Introduction

Pandas is an open-source Python library that provides high-performance data
manipulation and analysis tools built on its powerful data structures. The name Pandas
is derived from "panel data", an econometrics term for multidimensional data.
In 2008, developer Wes McKinney started developing pandas because he needed a
high-performance, flexible tool for data analysis.
Prior to Pandas, Python was mostly used for data munging and preparation and
contributed very little to data analysis itself. Pandas solved this problem. Using
Pandas, we can accomplish five typical steps in the processing and analysis of
data, regardless of its origin: load, prepare, manipulate, model, and
analyze.
Python with Pandas is used in a wide range of academic and commercial fields,
including finance, economics, statistics, and analytics.

Key Features of Pandas

• Fast and efficient DataFrame object with default and customized indexing.
• Tools for loading data into in-memory data objects from different file formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, indexing and subsetting of large data sets.
• Columns from a data structure can be deleted or inserted.
• Group by data for aggregation and transformations.
• High performance merging and joining of data.
• Time Series functionality.

The standard Python distribution does not come bundled with the Pandas module. A
lightweight way to get it is to install Pandas using the popular Python package
installer, pip.
pip install pandas
If you install the Anaconda Python distribution, Pandas is installed by default.

Pandas deals with the following three data structures −

• Series
• DataFrame
• Panel
These data structures are built on top of NumPy arrays, which means they are fast.
Dimension & Description
The best way to think of these data structures is that a higher-dimensional data
structure is a container of its lower-dimensional data structure. For example, a
DataFrame is a container of Series, and a Panel is a container of DataFrames.

Data Structure   Dimensions   Description
Series           1            1D labeled homogeneous array, size-immutable.
DataFrame        2            General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Panel            3            General 3D labeled, size-mutable array.

Building and handling arrays of two or more dimensions is a tedious task, because the
burden is placed on the user to consider the orientation of the data set when writing functions.
Pandas data structures reduce this mental effort.
For example, with tabular data (DataFrame) it is more semantically helpful to think
of the index (the rows) and the columns rather than axis 0 and axis 1.
Mutability
All Pandas data structures are value-mutable (their values can be changed), and all except
Series are size-mutable. Series is size-immutable.
Note − DataFrame is widely used and is one of the most important data structures.
Panel is used much less; it was deprecated in pandas 0.20 and removed in pandas 0.25,
so the Panel descriptions here apply only to older releases.
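The mutability rules above can be sketched as follows. This is a minimal illustration, not taken from the original text; the variable names and sample values are assumptions chosen for the example.

```python
import pandas as pd

# Value mutation: the Series keeps its length, only one value changes.
s = pd.Series([10, 23, 56], index=["a", "b", "c"])
s["b"] = 99

# Size mutation: a DataFrame accepts a new column in place.
df = pd.DataFrame({"x": [1, 2, 3]})
df["y"] = df["x"] * 2

print(s)
print(df)
```

Assigning to an existing label changes a value, while assigning to a new column label grows the DataFrame.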

Series
Series is a one-dimensional array-like structure with homogeneous data. For
example, the following series is a collection of integers 10, 23, 56, …

10 23 56 17 52 61 73 90 26 72

Key Points

• Homogeneous data
• Size-immutable
• Values of data are mutable
DataFrame
DataFrame is a two-dimensional array with heterogeneous data. For example,

Name    Age   Gender   Rating
Steve   32    Male     3.45
Lia     28    Female   4.6
Vin     45    Male     3.9
Katie   38    Female   2.78

The table represents the data of a sales team of an organization with their overall
performance rating. The data is represented in rows and columns. Each column
represents an attribute and each row represents a person.

Data Type of Columns

The data types of the four columns are as follows −

Column   Type
Name     String
Age      Integer
Gender   String
Rating   Float

Key Points

• Heterogeneous data
• Size-mutable
• Data mutable
Panel
Panel is a three-dimensional data structure with heterogeneous data. It is hard to
represent a panel graphically, but it can be pictured as a
container of DataFrames.
Key Points

• Heterogeneous data
• Size-mutable
• Data mutable
Series is a one-dimensional labeled array capable of holding data of any type
(integer, string, float, python objects, etc.). The axis labels are collectively called
index.

pandas.Series
A pandas Series can be created using the following constructor −
pandas.Series( data, index, dtype, copy)
The parameters of the constructor are as follows −

Sr.No   Parameter   Description
1       data        data takes various forms like ndarray, list, constants
2       index       Index values must be hashable and the same length as data; duplicate labels are allowed. Defaults to np.arange(n) if no index is passed.
3       dtype       dtype is for data type. If None, the data type will be inferred
4       copy        Copy data. Default False

A series can be created using various inputs like −

• Array
• Dict
• Scalar value or constant
Create an Empty Series
The most basic series that can be created is an empty Series.
Example
#import the pandas library and aliasing as pd
import pandas as pd
#pass dtype explicitly; recent pandas versions no longer default an empty Series to float64
s = pd.Series(dtype="float64")
print (s)

Its output is as follows −
Series([], dtype: float64)

Create a Series from ndarray

If data is an ndarray, then the index passed must be of the same length. If no index is
passed, then by default the index will be range(n), where n is the array length, i.e.,
[0, 1, 2, ..., len(array)-1].
Example 1
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print (s)

Its output is as follows −
0 a
1 b
2 c
3 d
dtype: object
We did not pass any index, so by default, it assigned the indexes ranging from 0
to len(data)-1, i.e., 0 to 3.
Example 2
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print (s)

Its output is as follows −
100 a
101 b
102 c
103 d
dtype: object
We passed the index values here. Now we can see the customized indexed values
in the output.

Create a Series from dict

A dict can be passed as input. If no index is specified, the dictionary keys
are used to construct the index (in insertion order in current pandas releases; older
releases sorted the keys). If an index is passed, the values in data
corresponding to the labels in the index will be pulled out.
Example 1
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print (s)

Its output is as follows −
a 0.0
b 1.0
c 2.0
dtype: float64
Observe − Dictionary keys are used to construct index.
Example 2
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print (s)

Its output is as follows −
b 1.0
c 2.0
d NaN
a 0.0
dtype: float64
Observe − Index order is persisted and the missing element is filled with NaN (Not
a Number).

Create a Series from Scalar

If data is a scalar value, an index must be provided. The value will be repeated to
match the length of the index.
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print (s)

Its output is as follows −
0 5
1 5
2 5
3 5
dtype: int64

Accessing Data from Series with Position

Data in the series can be accessed by position, similar to an ndarray.
Example 1
Retrieve the first element. Counting starts from zero, which means the first
element is stored at position zero, and so on.
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve the first element by position
#(plain s[0] on a label-indexed Series is deprecated in recent pandas)

print (s.iloc[0]) # same value as s['a']

Its output is as follows −
1

Example 2
Retrieve the first three elements in the Series. If a : is placed in front of an index, all items
up to (but not including) that index are extracted; if the : follows the index, all items from
that index onwards are extracted. If two indexes are given with a : between them,
the items between the two indexes (not including the stop index) are extracted.
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve the first three element

print (s[:3])

Its output is as follows −
a 1
b 2
c 3
dtype: int64

Example 3
Retrieve the last three elements.
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve the last three element

print (s[-3:])

Its output is as follows −
c 3
d 4
e 5
dtype: int64
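
The positional slices above exclude the stop position. As a hedged side-by-side sketch (not part of the original examples), the same Series can also be sliced by label with loc, in which case the stop label is included. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Positional slice: positions 1 and 2; the stop position 3 is excluded.
by_position = s.iloc[1:3]

# Label slice: labels b, c and d; the stop label is included.
by_label = s.loc["b":"d"]

print(by_position)
print(by_label)
```

Using iloc and loc explicitly makes it clear to the reader whether a slice is positional or label-based.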

Retrieve Data Using Label (Index)

A Series is like a fixed-size dict in that you can get and set values by index label.
Example 1
Retrieve a single element using index label value.
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve a single element

print (s['a'])

Its output is as follows −
1

Example 2
Retrieve multiple elements using a list of index label values.
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve multiple elements

print (s[['a','c','d']])

Its output is as follows −
a 1
c 3
d 4
dtype: int64

Example 3
If a label is not contained, an exception is raised.
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#try to retrieve a label that is not in the index

print (s['f'])

Its output is as follows −
…
KeyError: 'f'
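
When a missing label should not stop the program, one option is to test for the label first or to ask for a fallback value. The sketch below is an illustration under that assumption; the fallback value -1 is arbitrary.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# The membership test checks the index labels, not the values.
has_f = "f" in s

# Series.get returns the fallback instead of raising KeyError.
fallback = s.get("f", -1)
present = s.get("a", -1)

print(has_f, fallback, present)
```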

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular
fashion in rows and columns.
Features of DataFrame

• Columns can potentially be of different types
• Size-mutable
• Labeled axes (rows and columns)
• Arithmetic operations can be performed on rows and columns
Structure
Let us assume that we are creating a data frame with students' data.

You can think of it as an SQL table or a spreadsheet data representation.

pandas.DataFrame
A pandas DataFrame can be created using the following constructor −
pandas.DataFrame( data, index, columns, dtype, copy)
The parameters of the constructor are as follows −

Sr.No   Parameter   Description
1       data        data takes various forms like ndarray, series, map, lists, dict, constants and also another DataFrame.
2       index       Row labels to use for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
3       columns     Column labels to use for the resulting frame. Optional; defaults to np.arange(n) if no column labels are passed.
4       dtype       Data type to force on the columns; only a single dtype is allowed. If None, it is inferred.
5       copy        Whether to copy the input data. The default given here is False.
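
As a minimal sketch of how these parameters fit together, the following builds a DataFrame from a NumPy ndarray with explicit row and column labels. The labels r1..r3, x and y are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# A 3x2 array holding the values 0..5, filled row by row.
arr = np.arange(6).reshape(3, 2)

# index supplies the row labels, columns supplies the column labels.
df = pd.DataFrame(arr, index=["r1", "r2", "r3"], columns=["x", "y"])

print(df)
print(df.loc["r2", "y"])
```

If index and columns are omitted, both default to integer ranges starting at 0.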

Create DataFrame
A pandas DataFrame can be created using various inputs like −

• Lists
• dict
• Series
• NumPy ndarrays
• Another DataFrame
In the subsequent sections of this chapter, we will see how to create a DataFrame
using these inputs.
Create an Empty DataFrame
The most basic DataFrame that can be created is an empty DataFrame.
Example

#import the pandas library and aliasing as pd

import pandas as pd
df = pd.DataFrame()
print (df)

Its output is as follows −
Empty DataFrame
Columns: []
Index: []

Create a DataFrame from Lists

The DataFrame can be created using a single list or a list of lists.
Example 1
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print (df)

Its output is as follows −
0
0 1
1 2
2 3
3 4
4 5

Example 2
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print (df)

Its output is as follows −
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
Example 3
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
#convert only the Age column; dtype=float on the whole frame fails in recent
#pandas because the Name column holds strings
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})
print (df)

Its output is as follows −
     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0
Note − Observe that the astype call changes the type of the Age column to floating
point.

Create a DataFrame from Dict of ndarrays / Lists

All the ndarrays must be of the same length. If an index is passed, then the length of the
index should equal the length of the arrays.
If no index is passed, then by default the index will be range(n), where n is the array
length.
Example 1
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print (df)

Its output is as follows −
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
Note − Observe the values 0,1,2,3. They are the default index labels assigned to the
rows using range(n).
Example 2
Let us now create an indexed DataFrame using arrays.
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':
[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print (df)

Its output is as follows −
Name Age
rank1 Tom 28
rank2 Jack 34
rank3 Steve 29
rank4 Ricky 42
Note − Observe, the index parameter assigns an index to each row.

Create a DataFrame from List of Dicts

List of Dictionaries can be passed as input data to create a DataFrame. The
dictionary keys are by default taken as column names.
Example 1
The following example shows how to create a DataFrame by passing a list of
dictionaries.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print (df)

Its output is as follows −
a b c
0 1 2 NaN
1 5 10 20.0
Note − Observe, NaN (Not a Number) is appended in missing areas.
Example 2
The following example shows how to create a DataFrame by passing a list of
dictionaries and the row indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print (df)

Its output is as follows −
a b c
first 1 2 NaN
second 5 10 20.0

Example 3
The following example shows how to create a DataFrame with a list of dictionaries,
row indices, and column indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

#With two column indices, values same as dictionary keys

df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a',
'b'])
#With two column indices with one index with other name
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a',
'b1'])
print (df1)
print (df2)

Its output is as follows −
#df1 output
a b
first 1 2
second 5 10

#df2 output
a b1
first 1 NaN
second 5 NaN
Note − Observe that df2 is created with a column label (b1) that is not a
dictionary key, so that column is filled with NaN. df1, in contrast, is created with
column labels that match the dictionary keys, so no NaN values appear.

Create a DataFrame from Dict of Series

Dictionary of Series can be passed to form a DataFrame. The resultant index is the
union of all the series indexes passed.
Example
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df)

Its output is as follows −
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
Note − Observe that series one has no label 'd', so in the result the d row of
column one is filled with NaN.
Let us now understand column selection, addition, and deletion through
examples.
Column Selection
We will understand this by selecting a column from the DataFrame.
Example
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df ['one'])

Its output is as follows −
a 1.0
b 2.0
c 3.0
d NaN
Name: one, dtype: float64

Column Addition
We will understand this by adding a new column to an existing data frame.
Example
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# Adding a new column to an existing DataFrame object with column
# label by passing new series

print ("Adding a new column by passing as Series:")

df['three']=pd.Series([10,20,30],index=['a','b','c'])
print (df)

print ("Adding a new column using the existing columns in DataFrame:")
df['four']=df['one']+df['three']

print (df)

Its output is as follows −
Adding a new column by passing as Series:
one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN
Adding a new column using the existing columns in DataFrame:
one two three four
a 1.0 1 10.0 11.0
b 2.0 2 20.0 22.0
c 3.0 3 30.0 33.0
d NaN 4 NaN NaN

Column Deletion
Columns can be deleted or popped; let us take an example to understand how.
Example
# Using the previous DataFrame, we will delete a column
# using del function
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
'three' : pd.Series([10,20,30], index=['a','b','c'])}

df = pd.DataFrame(d)
print ("Our dataframe is:")
print (df)

# using del function

print ("Deleting the first column using DEL function:")
del df['one']
print (df)

# using pop function

print ("Deleting another column using POP function:")
df.pop('two')
print (df)

Its output is as follows −
Our dataframe is:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN

Deleting the first column using DEL function:

   two  three
a    1   10.0
b    2   20.0
c    3   30.0
d    4    NaN

Deleting another column using POP function:

   three
a   10.0
b   20.0
c   30.0
d    NaN

Row Selection, Addition, and Deletion

We will now understand row selection, addition and deletion through examples. Let
us begin with the concept of selection.
Selection by Label
Rows can be selected by passing row label to a loc function.
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df.loc['b'])

Its output is as follows −
one 2.0
two 2.0
Name: b, dtype: float64
The result is a series with labels as column names of the DataFrame. And, the
Name of the series is the label with which it is retrieved.
Selection by integer location
Rows can be selected by passing integer location to an iloc function.
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df.iloc[2])

Its output is as follows −
one 3.0
two 3.0
Name: c, dtype: float64

Slice Rows
Multiple rows can be selected using ‘ : ’ operator.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df[2:4])

Its output is as follows −
one two
c 3.0 3
d NaN 4

Addition of Rows
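Add new rows to a DataFrame using the pd.concat function (the older DataFrame.append
method was removed in pandas 2.0). The new rows are placed at the end.
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])

df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# concatenate the two frames row-wise; the original index labels are kept
df = pd.concat([df, df2])
print (df)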

Its output is as follows −
a b
0 1 2
1 3 4
0 5 6
1 7 8

Deletion of Rows
Use an index label to delete or drop rows from a DataFrame. If the label is duplicated, then
multiple rows will be dropped.
If you observe, in the above example, the labels are duplicated. Let us drop a label
and see how many rows get dropped.
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])

df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# concatenate the two frames row-wise; the original index labels are kept
df = pd.concat([df, df2])

# Drop rows with label 0

df = df.drop(0)

print (df)

Its output is as follows −
a b
1 3 4
1 7 8
In the above example, two rows were dropped because those two contain the same
label 0.
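
If only one row should be removed, one way to avoid the duplicated labels is to renumber the rows while concatenating. The following is a small sketch of that idea, assuming pd.concat with ignore_index=True; the names combined and trimmed are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

# ignore_index=True discards the old labels and renumbers the rows 0..3.
combined = pd.concat([df, df2], ignore_index=True)

# Label 0 is now unique, so exactly one row is dropped.
trimmed = combined.drop(0)

print(combined)
print(trimmed)
```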

Some Examples

import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data series is:")
print (df)

Its output is as follows −
Our data series is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80

T (Transpose)
Returns the transpose of the DataFrame. The rows and columns will interchange.
import pandas as pd
import numpy as np

# Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("The transpose of the data series is:")
print (df.T)

Its output is as follows −
The transpose of the data series is:
           0      1      2     3      4      5     6
Name     Tom  James  Ricky   Vin  Steve  Smith  Jack
Age       25     26     25    23     30     29    23
Rating  4.23   3.24   3.98  2.56    3.2    4.6   3.8

A panel is a 3D container of data. The term panel data is derived from
econometrics and is partially responsible for the name pandas − pan(el)-da(ta)-s.
The names for the 3 axes are intended to give some semantic meaning to
describing operations involving panel data. They are −
• items − axis 0, each item corresponds to a DataFrame contained inside.
• major_axis − axis 1, it is the index (rows) of each of the DataFrames.
• minor_axis − axis 2, it is the columns of each of the DataFrames.

axes
Returns the list of row axis labels and column axis labels.
import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Row axis labels and column axis labels are:")
print (df.axes)

Its output is as follows −
Row axis labels and column axis labels are:
[RangeIndex(start=0, stop=7, step=1), Index(['Name', 'Age', 'Rating'], dtype='object')]

dtypes
Returns the data type of each column.
import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("The data types of each column are:")
print (df.dtypes)

Its output is as follows −
The data types of each column are:
Name       object
Age         int64
Rating    float64
dtype: object

empty
Returns the Boolean value saying whether the Object is empty or not; True
indicates that the object is empty.
import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Is the object empty?")
print (df.empty)
Its output is as follows −
Is the object empty?
False

ndim
Returns the number of dimensions of the object. By definition, DataFrame is a 2D
object.
import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The dimension of the object is:")
print (df.ndim)

Its output is as follows −
Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80

The dimension of the object is:
2

shape
Returns a tuple representing the dimensionality of the DataFrame. Tuple (a,b),
where a represents the number of rows and b represents the number of columns.
import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The shape of the object is:")
print (df.shape)

Its output is as follows −
Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80

The shape of the object is:
(7, 3)

size
Returns the number of elements in the DataFrame.
import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The total number of elements in our object is:")
print (df.size)

Its output is as follows −
Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80

The total number of elements in our object is:
21

values
Returns the actual data in the DataFrame as an ndarray.

import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The actual data in our data frame is:")
print (df.values)

Its output is as follows −
Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The actual data in our data frame is:
[['Tom' 25 4.23]
 ['James' 26 3.24]
 ['Ricky' 25 3.98]
 ['Vin' 23 2.56]
 ['Steve' 30 3.2]
 ['Smith' 29 4.6]
 ['Jack' 23 3.8]]

Head & Tail

To view a small sample of a DataFrame object, use the head() and tail()
methods. head() returns the first n rows (observe the index values). The default
number of elements to display is five, but you may pass a custom number.
import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print ("The first two rows of the data frame is:")
print (df.head(2))

Its output is as follows −
Our data frame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80

The first two rows of the data frame is:

    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
tail() returns the last n rows (observe the index values). The default number of
elements to display is five, but you may pass a custom number.

import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print ("The last two rows of the data frame is:")
print (df.tail(2))

Its output is as follows −
Our data frame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80

The last two rows of the data frame is:

    Name  Age  Rating
5  Smith   29     4.6
6   Jack   23     3.8

A large number of methods collectively compute descriptive statistics and other
related operations on DataFrame. Most of these are aggregations like sum() and
mean(), but some of them, like cumsum(), produce an object of the same size.
Generally speaking, these methods take an axis argument, just like ndarray.{sum,
std, ...}, but the axis can be specified by name or integer
• DataFrame − "index" (axis=0, default), "columns" (axis=1)
Let us create a DataFrame and use this object throughout this chapter for all the
operations.
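
Before the larger example, the axis naming described above can be sketched on a small all-numeric frame. The frame scores and its column names are hypothetical and are used only for this illustration.

```python
import pandas as pd

scores = pd.DataFrame({"math": [1, 2, 3], "art": [10, 20, 30]})

# axis="index" (same as axis=0) collapses the rows: one total per column.
col_totals = scores.sum(axis="index")

# axis="columns" (same as axis=1) collapses the columns: one total per row.
row_totals = scores.sum(axis="columns")

# cumsum is not an aggregation: the result has the same shape as the input.
running = scores.cumsum()

print(col_totals)
print(row_totals)
print(running)
```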
Example
import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
print (df)

Its output is as follows −
      Name  Age  Rating
0      Tom   25    4.23
1    James   26    3.24
2    Ricky   25    3.98
3      Vin   23    2.56
4    Steve   30    3.20
5    Smith   29    4.60
6     Jack   23    3.80
7      Lee   34    3.78
8    David   40    2.98
9   Gasper   30    4.80
10  Betina   51    4.10
11  Andres   46    3.65

sum()
Returns the sum of the values for the requested axis. By default, axis is index
(axis=0).
import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
print (df.sum())

Its output is as follows −
Name      TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
Age                                                     382
Rating                                                44.92
dtype: object
Each column is summed individually (strings are concatenated).

axis=1
Passing axis=1 sums across the columns of each row instead. Because adding a name
string to a number is not defined, the call below restricts the sum to the numeric columns.
import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
#sum Age and Rating for each row, skipping the string column
print (df.sum(axis=1, numeric_only=True))

Its output is as follows −
0 29.23
1 29.24
2 28.98
3 25.56
4 33.20
5 33.60
6 26.80
7 37.78
8 42.98
9 34.80
10 55.10
11 49.65
dtype: float64
mean()
Returns the average value of each numeric column.

import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
#numeric_only=True skips the Name column; recent pandas raises an error otherwise
print (df.mean(numeric_only=True))

Its output is as follows −
Age       31.833333
Rating     3.743333
dtype: float64

std()
Returns the Bessel-corrected (sample) standard deviation of the numerical columns.
import pandas as pd
import numpy as np

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
#numeric_only=True skips the Name column; recent pandas raises an error otherwise
print (df.std(numeric_only=True))
Its output is as follows −
Age 9.232682
Rating 0.661628
dtype: float64

Functions & Description

Let us now understand the functions under Descriptive Statistics in Python Pandas.
The following table lists the important functions −

Sr.No.  Function    Description
1       count()     Number of non-null observations
2       sum()       Sum of values
3       mean()      Mean of values
4       median()    Median of values
5       mode()      Mode of values
6       std()       Standard deviation of the values
7       min()       Minimum value
8       max()       Maximum value
9       abs()       Absolute value
10      prod()      Product of values
11      cumsum()    Cumulative sum
12      cumprod()   Cumulative product
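
Several of the functions in the table can be tried on a single Series. This is a minimal sketch with made-up values (the Series `ratings` is hypothetical); one entry is left missing to show that count() only counts non-null observations.

```python
import pandas as pd

# Hypothetical ratings with one missing value (None becomes NaN)
ratings = pd.Series([4.0, 3.0, None, 4.0, 2.0])

n_valid = ratings.count()               # non-null observations only
middle = ratings.median()               # NaN is skipped by default
most_common = ratings.mode().tolist()   # mode() returns a Series, since ties are possible
lowest, highest = ratings.min(), ratings.max()
product = ratings.prod()

# Cumulative functions return one value per row rather than a single number
clean = ratings.dropna()
running_sum = clean.cumsum().tolist()
running_product = clean.cumprod().tolist()

print(n_valid, middle, most_common, lowest, highest, product)
print(running_sum)
print(running_product)
```

The aggregating functions each reduce the Series to one result, whereas cumsum() and cumprod() keep the length of their input.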

Note − Since a DataFrame is a heterogeneous data structure, generic operations
do not work with all functions.
• Functions like sum() and cumsum() work with both numeric and character
(string) data elements without raising an error. Although character
aggregations are rarely used in practice, these functions do not throw an
exception.
• Functions like abs() and cumprod() throw an exception when the DataFrame
contains character or string data, because such operations cannot be
performed on them.
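
One way to work around the second point is to restrict the frame to its numeric columns before calling such functions. The sketch below is an assumption-laden illustration, not the original author's method: the frame and the column name `Change` are hypothetical, and the exact exception type raised by abs() may differ between pandas versions, so a broad `except` is used.

```python
import pandas as pd

# Hypothetical frame mixing a string column with a numeric one
df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky'],
    'Change': [-1.5, 2.0, -0.5],
})

# abs() on the whole frame fails because of the string column
try:
    df.abs()
    abs_failed = False
except Exception:   # typically a TypeError; the type may vary by version
    abs_failed = True

# Selecting the numeric columns first avoids the problem
numeric = df.select_dtypes(include='number')
abs_values = numeric.abs()
cum_product = numeric.cumprod()

print(abs_failed)
print(abs_values)
print(cum_product)
```

select_dtypes(include='number') keeps only the numeric columns, so abs() and cumprod() then run without an exception.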

Summarizing Data
The describe() function computes a summary of statistics pertaining to the
DataFrame columns.
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                       'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98,
                         4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)

# Summary statistics of the numeric columns
print(df.describe())

Its output is as follows −
Age Rating
count 12.000000 12.000000
mean 31.833333 3.743333
std 9.232682 0.661628
min 23.000000 2.560000
25% 25.000000 3.230000
50% 29.500000 3.790000
75% 35.500000 4.132500
max 51.000000 4.800000
This function gives the count, mean, standard deviation, minimum, maximum and
quartiles (25%, 50%, 75%), from which the IQR can be derived. By default, it
excludes the character columns and summarizes only the numeric columns.
'include' is the argument used to specify which columns should be considered
for summarizing. It takes a list of values; by default, numeric columns are
summarized.
• object − Summarizes string columns
• number − Summarizes numeric columns
• all − Summarizes all columns together (pass it as the string 'all', not as a list)
Now, use the following statement in the program and check the output −
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                       'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98,
                         4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)

# Summarize only the string (object) columns
print(df.describe(include=['object']))

Its output is as follows −
Name
count 12
unique 12
top Ricky
freq 1
Now, use the following statement and check the output −
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                       'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98,
                         4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)

# Summarize all columns together; 'all' is passed as a string, not a list
print(df.describe(include='all'))

Its output is as follows −
Age Name Rating
count 12.000000 12 12.000000
unique NaN 12 NaN
top NaN Ricky NaN
freq NaN 1 NaN
mean 31.833333 NaN 3.743333
std 9.232682 NaN 0.661628
min 23.000000 NaN 2.560000
25% 25.000000 NaN 3.230000
50% 29.500000 NaN 3.790000
75% 35.500000 NaN 4.132500
max 51.000000 NaN 4.800000
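
Since describe() returns a DataFrame, its rows can be looked up by label, which is one way to obtain the IQR from the quartiles. The following is a small sketch under stated assumptions: the frame and its values are hypothetical, and include=['number'] is passed explicitly to request the numeric summary.

```python
import pandas as pd

# Hypothetical frame with one string and one numeric column
df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin'],
    'Age': [20, 30, 40, 50],
})

# Summarize only the numeric columns
summary = df.describe(include=['number'])

# The IQR is the distance between the 75% and 25% quartile rows
iqr = summary.loc['75%', 'Age'] - summary.loc['25%', 'Age']

print(summary)
print(iqr)
```

The rows of the summary are indexed by the statistic names shown in the outputs above (count, mean, std, min, 25%, 50%, 75%, max), so any of them can be retrieved with .loc in the same way.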
