Python Pandas
DATAFRAME
PIVOT
Pivoting - Dataframe
There are two functions available in pandas for pivoting a
DataFrame.
1. pivot()
2. pivot_table()
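from collections import OrderedDict
from pandas import DataFrame

# Build the source data column by column
table = OrderedDict((
    ("ITEM", ['TV', 'TV', 'AC', 'AC']),
    ('COMPANY', ['LG', 'VIDEOCON', 'LG', 'SONY']),
    ('RUPEES', ['12000', '10000', '15000', '14000']),
    ('USD', ['700', '650', '800', '750'])
))
d = DataFrame(table)
print("DATA OF DATAFRAME")
print(d)
# One row per ITEM, one column per COMPANY, cells filled from RUPEES
p = d.pivot(index='ITEM', columns='COMPANY', values='RUPEES')
print("\n\nDATA OF PIVOT")
print(p)
# Look up the LG price for the TV row
print(p[p.index == 'TV'].LG.values)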
#Common Problem in Pivoting
The pivot() method is usually called with at least two column names as parameters:
the index and the columns. A problem may arise here: what happens if multiple rows
have the same values for these columns? What should the value of the corresponding
cell in the pivoted table be? pivot() cannot decide, and raises a ValueError for such
duplicate entries. The following diagram depicts the problem:
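# pivot_table() handles duplicates by aggregating them; the values must be numeric
d['RUPEES'] = d['RUPEES'].astype(int)
d.pivot_table(index='ITEM', columns='COMPANY',
              values='RUPEES', aggfunc='mean')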
In essence pivot_table is a generalisation of pivot, which allows you to
aggregate multiple values with the same destination in the pivoted table.
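To make the difference concrete, here is a minimal sketch. The data is hypothetical (not from the slides) and is chosen so that the (TV, LG) pair occurs twice; the names sales, pivot_failed and result are only illustrative.

```python
import pandas as pd

# Hypothetical data: the (TV, LG) combination appears in two rows
sales = pd.DataFrame({
    "ITEM": ["TV", "TV", "TV", "AC"],
    "COMPANY": ["LG", "LG", "SONY", "LG"],
    "RUPEES": [12000, 14000, 10000, 15000],
})

# pivot() cannot decide which value belongs in the (TV, LG) cell
try:
    sales.pivot(index="ITEM", columns="COMPANY", values="RUPEES")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot() failed:", err)

# pivot_table() aggregates the duplicates, here by taking their mean
result = sales.pivot_table(index="ITEM", columns="COMPANY",
                           values="RUPEES", aggfunc="mean")
print(result)
```

In this sketch the (TV, LG) cell holds the mean of 12000 and 14000, and the (AC, SONY) cell is NaN because no such row exists.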
Sorting - Dataframe
Sorting means arranging the contents in ascending or
descending order. There are two kinds of sorting
available in pandas (DataFrame).
1. By value(column)
2. By index
import pandas as pd

# Create a dictionary of Series
d = {'Name': pd.Series(['Sachin', 'Dhoni', 'Virat', 'Rohit', 'Shikhar']),
     'Age': pd.Series([26, 27, 25, 24, 31]),
     'Score': pd.Series([87, 89, 67, 55, 47])}

# Create a DataFrame
df = pd.DataFrame(d)
print("Dataframe contents without sorting")
print(df)
df = df.sort_values(by='Score')
print("Dataframe contents after sorting")
print(df)

OUTPUT
Dataframe contents without sorting
      Name  Age  Score
0   Sachin   26     87
1    Dhoni   27     89
2    Virat   25     67
3    Rohit   24     55
4  Shikhar   31     47
Dataframe contents after sorting
      Name  Age  Score
4  Shikhar   31     47
3    Rohit   24     55
2    Virat   25     67
0   Sachin   26     87
1    Dhoni   27     89

In the above example a dictionary is used to create the DataFrame. The elements of
the DataFrame object df are sorted by the sort_values() method. Only the column name
'Score' is passed, for the by parameter. By default, sorting is in ascending order.
Sort the python pandas Dataframe by single column – Descending order
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Sachin', 'Dhoni', 'Virat', 'Rohit', 'Shikhar']),
     'Age': pd.Series([26, 27, 25, 24, 31]),
     'Score': pd.Series([87, 89, 67, 55, 47])}

# Create a DataFrame
df = pd.DataFrame(d)
print("Dataframe contents without sorting")
print(df)
df = df.sort_values(by='Score', ascending=False)
print("Dataframe contents after sorting")
print(df)

OUTPUT
Dataframe contents without sorting
      Name  Age  Score
0   Sachin   26     87
1    Dhoni   27     89
2    Virat   25     67
3    Rohit   24     55
4  Shikhar   31     47
Dataframe contents after sorting
      Name  Age  Score
1    Dhoni   27     89
0   Sachin   26     87
2    Virat   25     67
3    Rohit   24     55
4  Shikhar   31     47

In the above example a dictionary object is used to create the DataFrame. The
elements of the DataFrame object df are sorted by the sort_values() method. We pass
False for the ascending parameter, which sorts the data in descending order of score.
Sort the pandas Dataframe by Multiple Columns
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Sachin', 'Dhoni', 'Virat', 'Rohit', 'Shikhar']),
     'Age': pd.Series([26, 25, 25, 24, 31]),
     'Score': pd.Series([87, 67, 89, 55, 47])}

# Create a DataFrame
df = pd.DataFrame(d)
print("Dataframe contents without sorting")
print(df)
df = df.sort_values(by=['Age', 'Score'], ascending=[True, False])
print("Dataframe contents after sorting")
print(df)

OUTPUT
Dataframe contents without sorting
      Name  Age  Score
0   Sachin   26     87
1    Dhoni   25     67
2    Virat   25     89
3    Rohit   24     55
4  Shikhar   31     47
Dataframe contents after sorting
      Name  Age  Score
3    Rohit   24     55
2    Virat   25     89
1    Dhoni   25     67
0   Sachin   26     87
4  Shikhar   31     47

In the above example a dictionary object is used to create the DataFrame. The
elements of the DataFrame object df are sorted by the sort_values() method. We pass
two columns for the by parameter and two values for the ascending parameter, first
True and second False, which means: sort in ascending order of age and, within the
same age, in descending order of score.
2. By index - Sorting on the basis of the DataFrame index is
done using the sort_index() method, the counterpart of the
sort_values() method. We will now see the two aspects of
sorting on the basis of the index: ascending and descending order.
import pandas as pd

# Create a dictionary of Series
d = {'Name': pd.Series(['Sachin', 'Dhoni', 'Virat', 'Rohit', 'Shikhar']),
     'Age': pd.Series([26, 25, 25, 24, 31]),
     'Score': pd.Series([87, 67, 89, 55, 47])}

# Create a DataFrame
df = pd.DataFrame(d)
# Put the rows in a shuffled order so that sorting has a visible effect
df = df.reindex([1, 4, 3, 2, 0])
print("Dataframe contents without sorting")
print(df)
df1 = df.sort_index()
print("Dataframe contents after sorting")
print(df1)

In the above example a dictionary object is used to create the DataFrame. The rows
of the DataFrame object df are first reordered by the reindex() method (index 1 is
placed at position 0, index 4 at position 1, and so on) and then sorted by the
sort_index() method. By default it sorts in ascending order of index.
Sorting pandas dataframe by index in descending order:
import pandas as pd
import numpy as np
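A minimal sketch of sorting by index in descending order, assuming the same data as the previous index-sorting example. It relies on the ascending parameter of sort_index(); the variable name df_desc is only illustrative.

```python
import pandas as pd

# Assumed data: same dictionary of Series as in the previous example
d = {'Name': pd.Series(['Sachin', 'Dhoni', 'Virat', 'Rohit', 'Shikhar']),
     'Age': pd.Series([26, 25, 25, 24, 31]),
     'Score': pd.Series([87, 67, 89, 55, 47])}

df = pd.DataFrame(d)
# Shuffle the row order so that the effect of sorting is visible
df = df.reindex([1, 4, 3, 2, 0])
print("Dataframe contents without sorting")
print(df)

# ascending=False sorts from the largest index label to the smallest
df_desc = df.sort_index(ascending=False)
print("Dataframe contents after sorting")
print(df_desc)
```

With this data the sorted frame starts with index 4 (Shikhar) and ends with index 0 (Sachin).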
Data aggregation –
Aggregation is the process of turning the values of a dataset (or a
subset of it) into one single value. Put another way, an aggregation
is a function that takes multiple values and returns a single value
as the result. Many aggregations are possible, such as count, sum,
min, max, median and quartile. These (count, sum, etc.) are
descriptive statistics and related operations on a DataFrame.
Let us make this clear! If we have a DataFrame like…