
Python

Pandas II
PRESENTED BY

SUSMITA CHOLKAR

Vikhe Patil Memorial School Pune

Learning Objective
Rename, join, concat and merge functions
Iterating over a Data Frame
Binary Operation on a Data Frame
Descriptive Statistics with Pandas
Essential Functions
Advanced Operations
Handling Missing Data
Combining Data Frames
Function groupby()


Rename columns using rename() function
You can rename columns, rows, or both using the rename() function. Pass a dictionary that maps old labels to new labels to the columns parameter, the index parameter, or both. Labels that are not in the dictionary keep their old names.
Observe the following code, which renames rows and columns together:

import pandas as pd
dt = {'English': [74, 79, 48, 53, 68],
      'Physics': [76, 78, 80, 76, 73],
      'Chemistry': [57, 74, 55, 89, 70],
      'Biology': [76, 85, 63, 68, 59],
      'IP': [82, 93, 69, 98, 79]}
df = pd.DataFrame(dt, index=[1201, 1202, 1203, 1204, 1205])
print("Dataframe before rename:")
print(df)
# rename() returns a new DataFrame, so the result is assigned back to df
df = df.rename(columns={'English': 'Eng', 'Physics': 'Phy', 'Chemistry': 'Chem', 'Biology': 'Bio'},
               index={1201: 'Akshit', 1202: 'Bhavin', 1203: 'Chetan'})
print("Dataframe after rename:")
print(df)
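A minimal sketch of how rename() behaves, assuming a small two-student table made up for illustration: it returns a new DataFrame, labels missing from the mappings keep their old names, and the original DataFrame is left untouched unless the result is assigned back.

```python
import pandas as pd

# Hypothetical two-student marks table, used only for illustration
marks = pd.DataFrame({'English': [74, 79], 'Physics': [76, 78]},
                     index=[1201, 1202])

# Rename one column and one row label; 'Physics' and 1202 are not in the mappings
renamed = marks.rename(columns={'English': 'Eng'}, index={1201: 'Akshit'})

print(renamed)
print(list(marks.columns))   # the original DataFrame still has the old labels
```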
Joining
We can use the pandas.DataFrame.append() method to merge two DataFrames. It appends the rows of the second DataFrame at the end of the first DataFrame. Columns not present in the first DataFrame are added as new columns, and cells that have no value are filled with NaN.
Note: append() was deprecated in pandas 1.4 and removed in pandas 2.0. In newer versions, use pd.concat([dFrame1, dFrame2]) instead.
For example, consider the two DataFrames dFrame1 and dFrame2 described below. Let us use the append() method to append dFrame2 to dFrame1:

>>> dFrame1 = pd.DataFrame([[1, 2, 3], [4, 5], [6]], columns=['C1', 'C2', 'C3'], index=['R1', 'R2', 'R3'])
>>> dFrame2 = pd.DataFrame([[10, 20], [30], [40, 50]], columns=['C2', 'C5'], index=['R4', 'R2', 'R5'])
>>> dFrame1.append(dFrame2)    # returns a new DataFrame; dFrame1 itself is unchanged

Alternatively, if we append dFrame1 to dFrame2, the rows of dFrame2 precede the rows of dFrame1. To make the column labels appear in sorted order, set the parameter sort=True. The column labels appear in unsorted order when sort=False.

>>> # append dFrame1 to dFrame2, sorting the column labels
>>> dFrame2 = dFrame2.append(dFrame1, sort=True)
>>> dFrame2
     C1    C2   C3    C5
R4  NaN  10.0  NaN  20.0
R2  NaN  30.0  NaN   NaN
R5  NaN  40.0  NaN  50.0
R1  1.0   2.0  3.0   NaN
R2  4.0   5.0  NaN   NaN
R3  6.0   NaN  NaN   NaN
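Since DataFrame.append() no longer exists in pandas 2.0 and later, here is a sketch of the same operation with pd.concat(). It re-creates dFrame1 and dFrame2 as described above so the block runs on its own; passing sort=True sorts the column labels, as append(..., sort=True) did.

```python
import pandas as pd

dFrame1 = pd.DataFrame([[1, 2, 3], [4, 5], [6]],
                       columns=['C1', 'C2', 'C3'], index=['R1', 'R2', 'R3'])
dFrame2 = pd.DataFrame([[10, 20], [30], [40, 50]],
                       columns=['C2', 'C5'], index=['R4', 'R2', 'R5'])

# Rows of dFrame2 come first; columns are the union of both frames, sorted by label.
# Cells with no value in the source frame become NaN.
combined = pd.concat([dFrame2, dFrame1], sort=True)
print(combined)
```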
Rename columns using the columns property
You can also rename all the columns at once by assigning a new list of labels to the columns property of the DataFrame. The list must contain exactly one label for every column, in the same order as the existing columns.
Observe the following code:

import pandas as pd
dt = {'English': [74, 79, 48, 53, 68],
      'Physics': [76, 78, 80, 76, 73],
      'Chemistry': [57, 74, 55, 89, 70],
      'Biology': [76, 85, 63, 68, 59],
      'IP': [82, 93, 69, 98, 79]}
df = pd.DataFrame(dt, index=[1201, 1202, 1203, 1204, 1205])
print("Dataframe before rename:")
print(df)
df.columns = ['Eng', 'Phy', 'Chem', 'Bio', 'Info.Prac']
print("Dataframe after rename:")
print(df)
Rename rows using the index property
In the same way, you can rename all the rows at once by assigning a new list of labels to the index property of the DataFrame. The list must contain exactly one label for every row.
Observe the following code:

import pandas as pd
dt = {'English': [74, 79, 48, 53, 68],
      'Physics': [76, 78, 80, 76, 73],
      'Chemistry': [57, 74, 55, 89, 70],
      'Biology': [76, 85, 63, 68, 59],
      'IP': [82, 93, 69, 98, 79]}
df = pd.DataFrame(dt, index=[1201, 1202, 1203, 1204, 1205])
print("Dataframe before rename:")
print(df)
df.index = ['Aman', 'Bhavik', 'Chandu', 'Dhaval', 'Eshan']
print("Dataframe after rename:")
print(df)
head() function

The head() function is used to retrieve the top rows of a DataFrame. By default it returns the first 5 rows; head(n) returns the first n rows.
Have a look at the following code:

import pandas as pd
dt = {'English': [74, 79, 48, 53, 68, 44, 65, 67],
      'Physics': [76, 78, 80, 76, 73, 55, 49, 60],
      'Chemistry': [57, 74, 55, 89, 70, 50, 60, 80],
      'Biology': [76, 85, 63, 68, 59, 79, 49, 69],
      'IP': [82, 93, 69, 98, 79, 88, 77, 66]}
df = pd.DataFrame(dt, index=[1201, 1202, 1203, 1204, 1205, 1206, 1207, 1208])
print("All data from Dataframe:")
print(df)
print(df.head(n=4))   # first 4 rows
print(df.tail())      # last 5 rows

tail() function

The tail() function is used to retrieve the bottom rows of a DataFrame. By default it returns the last 5 rows; tail(n) returns the last n rows.
Have a look at the following code:

import pandas as pd
dt = {'English': [74, 79, 48, 53, 68, 44, 65, 67],
      'Physics': [76, 78, 80, 76, 73, 55, 49, 60],
      'Chemistry': [57, 74, 55, 89, 70, 50, 60, 80],
      'Biology': [76, 85, 63, 68, 59, 79, 49, 69],
      'IP': [82, 93, 69, 98, 79, 88, 77, 66]}
df = pd.DataFrame(dt, index=[1201, 1202, 1203, 1204, 1205, 1206, 1207, 1208])
print("All data from Dataframe:")
print(df)
print(df.tail())      # last 5 rows
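A short sketch contrasting the default and explicit n values of head() and tail(). The IP marks and roll numbers are taken from the slides; range(1201, 1209) is just a compact way to write the same index.

```python
import pandas as pd

df = pd.DataFrame({'IP': [82, 93, 69, 98, 79, 88, 77, 66]},
                  index=range(1201, 1209))

first_five = df.head()      # default n=5
first_four = df.head(n=4)   # explicit n
last_five = df.tail()       # default n=5
last_two = df.tail(2)       # n can also be passed positionally
print(first_four)
print(last_two)
```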
concat() function
The concat() function is used to join more than one DataFrame into one unit. You can combine DataFrames having similar structures. Observe this code:

import pandas as pd
dt_sc = {'English': [74, 79, 48, 53, 68, 44, 65, 67],
         'Physics': [76, 78, 80, 76, 73, 55, 49, 60],
         'Chemistry': [57, 74, 55, 89, 70, 50, 60, 80]}
xii_1 = pd.DataFrame(dt_sc)
dt_co = {'English': [66, 65, 87, 56, 86, 44, 56, 76],
         'Physics': [67, 87, 80, 67, 77, 55, 45, 80],
         'Chemistry': [75, 47, 55, 98, 70, 50, 60, 80]}
xii_2 = pd.DataFrame(dt_co)
xii = pd.concat([xii_1, xii_2])
print(xii)
concat() function
Output of pd.concat([xii_1, xii_2]). The index labels 0 to 7 of each DataFrame are kept, so they repeat:

   English  Physics  Chemistry
0       74       76         57
1       79       78         74
2       48       80         55
3       53       76         89
4       68       73         70
5       44       55         50
6       65       49         60
7       67       60         80
0       66       67         75
1       65       87         47
2       87       80         55
3       56       67         98
4       86       77         70
5       44       55         50
6       56       45         60
7       76       80         80

concat() function with ignore_index=True
xii = pd.concat([xii_1, xii_2], ignore_index=True)
With ignore_index=True the old labels are discarded and the rows are renumbered from 0 to 15:

    English  Physics  Chemistry
0        74       76         57
1        79       78         74
2        48       80         55
3        53       76         89
4        68       73         70
5        44       55         50
6        65       49         60
7        67       60         80
8        66       67         75
9        65       87         47
10       87       80         55
11       56       67         98
12       86       77         70
13       44       55         50
14       56       45         60
15       76       80         80
concat() function
The concat() function is used to join more than one DataFrame into one unit. You can combine DataFrames having similar structures. By default, concat() joins the DataFrames along the rows (axis=0). If you want to join DataFrames column-wise, add the axis=1 parameter to the concat() function:

import pandas as pd
dt_sc = {'English': [74, 79, 48, 53, 68, 44, 65, 67],
         'Physics': [76, 78, 80, 76, 73, 55, 49, 60],
         'Chemistry': [57, 74, 55, 89, 70, 50, 60, 80]}
xii_1 = pd.DataFrame(dt_sc)
dt_co = {'English': [66, 65, 87, 56, 86, 44, 56, 76],
         'Physics': [67, 87, 80, 67, 77, 55, 45, 80],
         'Chemistry': [75, 47, 55, 98, 70, 50, 60, 80]}
xii_2 = pd.DataFrame(dt_co)
xii = pd.concat([xii_1, xii_2], axis=1)
print(xii)
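A minimal sketch, assuming two tiny made-up DataFrames, that compares the shape of the result for the two directions of concat():

```python
import pandas as pd

# Two small hypothetical frames with the same row labels 0 and 1
a = pd.DataFrame({'English': [74, 79]})
b = pd.DataFrame({'Physics': [76, 78]})

rows = pd.concat([a, a], ignore_index=True)   # stacked vertically: 4 rows, 1 column
cols = pd.concat([a, b], axis=1)              # placed side by side: 2 rows, 2 columns
print(rows)
print(cols)
```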
Pandas DataFrame.merge()
Pandas merge() brings two datasets together into one, aligning their rows on common attributes (key columns).
It is the entry point for all standard database-style join operations between DataFrame objects.

Syntax:
pd.merge(left, right, how='inner', on=None, left_on=None,
right_on=None, left_index=False, right_index=False, sort=False)
merge() Function
It is used to merge two DataFrames that have some common values. You can specify the common field(s) with the on parameter of merge(); if on is omitted, pandas joins on all columns that have the same name in both DataFrames. It follows the RDBMS concept of a parent table and a child table: one column must hold common data in both DataFrames.
Have a look at this code:

import pandas as pd
p1 = {'P_ID': [1, 2, 3, 4, 5],
      'First_Name': ['Sachin', 'Saurav', 'Virendra', 'Mahendra Singh', 'Gautam'],
      'Last_Name': ['Tendulkar', 'Ganguly', 'Sehwag', 'Dhoni', 'Gambhir']}
d1 = pd.DataFrame(p1)
p2 = {'P_ID': [1, 2, 3, 4, 5], 'Runs': [18987, 12120, 11345, 10345, 12789]}
d2 = pd.DataFrame(p2)
players = pd.merge(d1, d2)    # joins on the common column P_ID
print(players)
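The code above relies on merge() picking the common column P_ID automatically. A sketch, assuming a cut-down version of the same tables in which the third player hypothetically has no runs record, that names the key explicitly with on= and shows how the how= parameter changes which rows survive:

```python
import pandas as pd

d1 = pd.DataFrame({'P_ID': [1, 2, 3],
                   'First_Name': ['Sachin', 'Saurav', 'Virendra']})
d2 = pd.DataFrame({'P_ID': [1, 2], 'Runs': [18987, 12120]})   # hypothetically no row for P_ID 3

inner = pd.merge(d1, d2, on='P_ID')              # default how='inner': only keys found in both
left = pd.merge(d1, d2, on='P_ID', how='left')   # every row of d1; NaN where d2 has no match
print(inner)
print(left)
```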
To directly import a .csv file into DataFrame
A simple way to store big data sets is to use CSV (comma-separated values) files.

CSV files contain plain text and are a well-known format that can be read by almost any program, including Pandas.

In our examples we will be using a CSV file called 'data.csv'.

If you print a large DataFrame with many rows, Pandas displays only the first 5 rows and the last 5 rows.

Tip: use to_string() to print the entire DataFrame.

Importing data from csv to dataframe
Writing to a CSV File
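The code for the reading and writing slides is not reproduced here. A minimal sketch of both, assuming a small hypothetical DataFrame and writing data.csv inside a temporary folder so that no real file is touched:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Sachin', 'Dhoni'], 'Score': [80, 60]})

# A temporary folder keeps the example away from your own files
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'data.csv')
    df.to_csv(path, index=False)   # write; index=False leaves out the row labels
    loaded = pd.read_csv(path)     # read the file back into a new DataFrame

print(loaded.to_string())          # to_string() prints every row
```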
Boolean Indexing
Boolean indexing is a type of indexing which uses the actual values of the data in the DataFrame, selected through a Boolean vector (one True/False value per row).

Thus we can apply conditions to columns, referred to by their names, to filter data values.
Boolean Indexing
Getting the count of non NaN values
Getting the number of rows
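The examples for these three slides did not survive in this copy. A minimal sketch on a made-up score table (the names and scores are illustrative only, with one score deliberately missing):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Sachin', 'Dhoni', 'Virat', 'Rohit'],
                   'Score': [80, 60, 90, np.nan]})

mask = df['Score'] > 70          # Boolean vector: True where the condition holds (NaN gives False)
high = df[mask]                  # keeps only the rows where mask is True
non_nan = df['Score'].count()    # count() skips NaN values
n_rows = len(df)                 # number of rows, same as df.shape[0]
print(high)
print(non_nan, n_rows)
```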
Iterating over a Data Frame

There are three main ways to iterate over a DataFrame:

iterrows()

items()

itertuples()

(however, we will not be studying the last one)


Iterating DataFrames with iterrows()
The iterrows() method iterates over the rows in the form of (row index, Series) pairs, where the Series consists of the column values of that row.

import pandas as pd
ResultSheet = {'Arnab': pd.Series([90, 91, 97], index=['Maths', 'Science', 'Hindi']),
               'Ramit': pd.Series([92, 81, 96], index=['Maths', 'Science', 'Hindi']),
               'Samridhi': pd.Series([89, 91, 88], index=['Maths', 'Science', 'Hindi']),
               'Riya': pd.Series([81, 71, 67], index=['Maths', 'Science', 'Hindi']),
               'Mallika': pd.Series([94, 95, 99], index=['Maths', 'Science', 'Hindi'])}
r = pd.DataFrame(ResultSheet)
print(r)
for (row, rs) in r.iterrows():
    print('Row Index', row)
    print(rs)

Output:
         Arnab  Ramit  Samridhi  Riya  Mallika
Maths       90     92        89    81       94
Science     91     81        91    71       95
Hindi       97     96        88    67       99
Row Index Maths
Arnab       90
Ramit       92
Samridhi    89
Riya        81
Mallika     94
Name: Maths, dtype: int64
Row Index Science
Arnab       91
Ramit       81
Samridhi    91
Riya        71
Mallika     95
Name: Science, dtype: int64
Row Index Hindi
Arnab       97
Ramit       96
Samridhi    88
Riya        67
Mallika     99
Name: Hindi, dtype: int64
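As a sketch of putting iterrows() to work, the loop below finds the top scorer in each subject of the same ResultSheet marks (rebuilt here from plain lists) and checks the answer against the vectorised idxmax(axis=1), which is usually the faster choice:

```python
import pandas as pd

subjects = ['Maths', 'Science', 'Hindi']
r = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96],
                  'Samridhi': [89, 91, 88], 'Riya': [81, 71, 67],
                  'Mallika': [94, 95, 99]}, index=subjects)

top = {}
for row, rs in r.iterrows():      # row is the index label, rs a Series holding that row
    top[row] = rs.idxmax()        # column label of the largest value in the row

print(top)
print(r.idxmax(axis=1))           # the same answer without an explicit loop
```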


Iterating DataFrames with items()
The items() method iterates over the columns, yielding pairs of col_name and data. Each pair contains a column name and every row of data for that column, as a Series.

import pandas as pd
df = pd.DataFrame({
    'first_name': ['John', 'Jane', 'Marry', 'Victoria', 'Gabriel', 'Layla'],
    'last_name': ['Smith', 'Doe', 'Jackson', 'Smith', 'Brown', 'Martinez'],
    'age': [34, 29, 37, 52, 26, 32]},
    index=['id001', 'id002', 'id003', 'id004', 'id005', 'id006'])
print(df)
for col_name, data in df.items():
    print("col_name:", col_name, "\ndata:", data)

Output:
      first_name last_name  age
id001       John     Smith   34
id002       Jane       Doe   29
id003      Marry   Jackson   37
id004   Victoria     Smith   52
id005    Gabriel     Brown   26
id006      Layla  Martinez   32
col_name: first_name
data: id001        John
id002        Jane
id003       Marry
id004    Victoria
id005     Gabriel
id006       Layla
Name: first_name, dtype: object
col_name: last_name
data: id001       Smith
id002         Doe
id003     Jackson
id004       Smith
id005       Brown
id006    Martinez
Name: last_name, dtype: object
col_name: age
data: id001    34
id002    29
id003    37
id004    52
id005    26
id006    32
Name: age, dtype: int64


Binary Operations In DataFrames
import pandas as pd
dataSet1 = [(10, 20, 30), (40, 50, 60), (70, 80, 90)]
dataFrame1 = pd.DataFrame(data=dataSet1)
dataSet2 = [(5, 15, 25), (35, 45, 55)]
dataFrame2 = pd.DataFrame(data=dataSet2)
print("DataFrame1:")
print(dataFrame1)
print("DataFrame2:")
print(dataFrame2)
result = dataFrame1.add(dataFrame2)   # using the add() method
result1 = dataFrame1 + dataFrame2     # the + operator gives the same result
print("Result of adding two pandas dataframes:")
print(result)
print(result1)

Output:
DataFrame1:
    0   1   2
0  10  20  30
1  40  50  60
2  70  80  90
DataFrame2:
    0   1   2
0   5  15  25
1  35  45  55
Result of adding two pandas dataframes:
      0     1      2
0  15.0  35.0   55.0
1  75.0  95.0  115.0
2   NaN   NaN    NaN
(the same table is printed again for result1)

Row 2 has no matching row in dataFrame2, so its sums are NaN, which also turns the result into floats.


Binary Operations In DataFrames
import pandas as pd
dataSet1 = [(10, 20, 30),
            (50, 50, 60),
            (70, 80, 90)]
dataFrame1 = pd.DataFrame(data=dataSet1)
dataSet2 = [(5, 15, 25),
            (35, 45, 55)]
dataFrame2 = pd.DataFrame(data=dataSet2)
print("DataFrame1:")
print(dataFrame1)
print("DataFrame2:")
print(dataFrame2)
result = dataFrame1.sub(dataFrame2)   # using the sub() method
result1 = dataFrame1 - dataFrame2     # using the - operator
print("Result of subtracting two pandas dataframes:")
print(result)
print("Result of using - sign")
print(result1)

Output:
DataFrame1:
    0   1   2
0  10  20  30
1  50  50  60
2  70  80  90
DataFrame2:
    0   1   2
0   5  15  25
1  35  45  55
Result of subtracting two pandas dataframes:
      0    1    2
0   5.0  5.0  5.0
1  15.0  5.0  5.0
2   NaN  NaN  NaN
Result of using - sign
(the same table as above)


import pandas as pd

# The lists are initialised as data1 and data2
data1 = [[1, 2, 3, 4, 5], [10, 11, 12, 13, 14]]
data2 = [[2, 3, 4, 5, 6]]

# DataFrame for list 1
df_1 = pd.DataFrame(data1)
print("Data frame1 is : ")
print(df_1)
print('')

# DataFrame for list 2
df_2 = pd.DataFrame(data2)
print("Data frame2 is : ")
print(df_2)
print('')

print("Addition of two data frames is :")
print(df_1.add(df_2))              # addition
print('')
print("Multiplication of two data frames is :")
print(df_1.mul(df_2))              # multiplication
print('')
print("Division of two data frames is :")
print(df_1.div(df_2))              # division
print('')
print("Modulus of two data frames is :")
print(df_1.mod(df_2))              # modulus (remainder)
print('')

print("We can fill values for NaN using fill_value arguments.")
print("Addition of two data frames using fill_value arguments is:")
print(df_1.add(df_2, fill_value=0))   # a cell missing on one side is treated as 0
# We can use any suitable value as fill_value for performing binary functions.
print("Subtraction of two data frames is :")
sub = df_1 - df_2                  # subtraction
print(sub)
print('')
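To make the effect of fill_value concrete, a sketch with two tiny hypothetical frames: without fill_value, any cell missing from either frame gives NaN; with fill_value=0, a cell present in only one frame is computed as if the other side held 0.

```python
import pandas as pd

df_1 = pd.DataFrame([[1, 2], [10, 11]])
df_2 = pd.DataFrame([[2, 3]])              # only row 0 exists here

plain = df_1.add(df_2)                     # row 1 has no partner, so it becomes NaN
filled = df_1.add(df_2, fill_value=0)      # row 1 computed as 10 + 0 and 11 + 0
print(plain)
print(filled)
```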


Descriptive Statistics with Pandas
Calculating Maximum Values
Calculating Minimum Values
Calculating Sum of Values
Calculating Number of Values
Calculating Mean
Calculating Median
Calculating Mode
Calculating Quartile
Calculating Variance
Calculating Standard Deviation
Sorting a DataFrame
GROUP BY Functions
Aggregate functions are max(), min(), sum(), count(), std() and var().


Calculating Maximum Values and Minimum Values
Functions max() and min()

The max() and min() functions find the maximum and the minimum of the values in a DataFrame, respectively. By default they work column-wise and return one value per column.
Syntax (pandas 2.x):

<dataframe>.min(axis=0, skipna=True, numeric_only=False)
<dataframe>.max(axis=0, skipna=True, numeric_only=False)


Parameters
axis: {index (0), columns (1)}
Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values when computing the result.

level: int or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series. (This parameter was removed in pandas 2.0.)

numeric_only: bool, default None

Include only float, int and boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series. (From pandas 2.0 the default is False.)
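A minimal sketch of the axis and skipna parameters, assuming a small made-up marks table in which one Maths mark is missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Maths': [90, 92, np.nan], 'Science': [91, 81, 71]},
                  index=['Arnab', 'Ramit', 'Riya'])

col_max = df.max()                 # axis=0: largest value in each column, NaN skipped
row_min = df.min(axis=1)           # axis=1: smallest value in each row
strict = df.max(skipna=False)      # a NaN in a column makes that column's result NaN
print(col_max, row_min, strict, sep='\n\n')
```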
Pandas DataFrame sum() Method
The sum() method adds all values in each column and returns the sum for each column.
By specifying the column axis (axis='columns'), the sum() method adds across the columns instead and returns the sum of each row.

dataframe.sum(axis, skipna, level, numeric_only, min_count, **kwargs)
(The level parameter was removed in pandas 2.0.)
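A minimal sketch of column-wise versus row-wise sums and of min_count, assuming a small made-up marks table:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Maths': [90, 92], 'Science': [91, 81]}, index=['Arnab', 'Ramit'])

per_column = df.sum()                 # default axis=0: one total per column
per_row = df.sum(axis='columns')      # one total per row

# An all-missing Series sums to 0 unless min_count asks for at least one real value
gaps = pd.Series([np.nan, np.nan])
print(per_column, per_row, sep='\n\n')
print(gaps.sum(), gaps.sum(min_count=1))
```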

To Sort the values
The pandas sort_values() function sorts a DataFrame in ascending or descending order of the passed column(s).
It differs from Python's built-in sorted() function, which cannot sort a DataFrame and cannot select a particular column to sort by.

Syntax:
DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')

by: a single column name or a list of column names to sort the DataFrame by.

axis: 0 or 'index' to sort the rows, 1 or 'columns' to sort the columns.
ascending: Boolean value; sorts the DataFrame in ascending order if True (a list of Booleans can be given when sorting by several columns).
inplace: Boolean value; if True, the changes are made in the passed DataFrame itself and None is returned.
kind: string choosing the sorting algorithm: 'quicksort', 'mergesort' or 'heapsort'.
na_position: takes 'last' or 'first' to set the position of null values. Default is 'last'.
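A minimal sketch of the by, ascending and inplace parameters, assuming a small made-up score table:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Sachin', 'Dhoni', 'Virat', 'Rohit'],
                   'Score': [80, 60, 90, 50]})

by_score = df.sort_values(by='Score', ascending=False)   # new DataFrame, highest score first
print(by_score)

df.sort_values(by='Name', inplace=True)                  # sorts df itself and returns None
print(df)
```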
Calculate the Mean, Median and Mode
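The slide for this topic shows only its title. A minimal sketch of mean(), median() and mode(), using the Age and Score values that also appear in the variance example:

```python
import pandas as pd

df = pd.DataFrame({'Age': [30, 25, 25, 30, 20],
                   'Score': [80, 60, 90, 50, 50]})

means = df.mean()       # arithmetic mean of each column
medians = df.median()   # middle value of each sorted column
modes = df.mode()       # most frequent value(s) per column; a DataFrame, since ties are possible
print(means, medians, modes, sep='\n\n')
```

Age has two values that each occur twice (25 and 30), so mode() returns two rows; the Score column has a single mode, and its second row is padded with NaN.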
count()
This function counts the non-NA entries for each row or column. The values None, NaN, NaT etc. are considered NA in pandas.
Syntax:
<df>.count(axis=0, numeric_only=False)
e.g.

import numpy as np
import pandas as pd
df = pd.DataFrame({"A": [-5, 8, 12, None, 5, 3],
                   "B": [-1, None, 6, 4, None, 3],
                   "C": ["sam", "haris", "alex", np.nan, "peter", "nathan"]})
print(df)
# count of non-NA values in each column (axis=0 counts down the rows)
print(df.count(axis=0))

output:
A    5
B    4
C    5
dtype: int64
DataFrame - quantile() function
The quantile() function is used to get values at the given quantile over the requested axis.

What is a quantile in simple words?

In simple terms, a quantile is where a sample is divided into equal-sized, adjacent subgroups (that's why it's sometimes called a “fractile“). It can also refer to dividing a probability distribution into areas of equal probability.
quantile()
• The word “quantile” comes from the word quantity.
• The median is a kind of quantile: it is placed at the center of a probability distribution so that exactly half of the data is lower than the median and half of the data is above it. The median cuts a distribution into two equal parts, which is why it is sometimes called the 2-quantile.
• Quartiles are the quantiles that divide a distribution into four equal parts. Deciles divide a distribution into 10 equal parts, and percentiles divide it into 100 equal parts.
Common Quantiles:
Certain types of quantiles are used commonly enough to have specific names. Below is a list of these:
• The 2-quantile is called the median
• The 3-quantiles are called terciles
• The 4-quantiles are called quartiles
• The 5-quantiles are called quintiles
• The 6-quantiles are called sextiles
• The 7-quantiles are called septiles
• The 8-quantiles are called octiles
• The 10-quantiles are called deciles
• The 12-quantiles are called duodeciles
• The 20-quantiles are called vigintiles
• The 100-quantiles are called percentiles
• The 1000-quantiles are called permilles
Syntax of quantile():

<df>.quantile(q=0.5, axis=0, numeric_only=True)

Parameters:
q: float or array-like, default 0.5 (the 50% quantile), with 0 <= q <= 1; the quantile(s) to compute, e.g. q=[0.25, 0.50, 0.75, 1.0]
axis: {0, 1, 'index', 'columns'}
numeric_only: if False, the quantiles of datetime and timedelta data will be computed as well. (The default of True shown above applies before pandas 2.0; from pandas 2.0 the default is False.)
Steps to find a quartile value
(the 0.25 quantile of the series [1, 10, 100, 1000])

1. q = 0.25 (the 0.25 quantile)
2. n = 4 (number of elements). The position of the quantile in the sorted data is
$$p = (n-1)q + 1 = (4-1)(0.25) + 1 = 0.75 + 1 = 1.75$$
3. The integer part of p is a = 1 and the fractional part is b = 0.75. Let $T_i$ be the i-th term of the sorted data. The quantile is
$$Q = T_a + b\,(T_{a+1} - T_a) = 1 + 0.75\,(10 - 1) = 1 + 0.75 \times 9 = 1 + 6.75 = 7.75$$
The terms used change according to the position; here $T_1 = 1$ and $T_2 = 10$.

Program in Python to find the 0.25 quantile of the series [1, 10, 100, 1000]:

import pandas as pd
s = pd.Series([1, 10, 100, 1000])
r = s.quantile(.25)
print(r)    # 7.75
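The hand calculation above can be written as a small function. A sketch (the helper name linear_quantile is hypothetical) that applies the position p = (n - 1)q + 1 and the interpolation T_a + b(T_{a+1} - T_a), and agrees with pandas' default linear interpolation:

```python
import math

import pandas as pd


def linear_quantile(values, q):
    """Quantile by the slide's method: 1-based position (n - 1) * q + 1, then interpolate."""
    t = sorted(values)
    pos = (len(t) - 1) * q + 1    # 1-based position, e.g. 1.75 for q = 0.25, n = 4
    a = math.floor(pos)           # integer part
    b = pos - a                   # fractional part
    if a >= len(t):               # q = 1 lands exactly on the last term
        return float(t[-1])
    # T_a is t[a - 1] and T_(a+1) is t[a] in 0-based Python indexing
    return t[a - 1] + b * (t[a] - t[a - 1])


data = [1, 10, 100, 1000]
print(linear_quantile(data, 0.25))        # by hand
print(pd.Series(data).quantile(0.25))     # pandas
```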
Calculating Variance
DataFrame.var() is used to compute the variance, a measure of how far the values spread around their mean. By default pandas returns the sample variance: the sum of squared differences from the mean divided by n - 1 (ddof=1).
The var() function can calculate the variance of a given set of numbers, of a DataFrame, of a column, or of rows.

import pandas as pd
d = {'Name': pd.Series(['Sachin', 'Dhoni', 'Virat', 'Rohit', 'Shikhar']),
     'Age': pd.Series([30, 25, 25, 30, 20]),
     'Score': pd.Series([80, 60, 90, 50, 50])}
df = pd.DataFrame(d)
print("Dataframe contents")
print(df)
# numeric_only=True skips the text column Name (required in pandas 2.0 and later)
print(df.var(numeric_only=True))
print(df.loc[:, 'Age':'Score'].var())       # variance of specific columns
print(df.var(axis=0, numeric_only=True))    # column variance
print(df.var(axis=1, numeric_only=True))    # row variance

Output of the first two print() calls:
Dataframe contents
      Name  Age  Score
0   Sachin   30     80
1    Dhoni   25     60
2    Virat   25     90
3    Rohit   30     50
4  Shikhar   20     50
Age       17.5
Score    330.0
dtype: float64
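A minimal sketch of the ddof parameter, using the same Age and Score values: the default divides by n - 1 (sample variance), while ddof=0 divides by n (population variance).

```python
import pandas as pd

df = pd.DataFrame({'Age': [30, 25, 25, 30, 20],
                   'Score': [80, 60, 90, 50, 50]})

sample_var = df.var()              # default ddof=1: divide by n - 1
population_var = df.var(ddof=0)    # ddof=0: divide by n instead
print(sample_var)
print(population_var)
```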
Standard deviation

Standard deviation measures the amount of variation, or dispersion, in a set of values. A low standard deviation means the values tend to be close to the mean, and a high standard deviation means the values are spread out over a wider range.

Standard deviation is one of the most important concepts in finance: finance and banking are largely about measuring risk, and standard deviation is a common measure of risk.
Calculating Standard Deviation
DataFrame.std() returns the standard deviation of the values. Standard deviation is calculated as the square root of the variance (like var(), std() uses ddof=1 by default).
e.g.
import pandas as pd
d = {'Name': pd.Series(['Sachin', 'Dhoni', 'Virat', 'Rohit', 'Shikhar']),
     'Age': pd.Series([30, 25, 25, 30, 20]),
     'Score': pd.Series([80, 60, 90, 50, 50])}
df = pd.DataFrame(d)
print("Dataframe contents")
print(df)
print(df.std(numeric_only=True))   # numeric_only=True skips the Name column
print(df.describe())               # summary statistics of the numeric columns, including std

describe() on a single column of data:
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
print(df.describe())

            Data
count   7.000000
mean   24.285714
std    11.700631
min    10.000000
25%    17.500000
50%    20.000000
75%    30.000000
max    45.000000
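A quick check of the statement that the standard deviation is the square root of the variance, using the Data values from the describe() example:

```python
import numpy as np
import pandas as pd

data = pd.Series([10, 20, 30, 20, 15, 30, 45])

sd = data.std()                  # sample standard deviation (ddof=1)
from_var = np.sqrt(data.var())   # square root of the sample variance
summary = data.describe()        # count, mean, std, min, quartiles, max
print(sd, from_var)
print(summary)
```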
Data Aggregations
Aggregation means transforming a dataset to produce a single numeric value from an array of values. Aggregation can be applied to one or more columns together. Aggregate functions are max(), min(), sum(), count(), std() and var().
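A minimal sketch applying several aggregate functions in one call with agg(), assuming the same Age and Score values used in the variance example:

```python
import pandas as pd

df = pd.DataFrame({'Age': [30, 25, 25, 30, 20],
                   'Score': [80, 60, 90, 50, 50]})

# agg() with a list of function names returns one row per function, one column per column of df
summary = df.agg(['max', 'min', 'sum', 'count'])
print(summary)
```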
GROUP BY Functions
In pandas, the DataFrame.groupby() function is used to split the data into groups based on some criteria.
Pandas objects like a DataFrame can be split on any of their axes. The groupby() function works on a split-apply-combine strategy, shown below as a 3-step process:

Step 1: Split the data into groups by creating a GroupBy object from the original DataFrame.
Step 2: Apply the required function.
Step 3: Combine the results to form a new DataFrame.
In a group by operation, we split the data into sets and apply some functionality to each subset. Once a subset is ready, we can perform any statistical function on it or discard some data based on a condition.

Groupby essentially splits the data into different groups depending on a variable of your choice. For example, the expression data.groupby('city') will split a DataFrame containing the columns (custcode, cname, city, telephone) by city.
import pandas as pd
# the dictionary is named data so that it does not hide Python's built-in dict
data = {'itcode': [501, 502, 503, 504, 505, 506],
        'itname': ['chair', 'table', 'sofa', 'bed', 'cabinet', 'dining'],
        'vendor': ['Birdie', 'Goyalsons', 'Goyalsons', 'Tejas', 'Goyalsons', 'Birdie'],
        'cost': [700, 400, 25000, 30000, 15000, 18000]}
df4 = pd.DataFrame(data)
print(df4)
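A sketch of the three groupby() steps on the same furniture data (re-created so the block runs on its own), grouping by vendor and then summing the cost and counting the items of each vendor:

```python
import pandas as pd

df4 = pd.DataFrame({'itcode': [501, 502, 503, 504, 505, 506],
                    'itname': ['chair', 'table', 'sofa', 'bed', 'cabinet', 'dining'],
                    'vendor': ['Birdie', 'Goyalsons', 'Goyalsons', 'Tejas', 'Goyalsons', 'Birdie'],
                    'cost': [700, 400, 25000, 30000, 15000, 18000]})

grouped = df4.groupby('vendor')           # step 1: split the rows by vendor
total_cost = grouped['cost'].sum()        # steps 2 and 3: apply sum, combine into one Series
item_count = grouped['itcode'].count()    # number of items supplied by each vendor
print(total_cost)
print(item_count)
```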
DataFrame: reindex()

reindex(): This function is used to change the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis:
Reorder the existing data to match a new set of labels.
Insert missing value (NA) markers in label locations where no data for the label existed.

Syntax:
<df>.reindex(labels=None, index=None, columns=None, fill_value=nan, axis=None)

Parameters:
labels: New labels/index to conform the axis specified by 'axis' to.
fill_value: Value used for new labels that had no data (default NaN). NaN values already present in the DataFrame are not filled.
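A minimal sketch of reindex(), assuming a tiny made-up score table in which Dhoni's score is already missing. It reorders rows, drops a label, adds a new label with no data, and shows that fill_value only affects the new label:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Score': [80, np.nan]}, index=['Sachin', 'Dhoni'])

# 'Virat' is a new label with no data; 'Dhoni' is left out of r1
r1 = df.reindex(['Virat', 'Sachin'])                          # new label gets NaN
r2 = df.reindex(['Sachin', 'Dhoni', 'Virat'], fill_value=0)   # new label gets 0
print(r1)
print(r2)
```

In r2, Virat gets 0 while Dhoni's existing NaN stays NaN.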
