
Python Pandas-2


Python

Pandas II
PRESENTED BY

Susmita Cholkar

Vikhe Patil Memorial School Pune


Learning Objective
Rename, join, concat and merge functions
Iterating over a Data Frame
Binary Operation on a Data Frame
Descriptive Statistics with Pandas
Essential Functions
Advanced Operations
Handling Missing Data
Combining Data Frames
Function groupby()



Rename columns using rename() function
You can rename a column, a row, or both using the rename() function.
You can pass index, columns, or both as parameters to rename().
Observe the following code, which renames rows and columns together:

import pandas as pd
dt={'English':[74,79,48,53,68],
    'Physics':[76,78,80,76,73],
    'Chemistry':[57,74,55,89,70],
    'Biology':[76,85,63,68,59],
    'IP':[82,93,69,98,79]}
df=pd.DataFrame(dt, index=[1201,1202,1203,1204,1205])
print("Dataframe before rename:")
print(df)
# rename() returns a new DataFrame; labels not listed are left unchanged
df=df.rename(columns={'English':'Eng','Physics':'Phy','Chemistry':'Chem','Biology':'Bio'},
             index={1201:'Akshit',1202:'Bhavin',1203:'Chetan'})
print("Dataframe after rename:")
print(df)
Joining
We can use the pandas.DataFrame.append() method to merge two DataFrames. It
appends rows of the second DataFrame at the end of the first DataFrame. Columns
not present in the first DataFrame are added as new columns.
Note: append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the
code on this slide runs only on older versions of pandas.
For example, consider the two DataFrames dFrame1 and dFrame2 described below.

>>> dFrame1=pd.DataFrame([[1, 2, 3], [4, 5], [6]], columns=['C1', 'C2', 'C3'], index=['R1', 'R2', 'R3'])
>>> dFrame2=pd.DataFrame([[10, 20], [30], [40, 50]], columns=['C2', 'C5'], index=['R4', 'R2', 'R5'])

Let us use the append() method to append dFrame2 to dFrame1:

>>> dFrame1=dFrame1.append(dFrame2)
>>> print(dFrame1)

Alternatively, if we append dFrame1 to dFrame2, the rows of
dFrame2 precede the rows of dFrame1. To make the column labels
appear in sorted order, set the parameter sort=True. The
column labels appear in unsorted order when the parameter
sort=False.

# append dFrame1 to dFrame2
>>> dFrame2=dFrame2.append(dFrame1, sort=True)
>>> dFrame2
     C1    C2   C3    C5
R4  NaN  10.0  NaN  20.0
R2  NaN  30.0  NaN   NaN
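On pandas 2.0 and later the append() calls above are not available, and the same results can be obtained with pd.concat(). The following is a minimal sketch under that assumption; the names combined and combined_sorted are illustrative and do not come from the slides.

```python
import pandas as pd

# Same frames as in the slide; short rows are padded with NaN
dFrame1 = pd.DataFrame([[1, 2, 3], [4, 5], [6]],
                       columns=['C1', 'C2', 'C3'], index=['R1', 'R2', 'R3'])
dFrame2 = pd.DataFrame([[10, 20], [30], [40, 50]],
                       columns=['C2', 'C5'], index=['R4', 'R2', 'R5'])

# Counterpart of dFrame1.append(dFrame2): rows of dFrame2 follow dFrame1
combined = pd.concat([dFrame1, dFrame2])
print(combined)

# Counterpart of dFrame2.append(dFrame1, sort=True): column labels sorted
combined_sorted = pd.concat([dFrame2, dFrame1], sort=True)
print(combined_sorted)
```

Unlike the slide, this sketch does not overwrite dFrame1 or dFrame2, so both calls start from the original frames.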
Rename columns using column properties
You can also rename all the columns at once by assigning a list of new labels
to the columns property of the DataFrame. The list must contain exactly one
label for each existing column.
Observe the following code:

import pandas as pd
dt={'English':[74,79,48,53,68],
    'Physics':[76,78,80,76,73],
    'Chemistry':[57,74,55,89,70],
    'Biology':[76,85,63,68,59],
    'IP':[82,93,69,98,79]}
df=pd.DataFrame(dt, index=[1201,1202,1203,1204,1205])
print("Dataframe before rename:")
print(df)
print("Dataframe after rename:")
df.columns=['Eng','Phy','Chem','Bio','Info.Prac']
print(df)
Rename rows using index properties
You can rename all the rows at once by assigning a list of new labels to the
index property of the DataFrame. The list must contain exactly one label for
each existing row.
Observe the following code:

import pandas as pd
dt={'English':[74,79,48,53,68],
    'Physics':[76,78,80,76,73],
    'Chemistry':[57,74,55,89,70],
    'Biology':[76,85,63,68,59],
    'IP':[82,93,69,98,79]}
df=pd.DataFrame(dt, index=[1201,1202,1203,1204,1205])
print("Dataframe before rename:")
print(df)
print("Dataframe after rename:")
df.index=['Aman','Bhavik','Chandu','Dhaval','Eshan']
print(df)
head() function
The head() function is used to retrieve the top rows from a dataframe. It
returns the first 5 rows by default; pass n to change the number of rows.
Have a look at the following code:

import pandas as pd
dt={'English':[74,79,48,53,68,44,65,67],
    'Physics':[76,78,80,76,73,55,49,60],
    'Chemistry':[57,74,55,89,70,50,60,80],
    'Biology':[76,85,63,68,59,79,49,69],
    'IP':[82,93,69,98,79,88,77,66]}
df=pd.DataFrame(dt,index=[1201,1202,1203,1204,1205,1206,1207,1208])
print("All data from Dataframe:")
print(df)
print(df.head(n=4))
print(df.tail())

tail() function
The tail() function is used to retrieve the bottom rows from a dataframe. It
returns the last 5 rows by default; pass n to change the number of rows.
Have a look at the following code:

import pandas as pd
dt={'English':[74,79,48,53,68,44,65,67],
    'Physics':[76,78,80,76,73,55,49,60],
    'Chemistry':[57,74,55,89,70,50,60,80],
    'Biology':[76,85,63,68,59,79,49,69],
    'IP':[82,93,69,98,79,88,77,66]}
df=pd.DataFrame(dt,index=[1201,1202,1203,1204,1205,1206,1207,1208])
print("All data from Dataframe:")
print(df)
print(df.tail())
concat() function
The concat() function is used to join more than one dataframe into one unit.
You can combine dataframes having similar structures. Observe this code:

import pandas as pd
dt_sc={'English':[74,79,48,53,68,44,65,67],
       'Physics':[76,78,80,76,73,55,49,60],
       'Chemistry':[57,74,55,89,70,50,60,80]}
xii_1=pd.DataFrame(dt_sc)
dt_co={'English':[66,65,87,56,86,44,56,76],
       'Physics':[67,87,80,67,77,55,45,80],
       'Chemistry':[75,47,55,98,70,50,60,80]}
xii_2=pd.DataFrame(dt_co)
xii=pd.concat([xii_1,xii_2])
print(xii)
concat() function
Output of xii=pd.concat([xii_1,xii_2]):
   English  Physics  Chemistry
0       74       76         57
1       79       78         74
2       48       80         55
3       53       76         89
4       68       73         70
5       44       55         50
6       65       49         60
7       67       60         80
0       66       67         75
1       65       87         47
2       87       80         55
3       56       67         98
4       86       77         70
5       44       55         50
6       56       45         60
7       76       80         80

concat() function with ignore_index=True
Output of xii=pd.concat([xii_1,xii_2],ignore_index=True):
    English  Physics  Chemistry
0        74       76         57
1        79       78         74
2        48       80         55
3        53       76         89
4        68       73         70
5        44       55         50
6        65       49         60
7        67       60         80
8        66       67         75
9        65       87         47
10       87       80         55
11       56       67         98
12       86       77         70
13       44       55         50
14       56       45         60
15       76       80         80
concat() function
The concat() function is used to join more than one dataframe into one unit.
You can combine dataframes having similar structures. The concat() function
joins the dataframes along rows. If you want to join dataframes along columns,
you can add the axis=1 parameter in the concat() function.

import pandas as pd
dt_sc={'English':[74,79,48,53,68,44,65,67],
       'Physics':[76,78,80,76,73,55,49,60],
       'Chemistry':[57,74,55,89,70,50,60,80]}
xii_1=pd.DataFrame(dt_sc)
dt_co={'English':[66,65,87,56,86,44,56,76],
       'Physics':[67,87,80,67,77,55,45,80],
       'Chemistry':[75,47,55,98,70,50,60,80]}
xii_2=pd.DataFrame(dt_co)
xii=pd.concat([xii_1,xii_2],axis=1)
print(xii)
Pandas DataFrame.merge()
Pandas merge() is defined as the process of bringing the two datasets together into
one and aligning the rows based on the common attributes or columns.
It is an entry point for all standard database join operations between DataFrame
objects:

Syntax:
pd.merge(left, right, how='inner', on=None, left_on=None,
right_on=None, left_index=False, right_index=False, sort=False)
merge() Function
It is used to merge two dataframes that have some common values. You can
specify the common fields with the on parameter of the merge() function. If on
is omitted, merge() joins on the columns that have the same name in both
dataframes. It follows the RDBMS concept of a parent column and child columns:
one column should hold data common to both dataframes.
Have a look at this code:

import pandas as pd
p1={'P_ID':[1,2,3,4,5],
    'First_Name':['Sachin','Saurav','Virendra','Mahendra Sinh','Gautam'],
    'Last_Name':['Tendulker','Ganguly','Sehvag','Dhoni','Gambhir']}
d1=pd.DataFrame(p1)
p2={'P_ID':[1,2,3,4,5],'Runs':[18987,12120,11345,10345,12789]}
d2=pd.DataFrame(p2)
# no on parameter: the shared column P_ID is used as the key
players=pd.merge(d1,d2)
print(players)
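The on and how parameters from the syntax above can be sketched as follows. This is a minimal example, not the presenter's code: the frames left and right and all their values are made up for illustration, and the keys are chosen to match only partly so that the join types differ.

```python
import pandas as pd

# Illustrative data: P_ID 1 and 4 have no runs, P_ID 5 has no name
left = pd.DataFrame({'P_ID': [1, 2, 3, 4],
                     'First_Name': ['Asha', 'Bina', 'Chirag', 'Dev']})
right = pd.DataFrame({'P_ID': [2, 3, 5],
                      'Runs': [120, 75, 200]})

# Inner join (default): only P_ID values present in both frames
inner = pd.merge(left, right, on='P_ID', how='inner')
print(inner)

# Left join: every row of left; Runs is NaN where there is no match
left_join = pd.merge(left, right, on='P_ID', how='left')
print(left_join)

# Outer join: union of the keys from both frames
outer = pd.merge(left, right, on='P_ID', how='outer')
print(outer)
```

In the slide's example every P_ID appears in both dataframes, so all join types would return the same five rows.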
To directly import a .csv file into DataFrame
A simple way to store big data sets is to use CSV files (comma separated values files).

CSV files contain plain text in a well-known format that can be read by almost any program, including Pandas.

In our examples we will be using a CSV file called 'data.csv'.

If you print a large DataFrame with many rows, Pandas will display only the first 5 rows and the
last 5 rows.

Tip: use to_string() to print the entire DataFrame.


Importing data from csv to dataframe
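A minimal sketch of reading CSV data with pd.read_csv(). Because the file 'data.csv' is not included with the slides, the example reads a small in-memory text buffer instead; its column names and values are made up for illustration. With a real file you would pass the path, for example pd.read_csv('data.csv').

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file (illustrative data)
csv_text = """Name,Maths,Science
Arnab,90,91
Ramit,92,81
Riya,81,71
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# to_string() renders every row, even for a long DataFrame
print(df.to_string())
```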
Writing to a CSV File
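Writing goes the other way with DataFrame.to_csv(). The sketch below writes into a temporary directory so that it leaves no files behind; the file name marks.csv and the data are assumptions made for illustration.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'Name': ['Arnab', 'Ramit', 'Riya'],
                   'Maths': [90, 92, 81]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'marks.csv')
    # index=False leaves the row labels out of the file
    df.to_csv(path, index=False)
    # Read the file back to confirm the round trip
    df_back = pd.read_csv(path)

print(df_back)
```

Without index=False the row labels are written as an extra first column, which then shows up as an unnamed column when the file is read back.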
Boolean Indexing
Boolean indexing is a type of indexing which uses the actual values of the data in the DataFrame, using a boolean vector.
Thus we can use conditions on the basis of column names to filter data values.
Boolean Indexing
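A minimal sketch of boolean indexing, using a small marks table modelled on the earlier slides (the thresholds are illustrative assumptions). A comparison on a column produces a boolean Series, and passing that Series inside df[...] keeps only the rows where it is True.

```python
import pandas as pd

df = pd.DataFrame({'English': [74, 79, 48, 53, 68],
                   'Physics': [76, 78, 80, 76, 73]},
                  index=[1201, 1202, 1203, 1204, 1205])

# Boolean vector: True where the English mark is above 60
mask = df['English'] > 60
print(mask)

# Keep only the rows where the mask is True
passed = df[mask]
print(passed)

# Combine conditions with & (and) or | (or); each one needs parentheses
both = df[(df['English'] > 60) & (df['Physics'] > 75)]
print(both)
```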
Getting the count of non NaN values
Getting the number of rows
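Both tasks named above can be sketched in a few lines. The data is made up for illustration; count() gives the number of non-NaN values per column, while len() and shape give the number of rows regardless of missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 3.0, 4.0],
                   'B': [np.nan, np.nan, 7.0, 8.0]})

# Count of non-NaN values in each column
non_nan = df.count()
print(non_nan)

# Number of rows: len() or the first element of shape
print(len(df))
print(df.shape[0])
```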
Iterating over a Data Frame

There are three main ways to iterate over DataFrame:

iterrows()

items()

itertuples()

(however we will not be studying the last one)



Iterating DataFrames with iterrows()
The iterrows() method iterates over the rows in the form of (row index, Series)
pairs, where the Series consists of the column values of the row.

import pandas as pd
ResultSheet={'Arnab': pd.Series([90, 91, 97],index=['Maths','Science','Hindi']),
             'Ramit': pd.Series([92, 81, 96],index=['Maths','Science','Hindi']),
             'Samridhi': pd.Series([89, 91, 88],index=['Maths','Science','Hindi']),
             'Riya': pd.Series([81, 71, 67],index=['Maths','Science','Hindi']),
             'Mallika': pd.Series([94, 95, 99],index=['Maths','Science','Hindi'])}
r=pd.DataFrame(ResultSheet)
print(r)
for (row,rs) in r.iterrows():
    print('Row Index',row)
    print(rs)

Output:
         Arnab  Ramit  Samridhi  Riya  Mallika
Maths       90     92        89    81       94
Science     91     81        91    71       95
Hindi       97     96        88    67       99
Row Index Maths
Arnab       90
Ramit       92
Samridhi    89
Riya        81
Mallika     94
Name: Maths, dtype: int64
Row Index Science
Arnab       91
Ramit       81
Samridhi    91
Riya        71
Mallika     95
Name: Science, dtype: int64
Row Index Hindi
Arnab       97
Ramit       96
Samridhi    88
Riya        67
Mallika     99
Name: Hindi, dtype: int64



Iterating DataFrames with items()
The items() method iterates over the DataFrame as pairs of col_name and data.
These pairs will contain a column name and every row of data for that column.

import pandas as pd
df = pd.DataFrame({
    'first_name': ['John', 'Jane', 'Marry', 'Victoria', 'Gabriel', 'Layla'],
    'last_name': ['Smith', 'Doe', 'Jackson', 'Smith', 'Brown', 'Martinez'],
    'age': [34, 29, 37, 52, 26, 32]},
    index=['id001', 'id002', 'id003', 'id004', 'id005', 'id006'])
print(df)
for col_name, data in df.items():
    print("col_name:",col_name, "\ndata:",data)

Output:
      first_name last_name  age
id001       John     Smith   34
id002       Jane       Doe   29
id003      Marry   Jackson   37
id004   Victoria     Smith   52
id005    Gabriel     Brown   26
id006      Layla  Martinez   32
col_name: first_name
data: id001        John
id002        Jane
id003       Marry
id004    Victoria
id005     Gabriel
id006       Layla
Name: first_name, dtype: object
col_name: last_name
data: id001       Smith
id002         Doe
id003     Jackson
id004       Smith
id005       Brown
id006    Martinez
Name: last_name, dtype: object
col_name: age
data: id001    34
id002    29
id003    37
id004    52
id005    26
id006    32
Name: age, dtype: int64



Binary Operations In DataFrames
import pandas as pd
dataSet1 = [(10, 20, 30), (40, 50, 60), (70, 80, 90)]
dataFrame1 = pd.DataFrame(data=dataSet1)
dataSet2 = [(5, 15, 25), (35, 45, 55)]
dataFrame2 = pd.DataFrame(data=dataSet2)
print("DataFrame1:")
print(dataFrame1)
print("DataFrame2:")
print(dataFrame2)
result = dataFrame1.add(dataFrame2)    # add() method
result1 = dataFrame1 + dataFrame2      # + operator gives the same result
print("Result of adding two pandas dataframes:")
print(result)
print(result1)

Output:
DataFrame1:
    0   1   2
0  10  20  30
1  40  50  60
2  70  80  90
DataFrame2:
    0   1   2
0   5  15  25
1  35  45  55
Result of adding two pandas dataframes:
      0     1      2
0  15.0  35.0   55.0
1  75.0  95.0  115.0
2   NaN   NaN    NaN
      0     1      2
0  15.0  35.0   55.0
1  75.0  95.0  115.0
2   NaN   NaN    NaN

Row 2 exists only in DataFrame1, so the result for that row is NaN.



Binary Operations In DataFrames
import pandas as pd
dataSet1 = [(10, 20, 30),
            (50, 50, 60),
            (70, 80, 90)]
dataFrame1 = pd.DataFrame(data=dataSet1)
dataSet2 = [(5, 15, 25),
            (35, 45, 55)]
dataFrame2 = pd.DataFrame(data=dataSet2)
print("DataFrame1:")
print(dataFrame1)
print("DataFrame2:")
print(dataFrame2)
result = dataFrame1.sub(dataFrame2)    # sub() method
result1 = dataFrame1 - dataFrame2      # - operator gives the same result
print("Result of subtracting two pandas dataframes:")
print(result)
print("Result of using - sign")
print(result1)

Output:
DataFrame1:
    0   1   2
0  10  20  30
1  50  50  60
2  70  80  90
DataFrame2:
    0   1   2
0   5  15  25
1  35  45  55



import pandas as pd

#THE LISTS ARE INITIALISED AS DATA 1 AND DATA 2
data1 = ([[1,2,3,4,5],[10,11,12,13,14]])
data2 = ([[2,3,4,5,6]])
#DATA FRAME FOR LIST 1
df_1= pd.DataFrame(data1)
print("Data frame1 is : ") #PRINT THE DATA FRAME
print(df_1)
print('')
#DATA FRAME FOR LIST 2
df_2= pd.DataFrame(data2)
print("Data frame2 is : ") #PRINT THE DATA FRAME
print(df_2)
print('')

print("Addition of two data frames is :")
print(df_1.add(df_2)) #ADDITION OF THE DATA FRAME
print('')
print("Multiplication of two data frames is :")
print(df_1.mul(df_2)) #Multiplication of two data frames
print('')
print("Division of two data frames is :")
print(df_1.div(df_2)) #DIVISION of two data frames
print('')
print("Modulus of two data frames is :")
print(df_1.mod(df_2)) #MODULUS of two data frames
print('')
print("We can fill values for NaN using fill_value arguments.")
print("Addition of two data frames using fill_value arguments is:")
print(df_1.add(df_2, fill_value = 0)) #FILLING VALUES FOR NaN as 0
# We can use any suitable value as fill_value for performing binary functions.
print("Subtraction of two data frames is :")
sub = df_1 - df_2 #Subtraction of two data frames
print(sub)
print('')



Descriptive Statistics with Pandas
Calculating Maximum Values
Calculating Minimum Values
Calculating Sum of Values
Calculating Number of Values
Calculating Mean
Calculating Median
Calculating Mode
Calculating Quartile
Calculating Variance
Calculating Standard Deviation
Aggregate functions are max(), min(), sum(), count(), std(), var()
Sorting a DataFrame
GROUP BY Functions



Calculating Maximum Values and Minimum Values
Functions max() and min()

The max() and min() functions find the maximum and the minimum of the
values, respectively, in a DataFrame.
Syntax:

<dataframe>.min(axis=None,skipna=None,numeric_only=None)
<dataframe>.max(axis=None,skipna=None,numeric_only=None)



Parameters
axis: {index (0), columns (1)}
Axis for the function to be applied on.

skipna: bool, default True
Exclude NA/null values when computing the result.

level: int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level,
collapsing into a Series.

numeric_only: bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.

Note: in pandas 2.0 and later the level parameter is no longer available and
numeric_only defaults to False.
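A minimal sketch of max() and min() along both axes. The marks table is made up for illustration; only the axis parameter from the list above is exercised.

```python
import pandas as pd

df = pd.DataFrame({'Maths': [90, 92, 89],
                   'Science': [91, 81, 95]},
                  index=['Arnab', 'Ramit', 'Samridhi'])

# Default axis=0: one result per column
col_max = df.max()
col_min = df.min()
print(col_max)
print(col_min)

# axis=1: one result per row
row_max = df.max(axis=1)
print(row_max)
```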
Pandas DataFrame sum() Method
The sum() method adds all values in each column and returns the sum for each column.
By specifying the column axis (axis='columns'), the sum() method searches column-wise and returns the sum
of each row.

dataframe.sum(axis, skipna, level, numeric_only, min_count, kwargs)
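The two directions of sum() described above can be sketched as follows; the data and the names col_totals and row_totals are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'Maths': [90, 92, 89],
                   'Science': [91, 81, 95]},
                  index=['Arnab', 'Ramit', 'Samridhi'])

# Default: add the values in each column
col_totals = df.sum()
print(col_totals)

# axis='columns': add across the columns, giving one total per row
row_totals = df.sum(axis='columns')
print(row_totals)
```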


To Sort the values
Pandas sort_values() function sorts a data frame in ascending or descending order of the passed column.
It differs from Python's built-in sorted() function, which cannot sort a data frame and cannot select a
particular column to sort by.

Syntax:
DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort',
na_position='last')

by: Single column name or list of column names to sort the data frame by.
axis: 0 or 'index' for rows and 1 or 'columns' for columns.
ascending: Boolean value which sorts the data frame in ascending order if True.
inplace: Boolean value. Makes the changes in the passed data frame itself if True.
kind: String naming the algorithm used to sort the data frame: 'quicksort', 'mergesort' or 'heapsort'.
na_position: Takes the string 'last' or 'first' to set the position of null values. Default is 'last'.
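A minimal sketch of the by and ascending parameters. The data is adapted from the player table used elsewhere in the slides; the result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Sachin', 'Dhoni', 'Virat', 'Rohit'],
                   'Age': [30, 25, 25, 30],
                   'Score': [80, 60, 90, 50]})

# Sort by one column, highest score first
by_score = df.sort_values(by='Score', ascending=False)
print(by_score)

# Sort by Age ascending, then by Score descending within the same age
by_age_then_score = df.sort_values(by=['Age', 'Score'],
                                   ascending=[True, False])
print(by_age_then_score)
```

Both calls return a new DataFrame and leave df unchanged, because inplace is False by default.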
Calculate the Mean Median and Mode
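A minimal sketch using mean(), median() and mode() on the Age and Score values from the player table in these slides. Note that mode() can return more than one value when several values are equally frequent.

```python
import pandas as pd

df = pd.DataFrame({'Age': [30, 25, 25, 30, 20],
                   'Score': [80, 60, 90, 50, 50]})

means = df.mean()      # arithmetic mean of each column
medians = df.median()  # middle value of each sorted column
print(means)
print(medians)

# mode() returns a DataFrame: one row per mode, NaN where a column has fewer
modes = df.mode()
print(modes)

# Age has two modes (25 and 30); Score has one (50)
age_modes = df['Age'].mode()
score_modes = df['Score'].mode()
```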
count()
This function counts the non-NA entries for each row or column. The values None, NaN, NaT etc. are considered as
NA in pandas.
Syntax:
<df>.count(axis=0, numeric_only=False)
e.g.
import pandas as pd
import numpy as np
df = pd.DataFrame({"A":[-5, 8, 12, None, 5, 3],
                   "B":[-1, None, 6, 4, None, 3],
                   "C":["sam", "haris", "alex", np.nan, "peter", "nathan"]})
print(df)
#count of non-NA values in each column (axis=0 counts down the rows)
print(df.count(axis = 0))
output:
A    5
B    4
C    5
dtype: int64
DataFrame - quantile() function
The quantile() function is used to get values at the given quantile over requested
axis.

What is quantile in simple words?


In simple terms, a quantile is where a sample is
divided into equal-sized, adjacent, subgroups
(that's why it's sometimes called a “fractile“). It
can also refer to dividing a probability
distribution into areas of equal probability.
quantile()
• The word “quantile” comes from the word quantity. A quantile marks where a
sample is divided into equal-sized subgroups, which is why it is sometimes
called a “fractile”.
• The median is a kind of quantile. It sits at the center of a distribution,
so that half of the data is lower than the median and half of the data is
above it. Because the median cuts a distribution into two equal parts, it is
sometimes called the 2-quantile.
• Quartiles are quantiles that divide the distribution into four equal parts.
Deciles are quantiles that divide a distribution into 10 equal parts, and
percentiles divide a distribution into 100 equal parts.
Common Quantiles:
Certain types of quantiles are used commonly enough to have specific names.
Below is a list of these:
• The 2 quantile is called the median
• The 3 quantiles are called terciles
• The 4 quantiles are called quartiles
• The 5 quantiles are called quintiles
• The 6 quantiles are called sextiles
• The 7 quantiles are called septiles
• The 8 quantiles are called octiles
• The 10 quantiles are called deciles
• The 12 quantiles are called duodeciles
• The 20 quantiles are called vigintiles
• The 100 quantiles are called percentiles
• The 1000 quantiles are called permilles
Syntax of quantile():

<df>.quantile(q=0.5, axis=0, numeric_only=True)

Parameters:
q: float or array-like, default 0.5 (50% quantile). 0 <= q <= 1. The quantile(s) to
compute, for example q=[0.25, 0.50, 0.75, 1.0].
axis: {0, 1, 'index', 'columns'}
numeric_only: If False, the quantile of datetime and timedelta data will be
computed as well.
Steps to find the quartile value

Series: [1, 10, 100, 1000]
1. q = 0.25 (0.25 quantile)
2. n = 4 (no. of elements)
   Position = (n – 1)*q + 1
   = (4 – 1)*0.25 + 1
   = 3*0.25 + 1
   = 1.75
3. Now the integer part is a = 1 and the fraction part is b = 0.75, and T is a term of the sorted series.
Now the formula for the quantile is:
   = T1 + b*(T2 – T1)     #the terms used in the formula change according to a
   = 1 + 0.75*(10 – 1)
   = 1 + 0.75*9
   = 1 + 6.75
   = 7.75

#Program in python to find 0.25 quantile of series [1, 10, 100, 1000]
import pandas as pd
import numpy as np
s = pd.Series([1, 10, 100, 1000])
r = s.quantile(.25)
print(r)
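The steps above can be sketched in Python and checked against pandas. The helper linear_quantile is a hypothetical function written only for this illustration; it follows the same linear interpolation between neighbouring terms, which is the default method of Series.quantile().

```python
import math
import pandas as pd

def linear_quantile(values, q):
    """Quantile by linear interpolation between neighbouring terms."""
    data = sorted(values)
    pos = (len(data) - 1) * q + 1          # 1-based position, as in the steps
    a = math.floor(pos)                    # integer part
    b = pos - a                            # fraction part
    lower = data[a - 1]                    # term T(a)
    upper = data[min(a, len(data) - 1)]    # term T(a+1), clamped at the end
    return lower + b * (upper - lower)

s = pd.Series([1, 10, 100, 1000])

manual = linear_quantile(s.tolist(), 0.25)
print(manual)                 # 7.75, matching the hand calculation
print(s.quantile(0.25))       # pandas gives the same value

# Several quantiles at once: pass a list as q
print(s.quantile([0.25, 0.5, 0.75]))
```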
Calculating Variance
DataFrame.var() is used to display the variance. It is the average of squared differences from the mean.
By default pandas computes the sample variance, dividing by n - 1 rather than n.
The variance function in pandas can be used to calculate the variance of a given set of numbers, of a data
frame, of a column and of rows.

import pandas as pd
import numpy as np
d={'Name':pd.Series(['Sachin','Dhoni','Virat','Rohit','Shikhar']),
   'Age':pd.Series([30,25,25,30,20]),
   'Score':pd.Series([80,60,90,50,50])}
df = pd.DataFrame(d)
print("Dataframe contents")
print(df)
# numeric_only=True skips the text column 'Name' (needed in pandas 2.0 and later)
print(df.var(numeric_only=True))
df.loc[:,'Age':'Score'].var() #for variance of specific columns
df.var(axis=0, numeric_only=True) #column variance
df.var(axis=1, numeric_only=True) #row variance

Output:
Dataframe contents
      Name  Age  Score
0   Sachin   30     80
1    Dhoni   25     60
2    Virat   25     90
3    Rohit   30     50
4  Shikhar   20     50
Age       17.5
Score    330.0
dtype: float64
Standard deviation

Standard deviation is a measure of the amount of variation or dispersion of a set of values.

A low standard deviation means the values tend to be close to the mean of the set, and a high standard
deviation means the values are spread out over a wider range.

Standard deviation is one of the most important concepts as far as finance is concerned.

Finance and banking is all about measuring the risk, and standard deviation measures risk.
Calculating Standard Deviation
DataFrame.std() returns the standard deviation of the values. Standard deviation is calculated as the square
root of the variance.
e.g.
import pandas as pd
import numpy as np
d = {'Name':pd.Series(['Sachin','Dhoni','Virat','Rohit','Shikhar']),
     'Age':pd.Series([30,25,25,30,20]),
     'Score':pd.Series([80,60,90,50,50])}
df = pd.DataFrame(d)
print("Dataframe contents")
print(df)
# numeric_only=True skips the text column 'Name' (needed in pandas 2.0 and later)
print(df.std(numeric_only=True))

The describe() function reports the count, mean, std, min, quartiles and max in one call:
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
print(df.describe())

            Data
count   7.000000
mean   24.285714
std    11.700631
min    10.000000
25%    17.500000
50%    20.000000
75%    30.000000
max    45.000000
Data Aggregations
Aggregation means to transform the dataset and produce a single numeric
value from an array. Aggregation can be applied to one or more columns
together. Aggregate functions are max(),min(), sum(), count(), std(), var().
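A minimal sketch of applying the aggregate functions listed above with DataFrame.agg(). The data reuses the Age and Score values from the earlier slides; the names summary and per_column are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Age': [30, 25, 25, 30, 20],
                   'Score': [80, 60, 90, 50, 50]})

# One aggregate applied to every column
print(df.agg('max'))

# Several aggregates at once: one row per function
summary = df.agg(['min', 'max', 'sum', 'count'])
print(summary)

# Different aggregates for different columns
per_column = df.agg({'Age': 'mean', 'Score': 'sum'})
print(per_column)
```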
GROUP BY Functions
In pandas, the DataFrame.groupby() function is used to split the data into groups based on
some criteria.
Pandas objects like a DataFrame can be split on any of their axes. The groupby() function
works on a split-apply-combine strategy, which is shown below as a 3-step process:

Step 1: Split the data into groups by creating a GroupBy object from the original
DataFrame.
Step 2: Apply the required function.
Step 3: Combine the results to form a new DataFrame.

DataFrame operations: Group by, Sorting

In a group by operation, we split the data into sets and apply some functionality on each
subset. Once the subset is ready, we can perform any statistical function or discard some
data with some condition.

Groupby essentially splits the data into different groups depending on a variable of your choice.

For example, the expression data.groupby('city') will split our DataFrame (containing
custcode, cname, city, telephone) by cities.
import pandas as pd
# named data rather than dict, so the built-in dict type is not shadowed
data={'itcode':[501,502,503,504,505,506],
      'itname':['chair','table','sofa','bed','cabinet','dining'],
      'vendor':['Birdie','Goyalsons','Goyalsons','Tejas','Goyalsons','Birdie'],
      'cost':[700,400,25000,30000,15000,18000]
      }
df4=pd.DataFrame(data)
print(df4)
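Continuing with the same furniture data, the split-apply-combine steps could be sketched like this. Grouping by vendor and the choice of aggregates are illustrative assumptions; the slides do not show which operation was applied to df4.

```python
import pandas as pd

data = {'itcode': [501, 502, 503, 504, 505, 506],
        'itname': ['chair', 'table', 'sofa', 'bed', 'cabinet', 'dining'],
        'vendor': ['Birdie', 'Goyalsons', 'Goyalsons', 'Tejas',
                   'Goyalsons', 'Birdie'],
        'cost': [700, 400, 25000, 30000, 15000, 18000]}
df4 = pd.DataFrame(data)

# Step 1: split the rows into groups, one per vendor
groups = df4.groupby('vendor')

# Steps 2 and 3: apply sum() to each group and combine the results
total_cost = groups['cost'].sum()
print(total_cost)

# Several aggregates per group
cost_stats = groups['cost'].agg(['min', 'max', 'count'])
print(cost_stats)

# The rows that belong to one group
goyalsons = groups.get_group('Goyalsons')
print(goyalsons)
```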
DataFrame: reindex()

reindex(): This function is used to change the row labels and column labels of a DataFrame. To
reindex means to conform the data to match a given set of labels along a particular axis. It can:
Reorder the existing data to match a new set of labels.
Insert missing value (NA) markers in label locations where no data for the label existed.

Syntax:
<df>.reindex(labels=None, index=None, columns=None, fill_value=nan, axis=None)

Parameters:
labels: New labels/index to conform the axis specified by 'axis' to.
fill_value: Value to use in place of NaN for labels that had no data in the original DataFrame.
Existing NaN values are not changed.
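A minimal sketch of reindex() on rows and columns. The marks table and the extra labels 'Riya' and 'Hindi' are made up for illustration; they stand for labels that have no data in the original DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Maths': [90, 92, 89],
                   'Science': [91, 81, 95]},
                  index=['Arnab', 'Ramit', 'Samridhi'])

# Reorder the rows and add a label with no data: 'Riya' gets NaN
reordered = df.reindex(index=['Samridhi', 'Arnab', 'Riya'])
print(reordered)

# fill_value supplies the value for the newly created label
filled = df.reindex(index=['Samridhi', 'Arnab', 'Riya'], fill_value=0)
print(filled)

# Columns work the same way: 'Hindi' is new, 'Maths' is dropped
cols = df.reindex(columns=['Science', 'Hindi'])
print(cols)
```

reindex() returns a new DataFrame each time; df itself keeps its original labels.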
