Multi Index

The document provides a comprehensive overview of using MultiIndex in pandas for organizing and manipulating multi-dimensional data. It includes examples of creating MultiIndex Series and DataFrames, slicing, stacking, unstacking, and sorting data. Additionally, it discusses the transformation of wide data to long format using the melt function and demonstrates merging datasets.

Uploaded by

gstupdate.rkc
Copyright
© All Rights Reserved

import pandas as pd

index_val = [('cse', 2019), ('cse', 2020), ('cse', 2021), ('cse', 2022),
             ('ece', 2019), ('ece', 2020), ('ece', 2021), ('ece', 2022)]

multi = pd.MultiIndex.from_tuples(index_val)

# the first element of each tuple (branch) is level 0 and the year is level 1


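# from_product builds the same index from the Cartesian product of the two lists
multi = pd.MultiIndex.from_product([['cse', 'ece'], [2019, 2020, 2021, 2022]])

# unique labels stored for each level
multi.levels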
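
The two constructors above should produce the same index. A minimal sketch to check that, assuming the level names "branch" and "year" (they are not in the original notes and are added only for readability):

```python
import pandas as pd

branches = ["cse", "ece"]
years = [2019, 2020, 2021, 2022]

# Explicit tuples, one per (branch, year) pair.
tuples = [(b, y) for b in branches for y in years]
from_tuples = pd.MultiIndex.from_tuples(tuples, names=["branch", "year"])

# Cartesian product of the two lists.
from_product = pd.MultiIndex.from_product([branches, years], names=["branch", "year"])

same = from_tuples.equals(from_product)
level_sizes = [len(level) for level in from_product.levels]
print(same, level_sizes)
```

from_product is shorter when every combination exists; from_tuples is the option when some combinations are missing.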

sr = pd.Series([1,2,3,4,5,6,7,8], index = multi)

# Slicing
sr['cse']            # all years for one branch
sr[('cse', 2021)]    # a single value, selected with a (branch, year) tuple

# unstack moves the inner index level (year) into the columns; stack reverses it

unstack = sr.unstack()
stack = unstack.stack()
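
The round trip between the long Series and the wide table can be checked directly. This is a small sketch on the same toy data; the variable names wide and back are illustrative only:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["cse", "ece"], [2019, 2020, 2021, 2022]])
sr = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=idx)

# unstack: the inner level (year) becomes the columns, branch stays as the row index.
wide = sr.unstack()

# stack: the columns go back into the row index, giving a Series again.
back = wide.stack()

wide_shape = wide.shape
value_2021 = wide.loc["cse", 2021]
print(wide_shape, value_2021, len(back))
```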

# The biggest reason to use a MultiIndex Series:
# it represents higher-dimensional data in fewer dimensions
# (3D data as 2D, or 2D data as 1D)

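# MultiIndex DataFrame

list_data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]

branchdf = pd.DataFrame(list_data, index=multi, columns=['avg_package', 'students'])

branchdf.loc['cse']            # all rows for one branch
branchdf.loc[('cse', 2019)]    # one row; the tuple makes clear both labels refer to the row index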
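
A short sketch of the common row selections on a MultiIndex DataFrame, using the same toy numbers. The tuple-range slice relies on the index being sorted, which is the case here:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["cse", "ece"], [2019, 2020, 2021, 2022]])
df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]],
    index=idx,
    columns=["avg_package", "students"],
)

cse_rows = df.loc["cse"]                          # DataFrame indexed by year only
one_row = df.loc[("cse", 2019)]                   # Series holding both columns
cell = df.loc[("ece", 2021), "students"]          # single scalar value
middle = df.loc[("cse", 2020):("cse", 2021)]      # inclusive range of rows

print(cse_rows.shape, one_row.tolist(), cell, middle.shape)
```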

# pandas treats row and column labels the same way, so the columns can be a MultiIndex too

branch_df3 = pd.DataFrame(
    [
        [1, 2, 0, 0], [3, 4, 0, 0], [5, 6, 0, 0], [7, 8, 0, 0],
        [9, 10, 0, 0], [11, 12, 0, 0], [13, 14, 0, 0], [15, 16, 0, 0],
    ],
    index=pd.MultiIndex.from_product([['cse', 'ece'], [2019, 2020, 2021, 2022]]),
    columns=pd.MultiIndex.from_product([['delhi', 'mumbai'], ['avg_package', 'students']]))

# Slicing

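branch_df3['delhi'].loc['cse']                        # one city, one branch
branch_df3['delhi', 'avg_package'].loc['cse', 2020]   # single value
# the same value in one call: .loc[(row level 0, row level 1), (column level 0, column level 1)]
branch_df3.loc[('cse', 2020), ('delhi', 'avg_package')]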

branch_df3['delhi']['avg_package']
branch_df3['delhi','avg_package']
branch_df3.iloc[[0,4],[0,2,1]]

branch_df3[[('delhi','avg_package'),('mumbai','avg_package')]]
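
When both axes are hierarchical, a single .loc call with one tuple per axis is usually easier to read than chained indexing. The sketch below also shows xs and pd.IndexSlice for selecting on an inner level; neither appears in the original notes, so treat them as optional alternatives:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 0, 0], [3, 4, 0, 0], [5, 6, 0, 0], [7, 8, 0, 0],
     [9, 10, 0, 0], [11, 12, 0, 0], [13, 14, 0, 0], [15, 16, 0, 0]],
    index=pd.MultiIndex.from_product([["cse", "ece"], [2019, 2020, 2021, 2022]]),
    columns=pd.MultiIndex.from_product([["delhi", "mumbai"], ["avg_package", "students"]]),
)

# One tuple for the row labels, one tuple for the column labels.
cell = df.loc[("cse", 2020), ("delhi", "avg_package")]

# Cross-section on the inner column level: keep avg_package for every city.
avg_only = df.xs("avg_package", axis=1, level=1)

# IndexSlice selects on the inner row level: year 2020 for every branch.
idx = pd.IndexSlice
y2020 = df.loc[idx[:, 2020], :]

print(cell, list(avg_only.columns), y2020.shape)
```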

# stacking and unstacking

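# unstack moves the innermost row index level into the columns, i.e. year becomes a column level

unstack1 = branch_df3.unstack()

# each stack moves the innermost column level into the rows; three calls move all of them


stacked_3 = unstack1.stack().stack().stack()

# unstacking four times pivots every level in turn; the result is again a Series with a 4-level index
unstaked_3 = stacked_3.unstack().unstack().unstack().unstack()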
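
Tracking shapes makes it easier to see what each reshape does. A small sketch, assuming filler values 0 to 31 instead of the original numbers so every cell is distinct:

```python
import pandas as pd

rows = pd.MultiIndex.from_product([["cse", "ece"], [2019, 2020, 2021, 2022]])
cols = pd.MultiIndex.from_product([["delhi", "mumbai"], ["avg_package", "students"]])
data = [[r * 4 + c for c in range(4)] for r in range(8)]
df = pd.DataFrame(data, index=rows, columns=cols)

# Default unstack: the innermost row level (year) joins the columns.
by_year = df.unstack()

# level=0 moves the outer row level (branch) instead.
by_branch = df.unstack(level=0)

# Stacking both column levels leaves a Series with a 4-level index.
fully_long = df.stack().stack()

print(df.shape, by_year.shape, by_branch.shape, fully_long.shape)
```

The total number of cells stays at 32 throughout; only their arrangement between rows and columns changes.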

## Basic features
branch_df3.shape
stacked_3.shape
unstaked_3.shape

branch_df3.info()
branch_df3.describe()

# GETTING LEVEL

branch_df3.index.get_level_values(0)
branch_df3.index.get_level_values(1)
unstaked_3.index.get_level_values(3)
unstaked_3.index.get_level_values(0)
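
One practical use of get_level_values is building a boolean mask on a single level. A minimal sketch on the two-column toy frame:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["cse", "ece"], [2019, 2020, 2021, 2022]])
df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]],
    index=idx,
    columns=["avg_package", "students"],
)

# One label per row, taken from level 1 (the year).
years = df.index.get_level_values(1)

# Keep only rows from 2021 onwards, for every branch.
recent = df[years >= 2021]

unique_branches = list(df.index.get_level_values(0).unique())
print(list(recent.index), unique_branches)
```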

branch_df3.columns.names = ['Category', 'sub-Category']
branch_df3.index.names = ['Category', 'year']

# SORTING
# sort every row index level in descending order
branch_df3.sort_index(ascending=False)

# one sort direction per level
branch_df3.sort_index(ascending=[False, True])
branch_df3.sort_index(level=['Category', 'year'], ascending=[True, False])
branch_df3.sort_index(level=1, ascending=False)

# axis=1 sorts the column labels instead of the rows
branch_df3.sort_index(level=0, ascending=False, axis=1)
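
A sketch of per-level sort directions, assuming the level names "branch" and "year" and a single illustrative column "students" (none of these names come from the original):

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["cse", "ece"], [2019, 2020, 2021, 2022]], names=["branch", "year"]
)
df = pd.DataFrame({"students": list(range(8))}, index=idx)

# Branch ascending, and within each branch the most recent year first.
mixed = df.sort_index(level=["branch", "year"], ascending=[True, False])

first_label = mixed.index[0]
order = mixed["students"].tolist()
print(first_label, order)
```

sort_index returns a new object, so the result has to be assigned (or inplace=True used) for the order to persist.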


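# Transpose and swap levels

branch_df3.transpose()          # rows become columns and vice versa
branch_df3.swaplevel()          # swap the two row index levels
branch_df3.swaplevel(axis=1)    # swap the two column levels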
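
swaplevel only changes the order of the levels, not the order of the rows, so it is often followed by sort_index. A small sketch, again assuming the level names "branch" and "year":

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["cse", "ece"], [2019, 2020, 2021, 2022]], names=["branch", "year"]
)
df = pd.DataFrame({"students": list(range(8))}, index=idx)

swapped = df.swaplevel()             # index is now (year, branch), rows in the old order
swapped_sorted = swapped.sort_index()  # rows regrouped by year

names = list(swapped.index.names)
first_two = list(swapped_sorted.index[:2])
transposed_shape = df.T.shape
print(names, first_two, transposed_shape)
```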

## Long vs Wide Data

# melt keeps 'branch' as an identifier and turns the year columns into rows
pd.DataFrame({'branch': ['cse', 'ece', 'mech'],
              '2020': [100, 150, 60],
              '2021': [120, 130, 80],
              '2022': [150, 140, 70]}).melt(id_vars=['branch'])
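
By default melt names the new columns "variable" and "value". The sketch below names them explicitly and pivots back to the wide layout; "students" is an assumed label, since the notes do not say what the numbers measure:

```python
import pandas as pd

wide = pd.DataFrame({
    "branch": ["cse", "ece", "mech"],
    "2020": [100, 150, 60],
    "2021": [120, 130, 80],
    "2022": [150, 140, 70],
})

# Wide to long: one row per (branch, year) pair.
long = wide.melt(id_vars="branch", var_name="year", value_name="students")

# Long back to wide: years become columns again.
back = long.pivot(index="branch", columns="year", values="students")

print(long.shape, list(long.columns), back.loc["ece", "2021"])
```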

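# melt turns the column headers (everything not listed in id_vars) into values of a single column
# e.g. in the COVID data each date was a separate column, and melt moved the dates into rows
# the melted table has only 6 columns (the 311253 noted here is presumably its row count)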
path = r'C:\Users\rkcas\Desktop\datasets\datasets-session-21\time_series_covid19_confirmed_global.csv'
path_death = r'C:\Users\rkcas\Desktop\datasets\datasets-session-21\time_series_covid19_deaths_global.csv'
confirmed = pd.read_csv(path)
deaths = pd.read_csv(path_death)

confirmed = confirmed.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'],
                           var_name='date', value_name='number of cases')

deaths = deaths.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'],
                     var_name='date', value_name='number of deaths')

# left join: keep every confirmed row and attach the matching death count
a = confirmed.merge(deaths, on=['Province/State', 'Country/Region', 'Lat', 'Long', 'date'], how='left')

# Series.between requires left and right bounds; the original call passes none
# a['date'].between(left, right)
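
The melt-then-merge flow can be tried without the CSV files. The sketch below uses two tiny made-up tables (countries "A" and "B", two dates, invented counts) in the same wide layout, then parses the date strings so that between can filter a range:

```python
import pandas as pd

# Hypothetical stand-ins for the confirmed and deaths files.
confirmed = pd.DataFrame({"Country/Region": ["A", "B"], "1/22/20": [1, 0], "1/23/20": [3, 2]})
deaths = pd.DataFrame({"Country/Region": ["A", "B"], "1/22/20": [0, 0], "1/23/20": [1, 0]})

confirmed_long = confirmed.melt(id_vars="Country/Region", var_name="date", value_name="cases")
deaths_long = deaths.melt(id_vars="Country/Region", var_name="date", value_name="deaths")

# Left join on the shared identifier columns.
merged = confirmed_long.merge(deaths_long, on=["Country/Region", "date"], how="left")

# Parse the month/day/two-digit-year strings so date comparisons work.
merged["date"] = pd.to_datetime(merged["date"], format="%m/%d/%y")

# between is inclusive on both ends by default.
day = pd.Timestamp("2020-01-23")
on_day = merged[merged["date"].between(day, day)]

print(merged.shape, on_day["cases"].tolist(), on_day["deaths"].tolist())
```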
