Unit 4 DSE

Uploaded by

g.mahalakshmi

HIERARCHICAL INDEXING

Hierarchical indexing (also known as multi-indexing) incorporates multiple index levels within
a single index.

We begin with the standard imports:

import pandas as pd

import numpy as np

A Multiply Indexed Series

The bad way
Suppose you would like to track data about states from two different years. Using the Pandas
tools we’ve already covered, you might be tempted to simply use Python tuples as keys:

In[2]: index = [('California', 2000), ('California', 2010),
                ('New York', 2000), ('New York', 2010),
                ('Texas', 2000), ('Texas', 2010)]
       populations = [33871648, 37253956,
                      18976457, 19378102,
                      20851820, 25145561]
       pop = pd.Series(populations, index=index)
       pop
Out[2]: (California, 2000)    33871648
        (California, 2010)    37253956
        (New York, 2000)      18976457
        (New York, 2010)      19378102
        (Texas, 2000)         20851820
        (Texas, 2010)         25145561
        dtype: int64
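
The tuple-keyed Series above supports lookups, but picking out every 2010 value means filtering the keys by hand. The following is a minimal sketch of the alternative this unit builds toward, assuming the same states and figures as above; the names pop_mi, pop_2010 and pop_2010_slow are introduced here only for illustration.

```python
import pandas as pd

index = [('California', 2000), ('California', 2010),
         ('New York', 2000), ('New York', 2010),
         ('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956, 18976457,
               19378102, 20851820, 25145561]

# Tuple keys: selecting one year needs an explicit boolean mask over the keys.
pop = pd.Series(populations, index=index)
mask = [key[1] == 2010 for key in pop.index]
pop_2010_slow = pop[mask]

# MultiIndex: the same tuples become two separate index levels.
pop_mi = pd.Series(populations, index=pd.MultiIndex.from_tuples(index))
pop_2010 = pop_mi.loc[:, 2010]   # every state, year 2010 only

print(pop_2010)
```

With the MultiIndex, the year becomes an addressable level, so the selection is a single .loc call and the result is indexed by state alone.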

Methods of MultiIndex Creation

The most straightforward way to construct a multiply indexed Series or DataFrame is to simply
pass a list of two or more index arrays to the constructor.
For example:

In[12]: df = pd.DataFrame(np.random.rand(4, 2),
                          index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                          columns=['data1', 'data2'])
        df
Out[12]:         data1     data2
         a 1  0.554233  0.356072
           2  0.925244  0.219474
         b 1  0.441759  0.610054
           2  0.171495  0.886688

Similarly, if you pass a dictionary with appropriate tuples as keys, Pandas will automatically recognize
this and use a MultiIndex by default:

In[13]: data = {('California', 2000): 33871648,
                ('California', 2010): 37253956,
                ('Texas', 2000): 20851820,
                ('Texas', 2010): 25145561,
                ('New York', 2000): 18976457,
                ('New York', 2010): 19378102}
        pd.Series(data)
Out[13]: California  2000    33871648
                     2010    37253956
         New York    2000    18976457
                     2010    19378102
         Texas       2000    20851820
                     2010    25145561
         dtype: int64
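
Besides the implicit routes shown above, a MultiIndex can also be built explicitly. The sketch below uses three pd.MultiIndex class-method constructors to produce the same two-level index; the level names 'letter' and 'number' and the sample values are assumptions made for illustration.

```python
import pandas as pd

# Three explicit constructors that build the same two-level index.
from_arrays = pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]])
from_tuples = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2)])
from_product = pd.MultiIndex.from_product([['a', 'b'], [1, 2]])

# Level names are optional; 'letter' and 'number' are illustrative only.
named = pd.MultiIndex.from_product([['a', 'b'], [1, 2]],
                                   names=['letter', 'number'])
series = pd.Series([10, 20, 30, 40], index=named)

print(series)
```

from_product is convenient when every combination of the level values is present; from_tuples and from_arrays also cover indexes where some combinations are missing.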

Concatenation of NumPy Arrays

NumPy arrays can be concatenated with the np.concatenate function:

In[4]: x = [1, 2, 3]

y = [4, 5, 6]

z = [7, 8, 9]

np.concatenate([x, y, z])

Out[4]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])

The first argument is a list or tuple of arrays to concatenate. Additionally, it takes an axis
keyword that allows you to specify the axis along which the result will be concatenated:

In[5]: x = [[1, 2],
            [3, 4]]
       np.concatenate([x, x], axis=1)
Out[5]: array([[1, 2, 1, 2],
               [3, 4, 3, 4]])
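
To make the effect of the axis keyword concrete, the following sketch concatenates the same 2x2 array along both axes and compares the resulting shapes. The variable names are illustrative; np.vstack and np.hstack are shown only as the familiar shorthand for the two cases.

```python
import numpy as np

grid = np.array([[1, 2],
                 [3, 4]])

# axis=0 stacks the arrays on top of each other (more rows).
stacked_rows = np.concatenate([grid, grid], axis=0)

# axis=1 places the arrays side by side (more columns).
stacked_cols = np.concatenate([grid, grid], axis=1)

print(stacked_rows.shape)  # (4, 2)
print(stacked_cols.shape)  # (2, 4)

# For 2-D input, vstack and hstack give the same results.
same_rows = np.array_equal(stacked_rows, np.vstack([grid, grid]))
same_cols = np.array_equal(stacked_cols, np.hstack([grid, grid]))
```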

Simple Concatenation with pd.concat

Pandas has a function, pd.concat(), which has a similar syntax to np.concatenate but contains a
number of additional options:

# Signature in Pandas v0.18 (the join_axes argument was removed in pandas 1.0)

pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,

keys=None, levels=None, names=None, verify_integrity=False, copy=True)

pd.concat() can be used for a simple concatenation of Series or DataFrame objects,

just as np.concatenate() can be used for simple concatenations of arrays:

In[6]: ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
       ser2 = pd.Series(['D', 'E', 'F'], index=[4, 5, 6])
       pd.concat([ser1, ser2])
Out[6]: 1    A
        2    B
        3    C
        4    D
        5    E
        6    F
        dtype: object
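
Unlike np.concatenate, pd.concat preserves index labels, so overlapping labels can end up duplicated. The sketch below exercises three of the options listed in the signature above (verify_integrity, ignore_index and keys) on two small Series whose contents are invented for illustration.

```python
import pandas as pd

s1 = pd.Series(['A', 'B'], index=[0, 1])
s2 = pd.Series(['C', 'D'], index=[0, 1])

# Default behaviour: labels are kept, so 0 and 1 each appear twice.
dup = pd.concat([s1, s2])

# ignore_index=True throws the old labels away and numbers the rows 0..n-1.
fresh = pd.concat([s1, s2], ignore_index=True)

# keys=... labels each input, producing a hierarchical (multi-level) index.
keyed = pd.concat([s1, s2], keys=['first', 'second'])

# verify_integrity=True turns duplicated labels into an error.
try:
    pd.concat([s1, s2], verify_integrity=True)
    raised = False
except ValueError:
    raised = True

print(keyed)
```

The keys option ties concatenation back to hierarchical indexing: each source object becomes one value of the outer index level.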

It also works to concatenate higher-dimensional objects, such as DataFrames:

In[7]: df1 = make_df('AB', [1, 2])
       df2 = make_df('AB', [3, 4])
       print(df1); print(df2); print(pd.concat([df1, df2]))

df1           df2           pd.concat([df1, df2])
    A   B         A   B         A   B
1  A1  B1     3  A3  B3     1  A1  B1
2  A2  B2     4  A4  B4     2  A2  B2
                            3  A3  B3
                            4  A4  B4
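
The make_df helper used above is not defined anywhere in these notes. A minimal sketch that is consistent with the outputs shown (each cell is the column letter followed by the index label) could look like the following; treat the implementation as an assumption rather than the original definition.

```python
import pandas as pd

def make_df(cols, ind):
    """Build a DataFrame whose cells are '<column><index>' strings."""
    data = {c: [str(c) + str(i) for i in ind] for c in cols}
    return pd.DataFrame(data, index=ind)

df1 = make_df('AB', [1, 2])
df2 = make_df('AB', [3, 4])

# Row-wise concatenation (axis=0 is the default).
combined = pd.concat([df1, df2])
print(combined)
```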

By default, the concatenation takes place row-wise within the DataFrame (i.e., axis=0). Like
np.concatenate, pd.concat allows specification of an axis along which concatenation will take
place. Consider the following example:

In[8]: df3 = make_df('AB', [0, 1])
       df4 = make_df('CD', [0, 1])
       print(df3); print(df4); print(pd.concat([df3, df4], axis='columns'))

df3           df4           pd.concat([df3, df4], axis='columns')
    A   B         C   D         A   B   C   D
0  A0  B0     0  C0  D0     0  A0  B0  C0  D0
1  A1  B1     1  C1  D1     1  A1  B1  C1  D1

We could have equivalently specified axis=1; here we’ve used the more descriptive axis='columns'.
Current pandas versions accept axis='columns' but not the abbreviation axis='col'.

1. Concatenate (concat)

The concat function in pandas is used to combine two or more DataFrame objects along a
particular axis (either rows or columns). It is more flexible than append: it accepts any number of
objects, works along either axis, and can align mismatched labels with an outer or inner join.

Syntax:

import pandas as pd

# Concatenating along rows (axis=0) or columns (axis=1)

result = pd.concat([df1, df2], axis=0)

Key Parameters:

• axis: Determines the axis to concatenate along. axis=0 (default) stacks the objects row-wise,
while axis=1 places them side by side as columns.
• join: Specifies how to handle labels on the other axis (like a SQL join). Options are outer
(default) and inner.
    o outer join: Keeps the union of labels from all inputs and fills in missing
    values with NaN.
    o inner join: Keeps only the labels that are present in all inputs.
• ignore_index: If True, discards the existing labels along the concatenation axis and assigns
a new integer index (0, 1, 2, ...).

Example:

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Concatenate along rows

result = pd.concat([df1, df2], axis=0, ignore_index=True)
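
The example above uses frames with identical columns, so the join option makes no visible difference. The sketch below, with two made-up frames that share only column 'B', shows how outer and inner treat the non-matching columns.

```python
import pandas as pd

# Two frames that share only column 'B' (values are illustrative).
left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
right = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# Outer join (default): union of the columns, gaps filled with NaN.
outer = pd.concat([left, right], join='outer', ignore_index=True)

# Inner join: only the columns present in every input.
inner = pd.concat([left, right], join='inner', ignore_index=True)

print(outer)
print(inner)
```

In the outer result, column 'A' is NaN for the rows that came from the second frame and column 'C' is NaN for the rows that came from the first; the inner result keeps only 'B'.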

2. Append
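The append method adds one DataFrame to the end of another along rows. It is essentially a
shortcut for pd.concat with axis=0. Note that DataFrame.append was deprecated in pandas 1.4 and
removed in pandas 2.0, so code written for current versions should call pd.concat instead.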

Syntax:

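# Appending df2 to df1 (DataFrame.append exists only in pandas < 2.0)

result = df1.append(df2, ignore_index=True)

# Equivalent call that also works in pandas 2.0 and later

result = pd.concat([df1, df2], ignore_index=True)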

Key Parameters:

• ignore_index: If True, discards the original row labels and assigns a new integer index (0, 1, 2, ...).

Example:

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

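# Append df2 to df1 (DataFrame.append exists only in pandas < 2.0)

result = df1.append(df2, ignore_index=True)

# Equivalent call that also works in pandas 2.0 and later

result = pd.concat([df1, df2], ignore_index=True)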

3. Merge

The merge function is the most versatile way to combine two DataFrame objects in pandas. It
provides various options for specifying how to align the datasets based on common columns or
indexes.

Syntax:

import pandas as pd

# Merging on a specific column

result = pd.merge(df1, df2, on='key_column')

Key Parameters:

• on: Specifies the column or index level names to join on; they must be present in both
DataFrames. If no key is specified, merge uses the column names the two DataFrames have in common.
• how: Specifies the type of join to perform. Options are:
    o inner (default): Keeps only rows whose keys are present in both datasets.
    o outer: Keeps all rows from both datasets and fills in missing values with NaN.
    o left: Keeps all rows from the left dataset plus the matching rows from the right
    dataset; unmatched rows get NaN in the right-hand columns.
    o right: Keeps all rows from the right dataset plus the matching rows from the left
    dataset; unmatched rows get NaN in the left-hand columns.
• left_on and right_on: Specify the key column(s) in the left and right DataFrame when their
names differ.
• suffixes: Specifies the suffixes appended to overlapping non-key column names from the left and
right DataFrame (default ('_x', '_y')).

Example:

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})

df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [24, 25, 26]})

# Merge based on the 'ID' column, with an inner join

result = pd.merge(df1, df2, on='ID', how='inner')
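
To see how the parameters listed above change the result, the following sketch reruns the merge with different how values, with differing key names, and with overlapping column names. The frames reuse the ID/Name/Age data from the example; person_id, Score, '_old' and '_new' are names invented for illustration.

```python
import pandas as pd

people = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
ages = pd.DataFrame({'ID': [1, 2, 4], 'Age': [24, 25, 26]})

inner = pd.merge(people, ages, on='ID', how='inner')      # IDs 1, 2
outer = pd.merge(people, ages, on='ID', how='outer')      # IDs 1, 2, 3, 4
left_join = pd.merge(people, ages, on='ID', how='left')   # IDs 1, 2, 3

# Key columns with different names: use left_on / right_on.
ages_renamed = ages.rename(columns={'ID': 'person_id'})
by_diff_keys = pd.merge(people, ages_renamed,
                        left_on='ID', right_on='person_id')

# Overlapping non-key columns get the given suffixes.
old = pd.DataFrame({'ID': [1, 2], 'Score': [80, 90]})
new = pd.DataFrame({'ID': [1, 2], 'Score': [85, 95]})
scores = pd.merge(old, new, on='ID', suffixes=('_old', '_new'))

print(outer)
print(scores)
```

Rows without a partner (ID 3 in the left frame, ID 4 in the right) survive only in the join types that keep their side, and the missing values appear as NaN.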

4. Join

The join method is a simplified way to merge two DataFrame objects based on their indexes.
It is convenient for combining datasets where one or both of the datasets use an index as the key
for alignment.

Syntax:

# Joining two DataFrames on their indexes

result = df1.join(df2)

Key Parameters:

• how: Specifies the type of join (left, right, outer, or inner), as in merge. The default is left.
• on: Specifies a column in the calling DataFrame to match against the index of the other
DataFrame, useful when the key is stored in a column rather than in the index.
• lsuffix and rsuffix: Specify the suffixes appended to overlapping column names in the left
and right DataFrame.

Example:

df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']}, index=[1, 2, 3])

df2 = pd.DataFrame({'Age': [24, 25, 26]}, index=[1, 2, 4])

# Join based on the index with an outer join

result = df1.join(df2, how='outer')
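
The on, lsuffix and rsuffix parameters are easiest to understand from a small case. In the sketch below, all frame and column names (orders, customers, cust_id, amount, value, '_l', '_r') are hypothetical and chosen only for illustration.

```python
import pandas as pd

# The key lives in a column on the left and in the index on the right.
orders = pd.DataFrame({'cust_id': [1, 2, 1], 'amount': [10, 20, 30]})
customers = pd.DataFrame({'Name': ['Alice', 'Bob']}, index=[1, 2])

# on='cust_id' matches the left column against the right index.
joined = orders.join(customers, on='cust_id')

# Overlapping column names must be disambiguated with suffixes.
a = pd.DataFrame({'value': [1, 2]}, index=['x', 'y'])
b = pd.DataFrame({'value': [3, 4]}, index=['x', 'y'])
both = a.join(b, lsuffix='_l', rsuffix='_r')

print(joined)
print(both)
```

Without lsuffix and rsuffix, the second join would raise an error because both frames contain a column called value.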

Hierarchical Indexes
Hierarchical indexing, also known as multi-indexing, means setting more than one column
as the index. This section uses the homelessness.csv file.
# importing pandas library as alias pd

import pandas as pd

# calling the pandas read_csv() function.

# and storing the result in DataFrame df

df = pd.read_csv('homelessness.csv')

print(df.head())

Hierarchical Indexing in pandas:

# using the pandas set_index() function.

df_ind3 = df.set_index(['region', 'state', 'individuals'])

# sort_index() returns a new, sorted DataFrame, so assign the result

df_ind3 = df_ind3.sort_index()

print(df_ind3.head(10))
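
Because homelessness.csv is not included with these notes, the sketch below rebuilds the same steps on a small hand-made DataFrame. The column names region, state and individuals come from the code above; the family_members column and all numeric values are placeholders invented for illustration, not real figures.

```python
import pandas as pd

# Toy stand-in for homelessness.csv (placeholder values only).
toy = pd.DataFrame({
    'region': ['Pacific', 'Mountain', 'Pacific', 'Mountain'],
    'state': ['California', 'Colorado', 'Alaska', 'Arizona'],
    'individuals': [30, 20, 10, 40],
    'family_members': [3, 2, 1, 4],
})

# Move three columns into a three-level index.
indexed = toy.set_index(['region', 'state', 'individuals'])

# sort_index() does not modify 'indexed'; it returns a sorted copy.
sorted_df = indexed.sort_index()

print(indexed.index.is_monotonic_increasing)    # False
print(sorted_df.index.is_monotonic_increasing)  # True
print(sorted_df)
```

Sorting orders the rows by the outermost level first (region), then by state, then by individuals.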

Selecting Data with a Hierarchical Index:

To select rows by their outermost index level with the .loc indexer, pass the index labels in
a list.
# selecting the 'Pacific' and 'Mountain'

# region from the dataframe.

# selecting data using level(0) index or main index.

df_ind3_region = df_ind3.loc[['Pacific', 'Mountain']]

print(df_ind3_region.head(10))
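
Selection is not limited to the outermost level. The following sketch shows a few common patterns on a toy two-level index; it assumes only region and state are used as index levels, and the numeric values are placeholders rather than data from homelessness.csv.

```python
import pandas as pd

# Toy data with placeholder values, indexed by region and state.
toy = pd.DataFrame({
    'region': ['Pacific', 'Mountain', 'Pacific', 'Mountain'],
    'state': ['California', 'Colorado', 'Alaska', 'Arizona'],
    'individuals': [30, 20, 10, 40],
    'family_members': [3, 2, 1, 4],
})
indexed = toy.set_index(['region', 'state']).sort_index()

# Outer level only: a list of region labels.
regions = indexed.loc[['Pacific', 'Mountain']]

# Both levels: a tuple selects one (region, state) row.
california = indexed.loc[('Pacific', 'California')]

# Several rows: a list of tuples.
pair = indexed.loc[[('Pacific', 'California'), ('Mountain', 'Colorado')]]

# Inner level only: xs() takes a cross-section by level name.
colorado = indexed.xs('Colorado', level='state')

print(pair)
```

Sorting the index before selecting keeps label lookups efficient and avoids performance warnings on larger data.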
