DSP Unit-5 Updated

Unit 5 focuses on data wrangling techniques using pandas, including hierarchical indexing, reshaping data, and combining datasets. It covers operations such as merging, joining, and concatenating data, along with examples of different types of joins like inner, outer, left, and right joins. The unit provides practical code snippets to illustrate how to manipulate and analyze data effectively.

Uploaded by

Vineela

UNIT - 5

Data Wrangling: Join, Combine, and Reshape

In many applications, data may be spread across a number of files or databases or
be arranged in a form that is not easy to analyze. This unit focuses on tools to help
combine, join, and rearrange data.

Hierarchical Indexing

Hierarchical indexing is an important feature of pandas that enables you to have
multiple (two or more) index levels on an axis. Somewhat abstractly, it provides a way
for you to work with higher dimensional data in a lower dimensional form. Let’s start
with a simple example; create a Series with a list of lists (or arrays) as the index:

import pandas as pd
import numpy as np
df = pd.Series(np.random.randn(9), index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'], [1, 2, 3, 1, 3,
1, 2, 2, 3]])
print(df)

Output:
a 1 0.510547
2 0.099520
3 -0.511527
b 1 -0.393166
3 0.807105
c 1 -1.297928
2 0.658603
d 2 1.371548
3 -1.245286
dtype: float64

print(df.index)

MultiIndex([('a', 1),
('a', 2),
('a', 3),
('b', 1),
('b', 3),
('c', 1),
('c', 2),
('d', 2),
('d', 3)],
)

Sri Ch, Chandra Sekhar, IT - AITAM Page 1

print(df['b'])

1 0.660261
3 1.552526
dtype: float64

print(df['b':'c'])

b 1 0.010531
3 -0.976936
c 1 -0.317225
2 1.423272
dtype: float64

print(df.loc[['b','d']])

b 1 0.277812
3 1.540045
d 2 1.583744
3 1.448799
dtype: float64

print(df.loc[:, 2])

a -0.329168
c -0.015847
d -0.115682
dtype: float64
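Because the examples above draw random values, the printed numbers change from run to run. A minimal sketch with fixed values makes the selection rules easier to check; the name `s` and the values are illustrative assumptions, not taken from the notes:

```python
import pandas as pd

# Illustrative Series with a two-level index and fixed values
s = pd.Series(
    [10, 20, 30, 40, 50],
    index=[['a', 'a', 'b', 'b', 'c'], [1, 2, 1, 3, 2]],
)

outer = s['a']          # partial indexing on the outer level; inner labels remain
inner = s.loc[:, 2]     # selection on the inner level; outer labels remain
single = s.loc['b', 3]  # both levels given; a single value is returned

print(outer)
print(inner)
print(single)
```

Selecting on one level drops that level from the result, which is why `outer` is indexed by the inner labels and `inner` by the outer labels.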

Hierarchical indexing plays an important role in reshaping data and group-based

operations like forming a pivot table. For example, you could rearrange the data into
a DataFrame using its unstack method:

df.unstack()

1 2 3
a -0.832795 -1.697236 0.321600
b 0.060409 NaN 1.461795
c -0.743408 -0.573688 NaN
d NaN -1.355128 -0.519725

The inverse operation of unstack is stack:

df.unstack().stack()


a 1 0.510547
2 0.099520
3 -0.511527
b 1 -0.393166
3 0.807105
c 1 -1.297928
2 0.658603
d 2 1.371548
3 -1.245286
dtype: float64

With a DataFrame, either axis can have a hierarchical index:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(12).reshape((4, 3)),
index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
columns=[['Mango', 'Apple', 'Mango'], ['Green', 'Red', 'Green']])
print(df)

     Mango Apple Mango
     Green   Red Green
a 1      0     1     2
  2      3     4     5
b 1      6     7     8
  2      9    10    11

The hierarchical levels can have names (as strings or any Python objects). If so, these will
show up in the console output:

df.index.names = ['key1', 'key2']

df.columns.names = ['state', 'color']
print(df)

state     Mango Apple Mango
color     Green   Red Green
key1 key2
a    1        0     1     2
     2        3     4     5
b    1        6     7     8
     2        9    10    11

With partial column indexing you can similarly select groups of columns:

print(df['Mango'])

color     Green Green
key1 key2
a    1        0     2
     2        3     5
b    1        6     8
     2        9    11

Reordering and Sorting Levels

At times you will need to rearrange the order of the levels on an axis or sort the data by
the values in one specific level. The swaplevel method takes two level numbers or names and
returns a new object with the levels interchanged (the data is otherwise unaltered):

print(df.swaplevel('key1', 'key2'))

state     Mango Apple Mango
color     Green   Red Green
key2 key1
1    a        0     1     2
2    a        3     4     5
1    b        6     7     8
2    b        9    10    11

sort_index, on the other hand, sorts the data using only the values in a single level.

print(df.sort_index(level=1))

state     Mango Apple Mango
color     Green   Red Green
key1 key2
a    1        0     1     2
b    1        6     7     8
a    2        3     4     5
b    2        9    10    11
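The two methods are often used together: after swapping levels, sorting on the new outer level groups equal labels next to each other. A small sketch under assumed names (`frame`, `x`, and `y` are hypothetical and do not appear in the notes):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(8).reshape((4, 2)),
    index=pd.MultiIndex.from_arrays(
        [['a', 'a', 'b', 'b'], [1, 2, 1, 2]], names=['key1', 'key2']
    ),
    columns=['x', 'y'],
)

# Swap the levels, then sort on the new outer level (key2) so that
# rows sharing a key2 value sit next to each other.
swapped = frame.swaplevel('key1', 'key2').sort_index(level=0)
print(swapped)
```

Neither call modifies `frame`; each returns a new object.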

Indexing with a DataFrame’s columns

It’s not unusual to want to use one or more columns from a DataFrame as the row index;
alternatively, you may wish to move the row index into the DataFrame’s columns. Here’s
an example DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
'd': [0, 1, 2, 0, 1, 2, 3]})
print(df)

a b c d
0 0 7 one 0
1 1 6 one 1
2 2 5 one 2
3 3 4 two 0
4 4 3 two 1
5 5 2 two 2
6 6 1 two 3

DataFrame’s set_index function will create a new DataFrame using one or more of its
columns as the index:

df1 = df.set_index(['c', 'd'])

print(df1)

       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
By default the columns are removed from the DataFrame, though you can leave them in:

print(df.set_index(['c', 'd'], drop=False))

       a  b    c  d
c   d
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3


reset_index, on the other hand, does the opposite of set_index; the hierarchical index
levels are moved into the columns:

print(df1.reset_index())

c d a b
0 one 0 0 7
1 one 1 1 6
2 one 2 2 5
3 two 0 3 4
4 two 1 4 3
5 two 2 5 2
6 two 3 6 1
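As a quick check that the two methods are inverses, the sketch below round-trips a small DataFrame. The names `indexed` and `restored` and the shortened data are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': range(4),
                   'c': ['one', 'one', 'two', 'two'],
                   'd': [0, 1, 0, 1]})

indexed = df.set_index(['c', 'd'])   # 'c' and 'd' become index levels
restored = indexed.reset_index()     # the levels come back as the leading columns

print(restored.columns.tolist())
# Only the column order differs from the original DataFrame.
print(restored[df.columns].equals(df))
```

Note that set_index returns a new DataFrame and leaves `df` unchanged, so the result has to be assigned or printed directly.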

Combining and Merging Datasets

Data contained in pandas objects can be combined together in a number of ways:

• pandas.merge connects rows in DataFrames based on one or more keys. This will
be familiar to users of SQL or other relational databases, as it implements database
join operations.
• pandas.concat concatenates or “stacks” together objects along an axis.
• The combine_first instance method enables splicing together overlapping data to
fill in missing values in one object with values from another.

Join Operation

• The join() method joins a DataFrame horizontally with one or more other pandas
DataFrames/Series.
• join() uses merge internally for the index-on-index (the default) and column(s)-on-
index joins.
• It aligns the calling DataFrame’s column(s) or index with the other objects’ index
(and not their columns).
• It defaults to a left join, with options for right, inner, and outer joins.

There are four types of joins in pandas:

• Inner Join
• Left Outer Join
• Right Outer Join
• Full Outer Join or simply Outer Join

import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]},
                   index=["K0", "K1", "K2"])

df2 = pd.DataFrame({"C": ["C0", "C2", "C3"], "D": ["D0", "D2", "D3"]},
                   index=["K0", "K2", "K3"])

print(df1)
print(df2)

df3 = df1.join(df2)
print(df3)

A B
K0 A0 B0
K1 A1 B1
K2 A2 B2

C D
K0 C0 D0
K2 C2 D2
K3 C3 D3

A B C D
K0 A0 B0 C0 D0
K1 A1 B1 NaN NaN
K2 A2 B2 C2 D2

Inner Join
Inner join is the most common type of join you’ll be working
with. It returns a DataFrame containing only the rows whose
keys appear in both inputs. This is similar to the intersection
of two sets.

df4 = df1.join(df2, how='inner')

print(df4)

A B C D
K0 A0 B0 C0 D0
K2 A2 B2 C2 D2

Full Outer Join
A full outer join returns all the rows from the left DataFrame
and all the rows from the right DataFrame, matching up rows
where possible and filling in NaN elsewhere. If both DataFrames
contain exactly the same keys, the result is the same as an
inner join.

df5 = df1.join(df2, how='outer')

print(df5)

A B C D
K0 A0 B0 C0 D0
K1 A1 B1 NaN NaN
K2 A2 B2 C2 D2
K3 NaN NaN C3 D3

Left Outer Join

With a left outer join, all the records from the first
DataFrame are kept, whether or not their keys can be found
in the second DataFrame. From the second DataFrame, only
the records whose keys also appear in the first DataFrame
are included; unmatched rows of the first DataFrame get NaN
in the second DataFrame’s columns.

df6 = df1.join(df2, how='left')

print(df6)

A B C D
K0 A0 B0 C0 D0
K1 A1 B1 NaN NaN
K2 A2 B2 C2 D2

Right Outer Join

For a right join, all the records from the second DataFrame
are kept. From the first DataFrame, only the records whose
keys also appear in the second DataFrame are included;
unmatched rows of the second DataFrame get NaN in the first
DataFrame’s columns.

df7 = df1.join(df2, how='right')

print(df7)

A B C D
K0 A0 B0 C0 D0
K2 A2 B2 C2 D2
K3 NaN NaN C3 D3
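The four join types differ only in which index labels survive. The following sketch collects the resulting labels for each value of `how`; the dictionary name `keys_by_how` and the single-column frames are assumptions made to keep the example short:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=["K0", "K1", "K2"])
df2 = pd.DataFrame({"C": ["C0", "C2", "C3"]}, index=["K0", "K2", "K3"])

# Collect the index labels kept by each join type
keys_by_how = {
    how: sorted(df1.join(df2, how=how).index)
    for how in ("inner", "left", "right", "outer")
}
for how, keys in keys_by_how.items():
    print(how, keys)
```

The inner join keeps the intersection of the labels, the outer join keeps their union, and the left and right joins keep the labels of the first and second DataFrame respectively.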
Merging Operation:

• The merge() function merges DataFrames with database-style joins: inner join,
outer join, left join, and right join.
• It combines exactly two DataFrames.
• The join is done on columns or indexes.
• If joining columns on columns, the DataFrame indexes are ignored and the result
gets a new default integer index.
• If joining indexes on indexes or indexes on a column, the index is passed on.
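The last two points can be checked directly. In this sketch the frames `left` and `right` and their labels are hypothetical; `left_index` and `right_index` are the merge parameters that request an index-on-index join:

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2], 'x': ['a', 'b']}, index=['r1', 'r2'])
right = pd.DataFrame({'id': [1, 2], 'y': ['c', 'd']}, index=['r1', 'r2'])

# Column-on-column merge: the original row labels are discarded
on_columns = pd.merge(left, right, on='id')

# Index-on-index merge: the row labels are carried into the result
on_index = pd.merge(left, right, left_index=True, right_index=True)

print(on_columns.index.tolist())
print(on_index.index.tolist())
```

If the original row labels matter in a column-on-column merge, one option is to call reset_index first so that they travel along as an ordinary column.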

import pandas as pd

df1 = pd.DataFrame({
'id':[1,2,3,4], 'sub_id':['s1','s2','s4','s6'], 'marks': [55, 77, 88, 66]})

df2 = pd.DataFrame({
'id':[1,2,3,4], 'sub_id':['s2','s4','s3','s6'], 'marks': [60, 40, 50, 70]})
print (df1)
print (df2)

id sub_id marks
0 1 s1 55
1 2 s2 77
2 3 s4 88
3 4 s6 66
id sub_id marks
0 1 s2 60
1 2 s4 40
2 3 s3 50
3 4 s6 70

• on: the column or index level name(s) on which the merge is performed; they must
be present in both DataFrames. If on is None and no other key is specified, the merge
uses the columns that the two DataFrames have in common.

df3 = pd.merge(df1, df2, on='id')

print(df3)

id sub_id_x marks_x sub_id_y marks_y

0 1 s1 55 s2 60
1 2 s2 77 s4 40
2 3 s4 88 s3 50
3 4 s6 66 s6 70


• left_on: the column(s) or index level(s) of the first (left) DataFrame to use as
join keys.
• right_on: the column(s) or index level(s) of the second (right) DataFrame to use as
join keys.
• If the key column names are different in each object, you can specify them separately:

import pandas as pd

df1 = pd.DataFrame({ 'id1':[1,2,3,4], 'sub_id':['s1','s2','s4','s6'], 'marks': [55, 77, 88, 66]})

df2 = pd.DataFrame({ 'id2':[1,2,3,4], 'sub_id':['s2','s4','s3','s6'], 'marks': [60, 40, 50, 70]})
df3 = pd.merge(df1, df2, left_on='id1', right_on='id2')
print(df3)

id1 sub_id_x marks_x id2 sub_id_y marks_y

0 1 s1 55 1 s2 60
1 2 s2 77 2 s4 40
2 3 s4 88 3 s3 50
3 4 s6 66 4 s6 70

• how: {'left', 'right', 'outer', 'inner'}, default 'inner'

The type of merge to be performed:

• left: uses only keys from the left frame, similar to a left outer join.
• right: uses only keys from the right frame, similar to a right outer join.
• outer: uses the union of keys from both frames, similar to a full outer join.
• inner: uses the intersection of keys from both frames, similar to an inner join.

import pandas as pd

df1 = pd.DataFrame({'id':[1,2,3,5], 'sub_id':['s1','s2','s4','s6'], 'marks': [55, 77, 88, 66]})

df2 = pd.DataFrame({'id':[1,2,3,6], 'sub_id':['s2','s4','s3','s6'], 'marks': [60, 40, 50, 70]})

df3 = pd.merge(df1, df2, on ='id', how = 'inner')

print(df3)

id sub_id_x marks_x sub_id_y marks_y

0 1 s1 55 s2 60
1 2 s2 77 s4 40
2 3 s4 88 s3 50

df4 = pd.merge(df1, df2, on ='id', how = 'left')
print(df4)

id sub_id_x marks_x sub_id_y marks_y

0 1 s1 55 s2 60.0
1 2 s2 77 s4 40.0
2 3 s4 88 s3 50.0
3 5 s6 66 NaN NaN

df5 = pd.merge(df1, df2, on ='id', how = 'right')

print(df5)

id sub_id_x marks_x sub_id_y marks_y

0 1 s1 55.0 s2 60
1 2 s2 77.0 s4 40
2 3 s4 88.0 s3 50
3 6 NaN NaN s6 70

df6 = pd.merge(df1, df2, on ='id', how = 'outer')

print(df6)

id sub_id_x marks_x sub_id_y marks_y

0 1 s1 55.0 s2 60.0
1 2 s2 77.0 s4 40.0
2 3 s4 88.0 s3 50.0
3 5 s6 66.0 NaN NaN
4 6 NaN NaN s6 70.0
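When inspecting an outer merge, it can help to see which side each row came from. merge accepts an `indicator` argument that adds a `_merge` column for this purpose. A short sketch, with the frames trimmed to two columns for brevity:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 5], 'marks': [55, 77, 88, 66]})
df2 = pd.DataFrame({'id': [1, 2, 3, 6], 'marks': [60, 40, 50, 70]})

# indicator=True adds a '_merge' column whose values are
# 'left_only', 'right_only', or 'both'
merged = pd.merge(df1, df2, on='id', how='outer', indicator=True)
print(merged[['id', '_merge']])
```

Here ids 1, 2, and 3 are marked 'both', id 5 is 'left_only', and id 6 is 'right_only', which matches where the NaN values appear in the outer merge above.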

• suffixes: a sequence of length two. Its values are strings giving the suffixes to
append to overlapping column names from the first and second DataFrame, respectively,
after the merge. The default is ('_x', '_y').

import pandas as pd

df1 = pd.DataFrame({'key1': ['f1', 'f1', 'b1'], 'key2': ['one', 'two', 'one'], 'lval': [1, 2, 3]})
df2 = pd.DataFrame({'key1': ['f1', 'f1', 'b1', 'b1'], 'key2': ['one', 'one', 'one', 'two'], 'rval':
[4, 5, 6, 7]})

df3 = pd.merge(df1, df2, on ='key1', suffixes=('_left', '_right'))

print(df3)

key1 key2_left lval key2_right rval
0 f1 one 1 one 4
1 f1 one 1 one 5
2 f1 two 2 one 4
3 f1 two 2 one 5
4 b1 one 3 one 6
5 b1 one 3 two 7
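In the result above, key2 overlaps only because it was left out of the join keys. Passing a list to `on` merges on both columns instead, so no suffixes are needed. A sketch using the same data (the name `both` is an assumption):

```python
import pandas as pd

df1 = pd.DataFrame({'key1': ['f1', 'f1', 'b1'],
                    'key2': ['one', 'two', 'one'],
                    'lval': [1, 2, 3]})
df2 = pd.DataFrame({'key1': ['f1', 'f1', 'b1', 'b1'],
                    'key2': ['one', 'one', 'one', 'two'],
                    'rval': [4, 5, 6, 7]})

# Rows match only when both key1 and key2 agree
both = pd.merge(df1, df2, on=['key1', 'key2'])
print(both)
```

Only three rows remain, because a row is matched only when the (key1, key2) pair appears in both DataFrames.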

Concatenate Operation:

• concat() concatenates two or more pandas DataFrames/Series vertically or horizontally.
• The axis parameter selects the concatenation axis; the objects are aligned only on
the index of the other axis.
• It defaults to an outer join, with the option of an inner join.

import pandas as pd
import numpy as np
a = np.arange(12).reshape((3, 4))
print(a)

[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]


a1 = np.concatenate([a, a], axis=1)

print(a1)

[[ 0 1 2 3 0 1 2 3]
[ 4 5 6 7 4 5 6 7]
[ 8 9 10 11 8 9 10 11]]

In the context of pandas objects such as Series and DataFrame, having labeled axes
enables you to further generalize array concatenation. In particular, you have a number
of additional things to think about:

• If the objects are indexed differently on the other axes, should we combine the
distinct elements in these axes or use only the shared values (the intersection)?
• Do the concatenated chunks of data need to be identifiable in the resulting object?
• Does the “concatenation axis” contain data that needs to be preserved? In many
cases, the default integer labels in a DataFrame are best discarded during
concatenation.

The concat() function in pandas provides a consistent way to address each of these
concerns.
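Each of the three questions above maps onto a concat argument: `join` for shared versus distinct labels, `keys` for identifying the chunks, and `ignore_index` for discarding the existing labels. A minimal sketch with made-up Series (all names and values here are illustrative assumptions):

```python
import pandas as pd

s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3], index=['a', 'b'])
s3 = pd.Series([9], index=['b'])

# keys: label each chunk; the label becomes the outer index level
labelled = pd.concat([s1, s2], keys=['first', 'second'])

# ignore_index: drop the original labels and renumber from 0
renumbered = pd.concat([s1, s2], ignore_index=True)

# join='inner': keep only the labels shared on the other axis
shared = pd.concat([s1, s3], axis=1, join='inner')

print(labelled)
print(renumbered)
print(shared)
```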

import pandas as pd
#import numpy as np

s1 = pd.Series([0, 1], index=['a', 'b'])

s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = pd.Series([5, 6], index=['f', 'g'])

s = pd.concat([s1, s2, s3])

print(s)

a 0
b 1
c 2
d 3
e 4
f 5
g 6
dtype: int64

By default concat() works along axis=0, producing another Series. If you pass axis=1,
the result will instead be a DataFrame (axis=1 is the columns):

s = pd.concat([s1, s2, s3], axis=1)
print(s)
0 1 2
a 0.0 NaN NaN
b 1.0 NaN NaN
c NaN 2.0 NaN
d NaN 3.0 NaN
e NaN 4.0 NaN
f NaN NaN 5.0
g NaN NaN 6.0

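In this case there is no overlap on the other axis, so every row has NaN in two of the
three columns. When the indexes do overlap, the shared labels line up: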

s4 = pd.concat([s1, s3])
s = pd.concat([s1, s4], axis=1)
print(s)
0 1
a 0.0 0
b 1.0 1
f NaN 5
g NaN 6

s5 = pd.concat([s1, s4], axis=1, join='inner')

print(s5)

0 1
a 0 0
b 1 1

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])
df3 = pd.concat([df1, df2], axis=1)
print(df3)

one two three four

a 0 1 5.0 6.0
b 2 3 NaN NaN
c 4 5 7.0 8.0


df5 = pd.concat([df1, df2], ignore_index=True)

print(df5)

one two three four

0 0.0 1.0 NaN NaN
1 2.0 3.0 NaN NaN
2 4.0 5.0 NaN NaN
3 NaN NaN 5.0 6.0
4 NaN NaN 7.0 8.0

Combine Operation (Combining Data with Overlap)

There is another data combination situation that can’t be expressed as either a merge or
concatenation operation. You may have two datasets whose indexes overlap in full or
part. As a motivating example, consider NumPy’s where function, which performs the
array-oriented equivalent of an if-else expression:

import pandas as pd
import numpy as np

a = pd.Series([np.nan, 2.5, np.nan, 3.5, 4.5, np.nan], index=['f', 'e', 'd', 'c', 'b', 'a'])
b = pd.Series(np.arange(len(a), dtype=np.float64), index=['f', 'e', 'd', 'c', 'b', 'a'])
print(a)
print(b)

f NaN
e 2.5
d NaN
c 3.5
b 4.5
a NaN
dtype: float64
f 0.0
e 1.0
d 2.0
c 3.0
b 4.0
a 5.0
dtype: float64

c = b.combine_first(a)
print(c)

f 0.0
e 1.0
d 2.0
c 3.0
b 4.0
a 5.0
dtype: float64
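In the call b.combine_first(a) above, b has no missing values, so the result is simply b. Reversing the roles shows the patching behaviour more clearly, and the np.where expression mentioned earlier gives the same values. This is a sketch; the names `via_where` and `via_combine` are assumptions:

```python
import numpy as np
import pandas as pd

labels = ['f', 'e', 'd', 'c', 'b', 'a']
a = pd.Series([np.nan, 2.5, np.nan, 3.5, 4.5, np.nan], index=labels)
b = pd.Series(np.arange(len(a), dtype=np.float64), index=labels)

# if-else per element: take b where a is missing, otherwise keep a
via_where = pd.Series(np.where(pd.isnull(a), b, a), index=a.index)

# same idea, but aligned on index labels rather than on position
via_combine = a.combine_first(b)

print(via_where)
print(via_combine)
```

The difference is that np.where works position by position, while combine_first aligns the two objects by label first, so it also works when the indexes only partly overlap.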

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'a': [1., np.nan, 5., np.nan], 'b': [np.nan, 2., np.nan, 6.], 'c': range(2,
18, 4)})
df2 = pd.DataFrame({'a': [5., 4., np.nan, 3., 7.], 'b': [np.nan, 3., 4., 6., 8.]})

print(df1)
print(df2)

c = df1.combine_first(df2)
print(c)

a b c
0 1.0 NaN 2
1 NaN 2.0 6
2 5.0 NaN 10
3 NaN 6.0 14

a b
0 5.0 NaN
1 4.0 3.0
2 NaN 4.0
3 3.0 6.0
4 7.0 8.0

a b c
0 1.0 NaN 2.0
1 4.0 2.0 6.0
2 5.0 4.0 10.0
3 3.0 6.0 14.0
4 7.0 8.0 NaN


Reshaping with hierarchical indexing

Hierarchical indexing provides a consistent way to rearrange data in a DataFrame.

There are two primary actions:
stack: This “rotates” or pivots from the columns in the data to the rows.
unstack: This pivots from the rows into the columns.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(6).reshape((2, 3)),
                  index=pd.Index(['AP', 'UP'], name='state'),
                  columns=pd.Index(['one', 'two', 'three'], name='number'))
print(df)

number one two three

state
AP 0 1 2
UP 3 4 5

Using the stack method on this data pivots the columns into the rows, producing a
Series:
r = df.stack()
print(r)

state number
AP one 0
two 1
three 2
UP one 3
two 4
three 5
dtype: int64

From a hierarchically indexed Series, you can rearrange the data back into a
DataFrame with unstack:

b = r.unstack()
print(b)

number one two three

state
AP 0 1 2
UP 3 4 5

By default the innermost level is unstacked (same with stack). You can unstack a
different level by passing a level number or name:

s = r.unstack(0)
print(s)

state AP UP
number
one 0 3
two 1 4
three 2 5

Unstacking might introduce missing data if all of the values in the level aren’t found in
each of the subgroups:

import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])

s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
df = pd.concat([s1, s2], keys=['one', 'two'])
print(df)

one a 0
b 1
c 2
d 3
two c 4
d 5
e 6
dtype: int64

print(df.unstack())

a b c d e
one 0.0 1.0 2.0 3.0 NaN
two NaN NaN 4.0 5.0 6.0
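If a placeholder is preferable to NaN, unstack accepts a `fill_value` argument. The sketch below reuses the two Series from above; the names `data` and `filled` and the choice of 0 as the placeholder are assumptions for illustration:

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data = pd.concat([s1, s2], keys=['one', 'two'])

# Fill the gaps created by unstacking with 0 instead of NaN
filled = data.unstack(fill_value=0)
print(filled)
```

Whether 0 is a sensible placeholder depends on the data; it is only appropriate when a missing combination really means "none".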

Pivoting

Pivoting “Long” to “Wide” Format

A common way to store multiple time series in databases and CSV is in so-called long or
stacked format. Let’s load some example data and do a small amount of time series
wrangling and other data cleaning:


data = pd.read_csv('examples/macrodata.csv')
print(data.head())

periods = pd.PeriodIndex(year=data.year, quarter=data.quarter,name='date')

columns = pd.Index(['realgdp', 'infl', 'unemp'], name='item')
data = data.reindex(columns=columns)
data.index = periods.to_timestamp('D', 'end')
ldata = data.stack().reset_index().rename(columns={0: 'value'})

print(ldata[:10])

This is the so-called long format for multiple time series, or other observational data with
two or more keys (here, our keys are date and item). Each row in the table represents a
single observation.

In some cases, the data may be more difficult to work with in this format; you might
prefer to have a DataFrame containing one column per distinct item value indexed by
timestamps in the date column. DataFrame’s pivot method performs exactly this
transformation:

pivoted = ldata.pivot(index='date', columns='item', values='value')

print(pivoted)

The index and columns arguments name the columns to be used respectively as the row and
column index; the optional values argument names the column that fills the DataFrame.
Suppose you had two value columns that you wanted to reshape simultaneously:

ldata['value2'] = np.random.randn(len(ldata))
print(ldata[:10])

By omitting the last argument, you obtain a DataFrame with hierarchical columns:


pivoted = ldata.pivot(index='date', columns='item')

print(pivoted[:5])

print(pivoted['value'][:5])

Note that pivot is equivalent to creating a hierarchical index using set_index followed
by a call to unstack:

unstacked = ldata.set_index(['date', 'item']).unstack('item')

print(unstacked[:7])
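Since examples/macrodata.csv may not be at hand, the equivalence can also be checked on a tiny long-format table. The dates and values below are invented for illustration and are not taken from the macro data set; selecting the 'value' column before unstacking corresponds to passing values='value' to pivot:

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, item) observation
ldata = pd.DataFrame({
    'date': pd.to_datetime(['2000-03-31', '2000-03-31',
                            '2000-06-30', '2000-06-30']),
    'item': ['infl', 'unemp', 'infl', 'unemp'],
    'value': [2.5, 4.0, 3.1, 3.9],
})

pivoted = ldata.pivot(index='date', columns='item', values='value')
unstacked = ldata.set_index(['date', 'item'])['value'].unstack('item')

print(pivoted)
print(pivoted.equals(unstacked))
```

Both routes give one row per date and one column per item. pivot raises an error if a (date, item) pair occurs more than once, so duplicates have to be aggregated first.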

Pivoting “Wide” to “Long” Format

An inverse operation to pivot for DataFrames is pandas.melt. Rather than transforming
one column into many in a new DataFrame, it merges multiple columns into one,
producing a DataFrame that is longer than the input. Let’s look at an example:

import pandas as pd

df = pd.DataFrame({'key': ['foo', 'bar', 'baz'], 'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
print(df)

key A B C
0 foo 1 4 7
1 bar 2 5 8
2 baz 3 6 9

The 'key' column may be a group indicator, and the other columns are data values.
When using pandas.melt, we must indicate which columns (if any) are group indicators.
Let’s use 'key' as the only group indicator here:

melted = pd.melt(df, ['key'])

print(melted)

key variable value

0 foo A 1
1 bar A 2
2 baz A 3
3 foo B 4
4 bar B 5
5 baz B 6
6 foo C 7
7 bar C 8
8 baz C 9

Using pivot, we can reshape back to the original layout:

reshaped = melted.pivot(index='key', columns='variable', values='value')

print(reshaped)

variable A B C
key
bar 2 5 8
baz 3 6 9
foo 1 4 7

Since the result of pivot creates an index from the column used as the row labels, we may
want to use reset_index to move the data back into a column:

print(reshaped.reset_index())

variable key A B C
0 bar 2 5 8
1 baz 3 6 9
2 foo 1 4 7

You can also specify a subset of columns to use as value columns:

print(pd.melt(df, id_vars=['key'], value_vars=['A', 'B']))

key variable value
0 foo A 1
1 bar A 2
2 baz A 3
3 foo B 4
4 bar B 5
5 baz B 6

pandas.melt can be used without any group identifiers, too:

print(pd.melt(df, value_vars=['A', 'B', 'C']))

variable value
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 B 6
6 C 7
7 C 8
8 C 9

print(pd.melt(df, value_vars=['key', 'A', 'B']))

variable value
0 key foo
1 key bar
2 key baz
3 A 1
4 A 2
5 A 3
6 B 4
7 B 5
8 B 6
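The default column names 'variable' and 'value' can be replaced through the `var_name` and `value_name` arguments of melt. The sketch below also reverses the operation in one chain; the names 'column', 'reading', and `wide` are assumptions chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'key': ['foo', 'bar', 'baz'],
                   'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# Wide to long, with custom names for the two generated columns
melted = pd.melt(df, id_vars=['key'], var_name='column', value_name='reading')

# Long back to wide, then tidy up the index and the columns' axis name
wide = (
    melted.pivot(index='key', columns='column', values='reading')
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover 'column' axis name
)

print(melted.columns.tolist())
print(wide)
```

The round trip recovers the original values, but the rows come back sorted by key rather than in their original order.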
