Exp 3

This document outlines various data transformation techniques using pandas, including merging data frames, reshaping data with hierarchical indexing, handling missing data, and data deduplication. It provides sample experiments to perform these operations, such as merging data frames, filling missing values, and replacing values in data frames. The document also emphasizes the importance and challenges of data transformation in data analysis.

Uploaded by

damisettilohitha

UNIT-3

(EXPERIMENTS-3)

Data Transformation: Merging database-style data frames, concatenating along
an axis, merging on index, Reshaping and pivoting, melting, Transformation
techniques, handling missing data, Mathematical operations with NaN, Filling
missing values, Discretization and binning, Outlier detection and filtering,
Permutation and random sampling, Benefits of data transformation, Challenges.
Sample Experiments:
1. Perform the following operations
a) Merging Dataframes
b) Reshaping with Hierarchical Indexing
c) Data Deduplication
d) Replacing Values
2. Apply different Missing Data handling techniques
a) NaN values in mathematical Operations
b) Filling in missing data
c) Forward and Backward filling of missing values
d) Filling with index values
e) Interpolation of missing values
3. Apply different data transformation techniques
a) Renaming axis indexes
b) Discretization and Binning
c) Permutation and Random Sampling
d) Dummy variables
1. Perform the following operations
a) Merging Data frames
b) Reshaping with Hierarchical Indexing
c) Data Deduplication
d) Replacing Values

a) Merging Dataframes
Creating First Dataframe to Perform Merge Operation
# import module
import pandas as pd
# creating DataFrame for Student Details
details = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    'NAME': ['Jagroop', 'Praveen', 'Harjot', 'Pooja', 'Rahul',
             'Nikita', 'Saurabh', 'Ayush', 'Dolly', 'Mohit'],
    'BRANCH': ['CSE', 'CSE', 'CSE', 'CSE', 'CSE', 'CSE', 'CSE',
               'CSE', 'CSE', 'CSE']})
# printing details
print(details)

Creating Second Dataframe to Perform Merge operation

# Import module
import pandas as pd
# Creating Dataframe for Fees_Status
fees_status = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    'PENDING': ['5000', '250', 'NIL', '9000', '15000', 'NIL', '4500', '1800',
                '250', 'NIL']})
# Printing fees_status
print(fees_status)

Merge Operation
# Import module
import pandas as pd
# Creating Dataframe for Student Details
details = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105,
           106, 107, 108, 109, 110],
    'NAME': ['Jagroop', 'Praveen', 'Harjot',
             'Pooja', 'Rahul', 'Nikita',
             'Saurabh', 'Ayush', 'Dolly', 'Mohit'],
    'BRANCH': ['CSE', 'CSE', 'CSE', 'CSE', 'CSE',
               'CSE', 'CSE', 'CSE', 'CSE', 'CSE']})
# Creating Dataframe for Fees_Status
fees_status = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105,
           106, 107, 108, 109, 110],
    'PENDING': ['5000', '250', 'NIL',
                '9000', '15000', 'NIL',
                '4500', '1800', '250', 'NIL']})
# Merging the two Dataframes on the shared ID column
print(pd.merge(details, fees_status, on='ID'))

OUTPUT: (the merged frame has one row per ID, with the columns ID, NAME, BRANCH and PENDING)
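The merge above uses the default inner join, and every ID appears in both frames, so no rows are lost. A minimal sketch of what changes when the keys do not fully overlap, and of merging on the index, follows. The frames `left` and `right` are hypothetical cut-down versions of the student data, not part of the experiment.

```python
import pandas as pd

# Hypothetical frames: ID 104 exists only in left, ID 105 only in right
left = pd.DataFrame({'ID': [101, 102, 103, 104],
                     'NAME': ['Jagroop', 'Praveen', 'Harjot', 'Pooja']})
right = pd.DataFrame({'ID': [101, 102, 103, 105],
                      'PENDING': ['5000', '250', 'NIL', '15000']})

# Inner join (the default): only IDs present in both frames survive
inner = pd.merge(left, right, on='ID', how='inner')

# Outer join: every ID is kept and missing cells become NaN.
# indicator=True adds a _merge column naming the origin of each row.
outer = pd.merge(left, right, on='ID', how='outer', indicator=True)

# Merging on the index instead of a column, keeping all rows of left
by_index = pd.merge(left.set_index('ID'), right.set_index('ID'),
                    left_index=True, right_index=True, how='left')

print(inner)
print(outer)
print(by_index)
```

The `_merge` column is a quick way to check which rows failed to match before deciding between `how='inner'`, `'left'`, `'right'` and `'outer'`.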

Two Dataframes for Concatenation:

# importing pandas module
import pandas as pd
# Define a dictionary containing employee data
data1 = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age': [27, 24, 22, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd'],
         'Mobile No': [97, 91, 58, 76]}
# Define a second dictionary containing employee data
data2 = {'Name': ['Gaurav', 'Anuj', 'Dhiraj', 'Hitesh'],
         'Age': [22, 32, 12, 52],
         'Address': ['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'],
         'Qualification': ['MCA', 'Phd', 'Bcom', 'B.hons'],
         'Salary': [1000, 2000, 3000, 4000]}
# Convert the dictionaries into DataFrames
df = pd.DataFrame(data1, index=[0, 1, 2, 3])
df1 = pd.DataFrame(data2, index=[2, 3, 6, 7])
# Concatenate row-wise; a column missing from one frame is filled with NaN
res = pd.concat([df, df1])
print(res)
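By default pd.concat stacks rows, keeps the original index labels (so labels 2 and 3 appear twice in the example above) and takes the union of the columns. The sketch below illustrates the common variations on two small hypothetical frames `a` and `b`; the names and values are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical frames that share the 'Name' column and the index label 1
a = pd.DataFrame({'Name': ['Jai', 'Princi'], 'Age': [27, 24]}, index=[0, 1])
b = pd.DataFrame({'Name': ['Gaurav', 'Anuj'], 'Salary': [1000, 2000]}, index=[1, 2])

# Default: stack rows (axis=0), union of columns, original labels kept
rows = pd.concat([a, b])

# ignore_index=True discards the old labels and renumbers the rows from 0
renumbered = pd.concat([a, b], ignore_index=True)

# join='inner' keeps only the columns present in every frame
common = pd.concat([a, b], join='inner')

# axis=1 places the frames side by side, aligning rows on the index labels
side_by_side = pd.concat([a, b], axis=1)

print(rows)
print(renumbered)
print(common)
print(side_by_side)
```

Use `ignore_index=True` when the old row labels carry no meaning; otherwise repeated labels can make later `.loc` lookups return more than one row.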
b) Reshaping with Hierarchical Indexing
Hierarchical indexing is an important feature of pandas that enables you to have
multiple (two or more) index levels on an axis. It provides a way for you to work with
higher-dimensional data in a lower-dimensional form.
import numpy as np
import pandas as pd
data = pd.Series(np.random.randn(9),
index = [['a','a','a','b','b','c','c','d','d'],
[1,2,3,1,3,1,2,2,3]])
data
a  1   -0.214941
   2    2.147522
   3    0.564280
b  1    1.059833
   3   -1.104780
c  1    0.210634
   2    1.423999
d  2   -1.256163
   3   -1.129026
dtype: float64
data.shape
(9,)
data.ndim
1
data.index
MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 3),
            ('c', 1),
            ('c', 2),
            ('d', 2),
            ('d', 3)],
           )
data['b']
1 1.059833
3 -1.104780
dtype: float64
data['b':'c']
b  1    1.059833
   3   -1.104780
c  1    0.210634
   2    1.423999
dtype: float64
data.loc[['b','d']]
b  1    1.059833
   3   -1.104780
d  2   -1.256163
   3   -1.129026
dtype: float64


data.loc[:,2] # inner level selection a2, c2, d2

a 2.147522
c 1.423999
d -1.256163
dtype: float64
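The selections above can also be written with named index levels, which makes inner-level lookups easier to read. The following is a minimal sketch using fixed values instead of random ones so the results are reproducible; the level names `outer` and `inner` and the Series `s` are assumptions introduced for illustration.

```python
import pandas as pd

# Fixed values and named levels (hypothetical data)
s = pd.Series([10, 20, 30, 40, 50],
              index=pd.MultiIndex.from_tuples(
                  [('a', 1), ('a', 2), ('b', 1), ('b', 3), ('c', 2)],
                  names=['outer', 'inner']))

# Cross-section on the inner level; selects the same rows as s.loc[:, 2]
inner_2 = s.xs(2, level='inner')

# A single element is addressed with a tuple of (outer, inner) labels
one = s.loc[('b', 3)]

# Group-based operation: sum the values under each outer label
totals = s.groupby(level='outer').sum()

print(inner_2)
print(one)
print(totals)
```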
Hierarchical indexing plays an important role in reshaping data and in group-based operations
such as forming a pivot table.
data.unstack()
          1         2         3
a -0.214941  2.147522  0.564280
b  1.059833       NaN -1.104780
c  0.210634  1.423999       NaN
d       NaN -1.256163 -1.129026
data.unstack().stack()
a 1 -0.214941
2 2.147522
3 0.564280
b 1 1.059833
3 -1.104780
c 1 0.210634
2 1.423999
d 2 -1.256163
3 -1.129026
dtype: float64
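unstack() moves the inner index level into the columns, which is the same long-to-wide reshaping that pivot performs on ordinary columns; melt goes the other way. The sketch below shows the three side by side on a small hypothetical table; `long_df` and its column names are assumptions, not data from the experiment.

```python
import pandas as pd

# Hypothetical long-format records: one row per (student, subject)
long_df = pd.DataFrame({
    'student': ['Amit', 'Amit', 'Rahul', 'Rahul'],
    'subject': ['math', 'physics', 'math', 'physics'],
    'score': [80, 75, 62, 90],
})

# pivot: long -> wide, one row per student and one column per subject
wide = long_df.pivot(index='student', columns='subject', values='score')

# The same reshape via hierarchical indexing: index on both keys, then unstack
via_unstack = long_df.set_index(['student', 'subject'])['score'].unstack()

# melt: wide -> long again, one row per (student, subject)
back = wide.reset_index().melt(id_vars='student',
                               var_name='subject', value_name='score')

print(wide)
print(via_unstack)
print(back)
```

Note that pivot raises an error if an (index, columns) pair occurs more than once; pivot_table with an aggregation function is the usual alternative in that case.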

c) Data Deduplication
Removing duplicate data from the dataset
# import module
import pandas as pd
# initializing Data
student_data = {'Name': ['Amit', 'Praveen', 'Jagroop',
                         'Rahul', 'Vishal', 'Suraj',
                         'Rishab', 'Satyapal', 'Amit',
                         'Rahul', 'Praveen', 'Amit'],
                'Roll_no': [23, 54, 29, 36, 59, 38,
                            12, 45, 34, 36, 54, 23],
                'Email': ['[email protected]', '[email protected]',
                          '[email protected]', '[email protected]',
                          '[email protected]', '[email protected]',
                          '[email protected]', '[email protected]',
                          '[email protected]', '[email protected]',
                          '[email protected]',
                          '[email protected]']}
# creating dataframe
df = pd.DataFrame(student_data)
# df.duplicated('Roll_no') marks every repeated Roll_no after its first occurrence.
# The ~ (NOT) operator inverts that mask to select the non-duplicate rows.
non_duplicate = df[~df.duplicated('Roll_no')]
# printing non-duplicate values
print(non_duplicate)

OUTPUT: (rows 0 to 8 are kept; rows 9, 10 and 11 repeat the Roll_no values 36, 54 and 23 and are dropped)
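The boolean-mask approach above is equivalent to drop_duplicates with its default keep='first'. A short sketch of the other keep options follows, assuming a cut-down hypothetical version of the student table with only the Name and Roll_no columns.

```python
import pandas as pd

# Hypothetical subset of the student data
df = pd.DataFrame({'Name': ['Amit', 'Praveen', 'Rahul', 'Rahul', 'Amit'],
                   'Roll_no': [23, 54, 36, 36, 23]})

# True for every repeat of a Roll_no after its first occurrence
mask = df.duplicated('Roll_no')

# keep='first' (default): retain the first row of each Roll_no
first = df.drop_duplicates(subset='Roll_no')

# keep='last': retain the last row of each Roll_no
last = df.drop_duplicates(subset='Roll_no', keep='last')

# keep=False: drop every row whose Roll_no occurs more than once
unique_only = df.drop_duplicates(subset='Roll_no', keep=False)

print(first)
print(last)
print(unique_only)
```

Omitting `subset` compares entire rows, so two rows count as duplicates only when every column matches.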

d) Replacing Values
import pandas as pd
df = {"Array_1": [49.50, 70], "Array_2": [65.1, 49.50]}
data = pd.DataFrame(df)
# replace every occurrence of 49.50 with 60
print(data.replace(49.50, 60))
You can replace specific values in a DataFrame using the replace() method. Here is a basic
example:
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6]
})
# Replace all occurrences of 1 with 100
df.replace(1, 100, inplace=True)
print(df)
Replace Values in Pandas Dataframe
# importing pandas as pd
import pandas as pd
# Making data frame from the csv file
df = pd.read_csv("nba.csv")
# Printing the first 10 rows of the data frame for visualization
df[:10]

Replacing a Single Value

We are going to replace the team “Boston Celtics” with “Omega Warrior” in the ‘df’
Dataframe. replace() returns a new Dataframe and leaves ‘df’ itself unchanged.
# this will replace "Boston Celtics" with "Omega Warrior"
df.replace(to_replace="Boston Celtics", value="Omega Warrior")
Replacing Two Values with a Single Value
More than one value can be replaced at a time by passing a Python list as the argument. We are
going to replace the values “Boston Celtics” and “Texas” with “Omega Warrior” in the ‘df’
Dataframe.
# importing pandas as pd
import pandas as pd
# Making data frame from the csv file
df = pd.read_csv("nba.csv")
# this will replace "Boston Celtics" and "Texas" with "Omega Warrior"
df.replace(to_replace=["Boston Celtics", "Texas"],
value="Omega Warrior")
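Beyond a single value or a list, replace() also accepts a dictionary of old-to-new mappings, a nested dictionary that limits the change to one column, and regular expressions. The sketch below uses a small hypothetical frame in place of nba.csv so that it runs without the file; the column names and values are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the NBA data
df = pd.DataFrame({'Team': ['Boston Celtics', 'Utah Jazz', 'Boston Celtics'],
                   'Position': ['PG', 'SF', 'C']})

# replace returns a new DataFrame; assign the result to keep the change
renamed = df.replace(to_replace='Boston Celtics', value='Omega Warrior')

# A dict maps each old value to its own new value
mapped = df.replace({'PG': 'Point Guard', 'SF': 'Small Forward'})

# A nested dict restricts the replacement to the named column
team_only = df.replace({'Team': {'Utah Jazz': 'Jazz'}})

# regex=True treats to_replace as a pattern; here the leading "Boston " is removed
short = df.replace(r'^Boston\s+', '', regex=True)

print(renamed)
print(mapped)
print(team_only)
print(short)
```

Assigning the result to a new name, rather than using inplace=True, keeps the original frame available for comparison.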
