Data Wrangling- Jupyter Notebook

The document is a Jupyter Notebook that demonstrates data exploration and manipulation using the pandas library in Python. It includes creating DataFrames, handling missing values, filtering data, merging DataFrames, and removing duplicates. Various operations are performed on student and car sales data, showcasing techniques like grouping and mapping for data wrangling.

Uploaded by

amitdhoundiyal2810


2/6/23, 5:11 PM Untitled10 - Jupyter Notebook

In [7]:

# Data exploration: assign the data, then display it in tabular form.

# Import pandas package

import pandas as pd

# Assign data
# Note: the missing marks are recorded as the string 'NaN', not as a real
# missing value; a later cell replaces them with the average mark.
data = {'Name': ['Jai', 'Princi', 'Gaurav',
                 'Anuj', 'Ravi', 'Natasha', 'Riya'],
        'Age': [17, 17, 18, 17, 18, 17, 17],
        'Gender': ['M', 'F', 'M', 'M', 'M', 'F', 'F'],
        'Marks': [90, 76, 'NaN', 74, 65, 'NaN', 71]}

# Convert into DataFrame

df = pd.DataFrame(data)

# Display data
print(df)

      Name  Age Gender Marks
0      Jai   17      M    90
1   Princi   17      F    76
2   Gaurav   18      M   NaN
3     Anuj   17      M    74
4     Ravi   18      M    65
5  Natasha   17      F   NaN
6     Riya   17      F    71
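The cell above stores missing marks as the string 'NaN', which leaves the Marks column with object dtype. As a minimal sketch of an alternative (not what the notebook does), the same records can be built with np.nan so that pandas treats the gaps as real missing values. The names students, marks_dtype and missing_count are illustrative only.

```python
import numpy as np
import pandas as pd

# Same student records as above, but with np.nan instead of the string 'NaN'
data = {
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Ravi', 'Natasha', 'Riya'],
    'Age': [17, 17, 18, 17, 18, 17, 17],
    'Gender': ['M', 'F', 'M', 'M', 'M', 'F', 'F'],
    'Marks': [90, 76, np.nan, 74, 65, np.nan, 71],
}
students = pd.DataFrame(data)

# With real missing values the column is numeric (float64), not object
marks_dtype = students['Marks'].dtype

# isna() flags the missing entries; sum() counts them
missing_count = int(students['Marks'].isna().sum())

print(marks_dtype)
print(missing_count)
```

With a numeric column, methods such as mean() and fillna() work directly, without string checks.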

localhost:8892/notebooks/Untitled10.ipynb?kernel_name=python3 1/5

In [23]:

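# Compute the average of the numeric marks
c = avg = 0
for ele in df["Marks"]:
    # Count only numeric entries; the 'NaN' placeholder strings are skipped
    if str(ele).isnumeric():
        c += 1
        avg += ele
avg /= c

# Replace missing values with the average

df = df.replace(to_replace="NaN",
                value=avg)

# Display data
print(df)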

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3628             try:
-> 3629                 return self._engine.get_loc(casted_key)
   3630             except KeyError as err:

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Marks'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_103800\4025425810.py in <module>
      1 # Compute average
      2 #c = avg = 0
----> 3 for ele in df["Marks"]:
      4     if str(ele).isnumeric():
      5         c += 1

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3503             if self.columns.nlevels > 1:
   3504                 return self._getitem_multilevel(key)
-> 3505             indexer = self.columns.get_loc(key)
   3506             if is_integer(indexer):
   3507                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3629                 return self._engine.get_loc(casted_key)
   3630             except KeyError as err:
-> 3631                 raise KeyError(key) from err
   3632             except TypeError:
   3633                 # If we have a listlike key, _check_indexing_error will raise

KeyError: 'Marks'

Note: the execution count In [23] is higher than that of every later cell, so this traceback most likely comes from re-running the cell after df had been reassigned to the student_data frame (In [20]), which has no 'Marks' column. The outputs of In [13] and In [14] below show the missing marks already replaced by the average, 75.2.
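The averaging loop in this cell can also be written without explicit iteration. The following is a minimal sketch, assuming the same marks as in the first cell; the variable names marks, numeric_marks, average and filled are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Marks exactly as entered in the first cell, including the 'NaN' strings
marks = pd.Series([90, 76, 'NaN', 74, 65, 'NaN', 71], name='Marks')

# Coerce anything non-numeric (here the string 'NaN') to a real missing value
numeric_marks = pd.to_numeric(marks, errors='coerce')

# mean() skips missing values by default
average = numeric_marks.mean()

# Fill the gaps with that average
filled = numeric_marks.fillna(average)

print(average)
print(filled.tolist())
```

The average comes from the five numeric marks (376 / 5), which agrees with the 75.2 shown in the later outputs.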

In [13]:

# Encode gender as numbers (M -> 0, F -> 1)
df['Gender'] = df['Gender'].map({'M': 0,
                                 'F': 1, }).astype(float)

# Display data
print(df)

      Name  Age  Gender  Marks
0      Jai   17     NaN   90.0
1   Princi   17     NaN   76.0
2   Gaurav   18     NaN   75.2
3     Anuj   17     NaN   74.0
4     Ravi   18     NaN   65.0
5  Natasha   17     NaN   75.2
6     Riya   17     NaN   71.0
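The Gender column in the output above is entirely NaN. One plausible cause, given that notebook cells can be re-run, is that map was applied to a column that had already been encoded: values that are not keys of the mapping become NaN. The sketch below reproduces that behaviour and shows one possible guard; the helper encode_gender is hypothetical and not part of the notebook.

```python
import pandas as pd

gender = pd.Series(['M', 'F', 'M', 'M', 'M', 'F', 'F'])
codes = {'M': 0, 'F': 1}

# First application: letters become numbers
once = gender.map(codes).astype(float)

# Second application: 0.0 and 1.0 are not keys of `codes`, so every value is NaN
twice = once.map(codes)


def encode_gender(s):
    """Hypothetical helper: encode only if at least one value matches the mapping."""
    mapped = s.map(codes)
    # If nothing matched, assume the column was already encoded and return it unchanged
    return s if mapped.isna().all() else mapped


# Safe to call repeatedly
encoded = encode_gender(encode_gender(gender))
print(encoded.tolist())
```

The guard assumes a column is either fully encoded or not encoded at all; a partially encoded column would need a different check.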


In [14]:

# Filter top-scoring students (marks of 75 or more)

df = df[df['Marks'] >= 75]

# Remove the Age column

df = df.drop(['Age'], axis=1)

# Display data
print(df)

      Name  Gender  Marks
0      Jai     NaN   90.0
1   Princi     NaN   76.0
2   Gaurav     NaN   75.2
5  Natasha     NaN   75.2
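The row filter and the column drop above can be combined into a single selection. This is a minimal sketch, assuming the marks after imputation shown earlier; df_students, mask and top are illustrative names, and the Gender column is left out for brevity.

```python
import pandas as pd

df_students = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Ravi', 'Natasha', 'Riya'],
    'Age': [17, 17, 18, 17, 18, 17, 17],
    'Marks': [90.0, 76.0, 75.2, 74.0, 65.0, 75.2, 71.0],
})

# Boolean mask: True for rows whose mark is at least 75
mask = df_students['Marks'] >= 75

# .loc selects rows by mask and columns by label in one step,
# so 'Age' is simply not selected instead of being dropped afterwards
top = df_students.loc[mask, ['Name', 'Marks']]

print(top)
```

The original row labels (0, 1, 2, 5) are preserved, as in the notebook output; reset_index(drop=True) would renumber them if needed.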

In [15]:

# Wrangling Data Using Merge Operation

# The merge operation combines raw data from two DataFrames into the desired format.
# Syntax for merging: pd.merge(data_frame1, data_frame2, on="field")

# import module
import pandas as pd

# creating DataFrame for Student Details

details = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105, 106,
           107, 108, 109, 110],
    'NAME': ['Jagroop', 'Praveen', 'Harjot',
             'Pooja', 'Rahul', 'Nikita',
             'Saurabh', 'Ayush', 'Dolly', "Mohit"],
    'BRANCH': ['CSE', 'CSE', 'CSE', 'CSE', 'CSE',
               'CSE', 'CSE', 'CSE', 'CSE', 'CSE']})

# printing details
print(details)

    ID     NAME BRANCH
0  101  Jagroop    CSE
1  102  Praveen    CSE
2  103   Harjot    CSE
3  104    Pooja    CSE
4  105    Rahul    CSE
5  106   Nikita    CSE
6  107  Saurabh    CSE
7  108    Ayush    CSE
8  109    Dolly    CSE
9  110    Mohit    CSE

In [16]:

# Import module
import pandas as pd

# Creating Dataframe for Fees_Status

fees_status = pd.DataFrame(
    {'ID': [101, 102, 103, 104, 105,
            106, 107, 108, 109, 110],
     'PENDING': ['5000', '250', 'NIL',
                 '9000', '15000', 'NIL',
                 '4500', '1800', '250', 'NIL']})

# Printing fees_status
print(fees_status)

    ID PENDING
0  101    5000
1  102     250
2  103     NIL
3  104    9000
4  105   15000
5  106     NIL
6  107    4500
7  108    1800
8  109     250
9  110     NIL
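The PENDING column above holds strings, including the placeholder 'NIL', so it cannot be summed or compared numerically as it stands. A minimal sketch of one way to convert it follows, under the assumption that 'NIL' means no amount is pending; the column name PENDING_AMOUNT and the variable total_pending are illustrative.

```python
import pandas as pd

fees_status = pd.DataFrame(
    {'ID': [101, 102, 103, 104, 105,
            106, 107, 108, 109, 110],
     'PENDING': ['5000', '250', 'NIL',
                 '9000', '15000', 'NIL',
                 '4500', '1800', '250', 'NIL']})

# Non-numeric strings such as 'NIL' become NaN, then are filled with 0
# (assumption: 'NIL' means nothing is pending)
fees_status['PENDING_AMOUNT'] = (
    pd.to_numeric(fees_status['PENDING'], errors='coerce')
    .fillna(0)
    .astype(int)
)

total_pending = int(fees_status['PENDING_AMOUNT'].sum())
print(fees_status)
print(total_pending)
```

Keeping the original PENDING column alongside the numeric one makes it easy to check which rows were placeholders.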


In [17]:

# WRANGLING DATA USING MERGE OPERATION:

# Creating Dataframe
details = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105,
           106, 107, 108, 109, 110],
    'NAME': ['Jagroop', 'Praveen', 'Harjot',
             'Pooja', 'Rahul', 'Nikita',
             'Saurabh', 'Ayush', 'Dolly', "Mohit"],
    'BRANCH': ['CSE', 'CSE', 'CSE', 'CSE', 'CSE',
               'CSE', 'CSE', 'CSE', 'CSE', 'CSE']})

# Creating Dataframe
fees_status = pd.DataFrame(
    {'ID': [101, 102, 103, 104, 105,
            106, 107, 108, 109, 110],
     'PENDING': ['5000', '250', 'NIL',
                 '9000', '15000', 'NIL',
                 '4500', '1800', '250', 'NIL']})

# Merging Dataframe
print(pd.merge(details, fees_status, on='ID'))

    ID     NAME BRANCH PENDING
0  101  Jagroop    CSE    5000
1  102  Praveen    CSE     250
2  103   Harjot    CSE     NIL
3  104    Pooja    CSE    9000
4  105    Rahul    CSE   15000
5  106   Nikita    CSE     NIL
6  107  Saurabh    CSE    4500
7  108    Ayush    CSE    1800
8  109    Dolly    CSE     250
9  110    Mohit    CSE     NIL
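In the merge above every ID appears in both frames, so the default inner join keeps all ten rows. The sketch below uses deliberately shortened, hypothetical subsets of the two frames (IDs 101 to 103 on one side, 102 to 104 on the other) to show what happens when keys do not match, using the how and indicator parameters of pd.merge.

```python
import pandas as pd

# Hypothetical subsets chosen so that the keys only partly overlap
details_part = pd.DataFrame({'ID': [101, 102, 103],
                             'NAME': ['Jagroop', 'Praveen', 'Harjot']})
fees_part = pd.DataFrame({'ID': [102, 103, 104],
                          'PENDING': ['250', 'NIL', '9000']})

# Default how='inner': only IDs present in both frames survive
inner = pd.merge(details_part, fees_part, on='ID')

# how='outer' keeps every ID; indicator=True adds a '_merge' column
# recording whether each row came from the left, right, or both frames
outer = pd.merge(details_part, fees_part, on='ID', how='outer', indicator=True)

print(inner)
print(outer)
```

The _merge column is a quick way to spot records that are missing from one of the sources before deciding how to handle them.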

In [18]:

# Wrangling data using the grouping method

# Creating Data
car_selling_data = {'Brand': ['Maruti', 'Maruti', 'Maruti',
                              'Maruti', 'Hyundai', 'Hyundai',
                              'Toyota', 'Mahindra', 'Mahindra',
                              'Ford', 'Toyota', 'Ford'],
                    'Year': [2010, 2011, 2009, 2013,
                             2010, 2011, 2011, 2010,
                             2013, 2010, 2010, 2011],
                    'Sold': [6, 7, 9, 8, 3, 5,
                             2, 8, 7, 2, 4, 2]}

# Creating Dataframe of car_selling_data

df = pd.DataFrame(car_selling_data)
print(df)

       Brand  Year  Sold
0     Maruti  2010     6
1     Maruti  2011     7
2     Maruti  2009     9
3     Maruti  2013     8
4    Hyundai  2010     3
5    Hyundai  2011     5
6     Toyota  2011     2
7   Mahindra  2010     8
8   Mahindra  2013     7
9       Ford  2010     2
10    Toyota  2010     4
11      Ford  2011     2

In [19]:

# Group the data by year, then select the group for 2010

grouped = df.groupby('Year')
print(grouped.get_group(2010))

       Brand  Year  Sold
0     Maruti  2010     6
4    Hyundai  2010     3
7   Mahindra  2010     8
9       Ford  2010     2
10    Toyota  2010     4
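Selecting a single group with get_group is equivalent to a boolean filter; grouping becomes more useful once an aggregation is applied to every group. A minimal sketch on the same car sales data follows; the names cars, sold_by_brand and sales_2010 are illustrative.

```python
import pandas as pd

cars = pd.DataFrame({
    'Brand': ['Maruti', 'Maruti', 'Maruti', 'Maruti', 'Hyundai', 'Hyundai',
              'Toyota', 'Mahindra', 'Mahindra', 'Ford', 'Toyota', 'Ford'],
    'Year': [2010, 2011, 2009, 2013, 2010, 2011,
             2011, 2010, 2013, 2010, 2010, 2011],
    'Sold': [6, 7, 9, 8, 3, 5, 2, 8, 7, 2, 4, 2],
})

# Total units sold per brand, across all years
sold_by_brand = cars.groupby('Brand')['Sold'].sum()

# A plain boolean filter selects the same rows as get_group(2010)
sales_2010 = cars[cars['Year'] == 2010]

print(sold_by_brand)
print(sales_2010)
```

Other aggregations such as mean(), max() or agg(['sum', 'count']) can be used in place of sum() in the same way.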


In [20]:

# Wrangling data by removing duplicates

# DataFrame.duplicated(subset=None, keep='first')
# Initializing Data
student_data = {'Name': ['Amit', 'Praveen', 'Jagroop',
                         'Rahul', 'Vishal', 'Suraj',
                         'Rishab', 'Satyapal', 'Amit',
                         'Rahul', 'Praveen', 'Amit'],
                'Roll_no': [23, 54, 29, 36, 59, 38,
                            12, 45, 34, 36, 54, 23],
                'Email': ['[email protected]', '[email protected]',
                          '[email protected]', '[email protected]',
                          '[email protected]', '[email protected]',
                          '[email protected]', '[email protected]',
                          '[email protected]', '[email protected]',
                          '[email protected]', '[email protected]']}

# Creating Dataframe of Data

df = pd.DataFrame(student_data)

# Printing Dataframe
print(df)

        Name  Roll_no              Email
0       Amit       23  [email protected]
1    Praveen       54  [email protected]
2    Jagroop       29  [email protected]
3      Rahul       36  [email protected]
4     Vishal       59  [email protected]
5      Suraj       38  [email protected]
6     Rishab       12  [email protected]
7   Satyapal       45  [email protected]
8       Amit       34  [email protected]
9      Rahul       36  [email protected]
10   Praveen       54  [email protected]
11      Amit       23  [email protected]

In [21]:

# df.duplicated('Roll_no') marks every row whose Roll_no has already appeared earlier.

# The ~ (NOT) operator inverts that mask, so only the non-duplicate rows are kept.
non_duplicate = df[~df.duplicated('Roll_no')]

# printing non-duplicate values

print(non_duplicate)

Name Roll_no Email
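The mask-and-invert pattern above has a direct shorthand, drop_duplicates, which also accepts the keep parameter mentioned in the duplicated signature. The following minimal sketch uses the same names and roll numbers; the Email column is omitted because its values are not legible in the source, and the variable names are illustrative.

```python
import pandas as pd

students = pd.DataFrame({
    'Name': ['Amit', 'Praveen', 'Jagroop', 'Rahul', 'Vishal', 'Suraj',
             'Rishab', 'Satyapal', 'Amit', 'Rahul', 'Praveen', 'Amit'],
    'Roll_no': [23, 54, 29, 36, 59, 38, 12, 45, 34, 36, 54, 23],
})

# True for every row whose Roll_no was already seen in an earlier row
dup_mask = students.duplicated('Roll_no')

# Equivalent to students[~dup_mask]: keep the first occurrence of each Roll_no
unique_first = students.drop_duplicates(subset='Roll_no')

# keep='last' retains the final occurrence of each Roll_no instead
unique_last = students.drop_duplicates(subset='Roll_no', keep='last')

print(unique_first)
print(unique_last)
```

Note that the second 'Amit' (Roll_no 34) is kept in both cases, because duplicates are judged on Roll_no only, not on Name.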


