Core of ML - Part 1 Handling Data

Uploaded by

yogini.prabhu

Few data science projects are exempt from the necessity of cleaning data.

Data cleaning encompasses the initial steps of preparing data. Its specific purpose is to retain only the relevant and
useful information in the data, whether for later analysis, as input to an AI or machine learning model, and so on.
Unifying or converting data types, dealing with missing values, eliminating noisy values stemming from erroneous
measurements, and removing duplicates are some examples of typical processes within the data cleaning stage.

As you might think, the more complex the data, the more intricate, tedious, and time-consuming the data cleaning can
become, especially when implementing it manually.

This article delves into the functionalities offered by the pandas library to automate the process of cleaning data.

Handling Data

Automating data cleaning with pandas boils down to applying several data cleaning functions in sequence and encapsulating
that sequence of actions in a single data cleaning pipeline. Before doing this, let's introduce some commonly used pandas
functions for various data cleaning steps. In what follows, we assume an example Python variable df that contains a
dataset stored in a pandas DataFrame object.

● Filling missing values: pandas provides methods for automatically dealing with missing values in a dataset, be it by replacing
missing values with a “default” value using the df.fillna() method, or by removing any rows or columns containing
missing values through the df.dropna() method.

● Removing duplicated instances: automatically removing duplicate entries (rows) in a dataset could not be easier thanks to
the df.drop_duplicates() method, which removes extra instances when either specific attribute values (selected with its
subset parameter) or the values of the entire instance duplicate those of another entry.
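The two bullets above can be sketched on a small example. The DataFrame, its column names, and the chosen default values below are assumptions made purely for illustration; they do not come from the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "city": ["Pune", "Pune", "Goa", None, "Goa"],
        "temp": [31.0, 31.0, np.nan, 28.5, 30.0],
    }
)

# Replace missing values with a per-column default (returns a new DataFrame).
filled = df.fillna({"city": "unknown", "temp": df["temp"].mean()})

# Alternatively, drop every row that contains at least one missing value.
complete_rows = df.dropna()

# Drop rows that are exact duplicates of an earlier row.
unique_rows = df.drop_duplicates()

# Drop rows whose "city" value has already appeared, keeping the first one.
unique_cities = df.drop_duplicates(subset=["city"], keep="first")
```

None of these calls modifies df itself; each returns a new DataFrame, which is what makes them easy to chain later. Whether filling or dropping is the better choice depends on the dataset and is not something this sketch decides.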
Representing Data

● Manipulating strings: some pandas functions are useful for making the format of string attributes
uniform. For instance, if there is a mix of lowercase, sentence case, and uppercase values for a
'column' attribute and we want them all to be lowercase, the
df['column'].str.lower() method does the job.

To remove accidentally introduced leading and trailing whitespace, try the
df['column'].str.strip() method. Both methods return a new Series, so assign the result back to
df['column'] to keep it.

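● Manipulating date and time: the pd.to_datetime(df['column']) function converts string columns
containing date-time information, e.g. in the dd/mm/yyyy format, into pandas datetime values,
thereby easing their further manipulation. For day-first formats like this one, pass
format='%d/%m/%Y' or dayfirst=True so that the day and month are not swapped.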

● Column renaming: automating the process of renaming columns can be particularly useful when
there are multiple datasets segregated by city, region, project, etc., and we want to add prefixes or
suffixes to all or some of their columns to make them easier to identify. The
df.rename(columns={old_name: new_name}) method makes this possible.
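As a rough sketch of how the steps in this list might be combined into a single cleaning pipeline, the example below wraps each step in a small function and chains them with DataFrame.pipe. The helper names (normalize_text, parse_dates, prefix_columns), the sample data, and the "pune_" prefix are hypothetical and chosen only for illustration.

```python
import pandas as pd


# Hypothetical helpers; names and column labels are illustrative only.
def normalize_text(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Strip surrounding whitespace and lowercase a string column."""
    out = df.copy()
    out[column] = out[column].str.strip().str.lower()
    return out


def parse_dates(df: pd.DataFrame, column: str, fmt: str = "%d/%m/%Y") -> pd.DataFrame:
    """Convert a string column in the given format to pandas datetimes."""
    out = df.copy()
    out[column] = pd.to_datetime(out[column], format=fmt)
    return out


def prefix_columns(df: pd.DataFrame, prefix: str) -> pd.DataFrame:
    """Rename every column by prepending a prefix."""
    return df.rename(columns={name: f"{prefix}{name}" for name in df.columns})


# Assumed sample data with untidy strings and dd/mm/yyyy dates.
raw = pd.DataFrame(
    {
        "name": ["  Alice", "BOB ", "Carol"],
        "joined": ["01/02/2023", "15/03/2023", "31/12/2022"],
    }
)

# Apply the steps in sequence; each one receives the previous result.
clean = (
    raw.pipe(normalize_text, "name")
    .pipe(parse_dates, "joined")
    .pipe(prefix_columns, "pune_")
)
```

Because every helper works on a copy, raw is left untouched and the same pipeline can be reused on other datasets with the same columns. For the simple case of prefixing every column, pandas also offers df.add_prefix(), which could replace the hypothetical prefix_columns helper.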
