Cleaning Data in Python Live Session
ADEL NEHME
Content Developer
The data science workflow
[Figure: data science workflow diagram; surviving labels: "Report or Dashboard", "Model in Production"]
Notebook
Session outline
1 Introduction
4 Q&A
5 Our to-do list
6 Data cleaning
7 Q&A
DCVirtual: Webinar week 🎉
DataCamp for Enterprise: What’s New in Q2 2020 🌅
Take home question
Submission details:
● Share with us a code snippet with your output on LinkedIn, Twitter or Facebook
● Tag us (`@DataCamp`) and use the hashtag `#datacamplive`
Recap of the functions used
Diagnosis functions
● `import pandas as pd`: Imports the pandas package with the alias `pd`
● `.head()`: Returns the first rows of a DataFrame (5 by default)
● `.info()`: Shows the number of observations, data types, and non-null counts per column
● `.describe()`: Returns summary statistics of the numeric columns in a DataFrame
● `.isna().sum()`: Returns the number of missing values per column
● `sns.distplot()`: Plots the distribution of one variable
● `msno.matrix()`: Visualizes the missingness matrix
● `msno.bar()`: Visualizes missingness as a bar chart
Treatment functions
● `.str.replace("", "")`: Replaces one string with another in each row of a string column
● `.str.split("", expand=True)`: Splits a string column into separate columns on the given delimiter
● `pd.to_datetime()`: Converts a date column to datetime
● `.str.lower()`: Lowercases each row of a string column
● `.str.strip("")`: Strips the given characters from both ends of each row of a string column
● `.replace()`: Replaces values with others in a column
● `.fillna()`: Fills missing values of a column with a value of your choice
● `.drop_duplicates()`: Drops duplicate rows
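To make the recap concrete, here is a minimal sketch that chains several of the functions above on a tiny DataFrame. The data, column names, and the choice to fill missing prices with the mean are assumptions made for illustration; they are not taken from the session notebook. The seaborn and missingno plotting calls are left out so the snippet needs only pandas.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration only
df = pd.DataFrame({
    "name": ["  Alice ", "BOB", "BOB", "carol"],
    "price": ["$10", "$20", "$20", None],
    "coordinates": ["40.7,-74.0", "51.5,-0.1", "51.5,-0.1", "48.9,2.4"],
    "listed": ["2020-01-05", "2020-02-10", "2020-02-10", "2020-03-15"],
})

# Diagnosis: look at the first rows, the dtypes, and the missing values
print(df.head())
df.info()
print(df.isna().sum())

# Treatment: trim whitespace and lowercase the names
df["name"] = df["name"].str.strip().str.lower()

# Remove the literal "$" sign, then convert the prices to floats
df["price"] = df["price"].str.replace("$", "", regex=False).astype(float)

# Split "lat,lon" strings into two numeric columns
coords = df["coordinates"].str.split(",", expand=True)
df["latitude"] = coords[0].astype(float)
df["longitude"] = coords[1].astype(float)

# Convert the date strings to datetime
df["listed"] = pd.to_datetime(df["listed"])

# Drop exact duplicate rows, then fill the remaining missing price
# with the mean price (one possible choice among many)
df = df.drop_duplicates().reset_index(drop=True)
df["price"] = df["price"].fillna(df["price"].mean())

# Summary statistics of the cleaned numeric columns
print(df.describe())
```

Dropping duplicates before filling matters here: otherwise the repeated row would count twice in the mean used for imputation.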