Data Cleaning

Objectives
Develop data cleaning strategies:

● Handling missing values

● Tidying string data

● Cleaning datasets through case studies
What is Data Cleaning?
Data Analysis Workflow
What does the data analysis workflow look like?

START WITH A QUESTION
→ COLLECT & CLEAN DATA
→ EXPLORATORY DATA ANALYSIS (EDA)
→ MODELS & ALGORITHMS
→ COMMUNICATE RESULTS
How can data be messy?

1. Duplicate or unnecessary data
2. Inconsistent text and typos
3. Missing data
4. Outliers
… and more!
How can data be messy?
Most Visited US Websites (as of 2020)

rank  website           monthly_traffic
1     youtube.com       1,626,000,000
2     en.wikipedia.org  1,032,000,000
2     en.wikipedia.org  1,032,000,000
3     twitter.com       536,000,000
4     Facebook          512,000,000
5     amazon.com        492 million
6     yelp.com          ---
7     reddit.com        184,000,000
36    netflix.com       37,000,000

Every kind of messiness appears here: a duplicate row (rank 2), inconsistent text ("Facebook" instead of facebook.com), an inconsistent number format ("492 million"), a missing value (yelp.com), and an unnecessary row (rank 36 in a top-7 list).
How can data be messy?
1. Duplicate or unnecessary data

● Look for duplicates and dig into why there are multiple values
● Filter data down as appropriate (see the sketch below)
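A minimal sketch of that inspect-then-filter step, assuming the websites table above has been loaded into a pandas dataframe; the literal values here are a hypothetical reconstruction of a few rows:

import pandas as pd

# Hypothetical reconstruction of part of the websites table above
df = pd.DataFrame({
    "rank": [1, 2, 2, 3],
    "website": ["youtube.com", "en.wikipedia.org", "en.wikipedia.org", "twitter.com"],
    "monthly_traffic": [1_626_000_000, 1_032_000_000, 1_032_000_000, 536_000_000],
})

# Show every row involved in a duplication so you can dig into why it exists
print(df[df.duplicated(keep=False)])

# Once understood, filter the data down as appropriate
df = df.drop_duplicates()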
How can data be messy?
Most Visited US Websites (as of 2020), after removing the duplicate and unnecessary rows:

rank  website           monthly_traffic
1     youtube.com       1,626,000,000
2     en.wikipedia.org  1,032,000,000
3     twitter.com       536,000,000
4     Facebook          512,000,000
5     amazon.com        492 million
6     yelp.com          ---
7     reddit.com        184,000,000
How can data be messy?
2. Inconsistent text and typos

Check summary statistics for each column of data (a sketch follows):
● Minimum and maximum of numerical values
● Unique values of categoricals
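A minimal sketch of that check, assuming the websites dataframe from above with monthly_traffic already converted to a numeric type:

print(df.monthly_traffic.min(), df.monthly_traffic.max())  # extremes reveal suspect numbers
print(df.website.unique())                                 # unique values reveal typos like "Facebook"
print(df.describe(include="all"))                          # summary statistics for every column at once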
How can data be messy?
Most Visited US Websites (as of 2020), after fixing the inconsistent text and number formats:

rank  website           monthly_traffic
1     youtube.com       1,626,000,000
2     en.wikipedia.org  1,032,000,000
3     twitter.com       536,000,000
4     facebook.com      512,000,000
5     amazon.com        492,000,000
6     yelp.com          ---
7     reddit.com        184,000,000
How can data be messy?
4. Outliers

● Are distant from other observations
● May not accurately represent the real world
● Can significantly impact analysis
How can data be messy?
How to find outliers:
● Plots
● Statistics

How to deal with outliers (a sketch follows):
● Remove them
● Assign the mean or median value
● Predict the value with a model
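A sketch of finding and handling outliers with the IQR rule, one common statistics-based test. The coffee dataframe and its price_lb column from the next section are borrowed here for illustration, and the 1.5 multiplier is a convention, not a requirement:

df.price_lb.plot.box()  # plots: eyeball observations far from the rest

q1, q3 = df.price_lb.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df.price_lb < q1 - 1.5 * iqr) | (df.price_lb > q3 + 1.5 * iqr)

df_trimmed = df[~is_outlier]                           # option 1: remove them
df.loc[is_outlier, "price_lb"] = df.price_lb.median()  # option 2: assign the median value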
Handling Missing Values

Missing Values
● Unfortunately very common
● Occur for many reasons
● Detect with pandas
● Several ways to handle missings


Detecting Missing Values
Load in data about coffee; the dataset contains several missings

import pandas as pd
df = pd.read_csv("coffee.csv")
df
Detecting Missing Values
Quickly check missings with .info()

df.info()
Detecting Missing Values
Use .isna() for elementwise True/False values

df.isna()
How to Detect Missing Values
Use the .isna() result as a data mask

~df.shipping.isna()
df[~df.shipping.isna()]
Methods to Handle Missings

1. Drop rows with missing values
2. Fill missing values with a standard value such as zero
3. Impute missings with mean or median
Dropping Missing Values
Use pandas to drop with .dropna()
● Drops all rows with any missing by default
● Use subset to drop only some missings

df.dropna()
df.dropna(subset=["price_lb"])
Filling Missings with a Value
Use pandas .fillna() to fill missings
● Only fill with reasonable values!

df.shipping.fillna(0)
Imputing Missings
Use pandas .fillna() to fill missings with mean or median values

price_avg = df.price_lb.mean()
price_avg
14.105

df.price_lb.fillna(price_avg)
Detecting Missing Values

.info()
● Count of non-null values for each column

.isna()
● Boolean True/False for each element
● Can be used as a data mask
Methods to Handle Missings

.dropna()
1. Drop rows with missing values

.fillna()
2. Fill missing values with a standard value such as zero
3. Impute missings with mean or median

4. Use a model to predict missings (advanced; see the sketch below)
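A minimal sketch of the advanced option, assuming scikit-learn is installed and restricting to the numeric coffee columns; KNNImputer is one of several model-based imputers:

from sklearn.impute import KNNImputer

num_cols = ["price_lb", "shipping"]
imputer = KNNImputer(n_neighbors=3)  # predict each missing value from the most similar rows
df[num_cols] = imputer.fit_transform(df[num_cols])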
Managing Columns of Data
Rename Data Columns
Load in data about coffee

import pandas as pd
df = pd.read_csv("coffee_data.csv")
df
Rename Data Columns
None of these column names are valid Python variables
● Rename to make analysis easier

df.columns
Index(['Price per Pound', 'Shipping Price', 'Favorite?'], dtype='object')
Rename Data Columns
Pass a dictionary to the columns argument of pandas .rename()

df.rename(columns={
    'Price per Pound': 'price_lb',
    'Shipping Price': 'shipping',
    'Favorite?': 'favorite'
}, inplace=True)
df
What is the average shipping?
Why does using .mean() on the shipping column cause an error?

df.shipping.mean()
TypeError: Could not convert 3.000.001.995.490.004.002.50 to numeric

df.head(3)
What is the average shipping?
Checking .dtypes shows the shipping column contains strings

df.dtypes
Updating a Column Datatype
Convert a column’s datatype with the .astype() method

df['shipping'] = df.shipping.astype('float')
df.dtypes

df.shipping.mean()
2.4257142857142857
Dropping Columns
The favorite column contains the same value for every row

df.favorite.value_counts()

Drop unnecessary columns with pandas .drop()
● axis=0 refers to the row dimension
● axis=1 refers to the column dimension

df.drop('favorite', axis=1)
Managing Columns of Data

● Rename columns by passing an update dictionary into .rename()
● Convert a column’s datatype with .astype()
● Use .drop() and axis=1 to drop a column from the dataframe
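The whole section as one sketch, assuming the coffee_data.csv file from the slides above:

import pandas as pd

df = pd.read_csv("coffee_data.csv")
df = df.rename(columns={
    'Price per Pound': 'price_lb',
    'Shipping Price': 'shipping',
    'Favorite?': 'favorite',
})
df['shipping'] = df.shipping.astype('float')  # strings to floats
df = df.drop('favorite', axis=1)              # a constant column adds nothing
print(df.shipping.mean())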
Cleaning String Data

Analyzing Text Data
Text data is notoriously messy.
● Inconsistent text
● Typos
● Extra whitespace
● Extra characters in numerical values (e.g. commas, dollar signs)
Analyzing Text Data
Load in data about US cities

import pandas as pd
df = pd.read_csv("cities.csv")
df
Convert Column to Upper- or Lowercase
● Inconsistencies in state column
● Convert column to uppercase
● Reference string methods with .str

df.state = df.state.str.upper()
df.state
Remove Specific Characters
● Commas in population column
● Remove by replacing commas with the empty string

pop = df.population.str.replace(',', '')
pop.astype('int')
Analyzing Text Data
df

df.city.unique()
array(['Chicago ', 'Los Angeles ', 'Omaha ', 'Dallas ', 'Philadelphia ', 'Los Alamos '], dtype=object)
Removing Whitespace
Strip whitespace from the front or end of string data
● Whitespace includes spaces, tabs, newline characters, etc.

city = df.city.str.strip()
city.unique()
array(['Chicago', 'Los Angeles', 'Omaha', 'Dallas', 'Philadelphia', 'Los Alamos'], dtype=object)
Checking for Substrings
Which cities contain “Los”?
● Check elementwise with .str.contains()
● Use the result as a data mask

df.city.str.contains('Los')
df[df.city.str.contains('Los')]
Analyzing Text Data
Text data is notoriously messy.

Inconsistent text or typos
● .str.upper(), .str.lower()
● .str.replace()

Extra whitespace
● .str.strip()

Characters in numerical values
● .str.replace()

Searching for substrings
● .str.contains()
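The toolkit applied end to end, assuming the cities.csv file from the slides above:

import pandas as pd

df = pd.read_csv("cities.csv")
df.state = df.state.str.upper()                       # inconsistent text
df.population = (df.population.str.replace(',', '')   # extra characters in numbers
                 .astype('int'))
df.city = df.city.str.strip()                         # extra whitespace
print(df[df.city.str.contains('Los')])                # substring search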
Case Study #1: Exploring New Data

What should we do when exploring new data?
Ask many questions and be skeptical.
● How do these data help answer the project question?
● What kind of data is given in each column?
● Do the data contain missing values?
● What steps are necessary to make these data ready for analysis?
Stock Prices Case Study
Which Monday time saw the highest stock price in September?

What kind of data do we have?

import pandas as pd
df = pd.read_csv("wxyz.csv")
df.head()

df.shape
(308, 3)

df.info()
All columns contain strings.
Convert Strings to Numerics
Convert the price column to numerical values
● Remove dollar signs (passing regex=False makes the literal match explicit)
● Convert from strings to floats

df.price = df.price.str.replace('$', '', regex=False)
df.head()

df.price = df.price.astype('float')
df.dtypes
Stock Prices Case Study
Which Monday time saw the highest stock price in September?

df.sample(5)
Create Datetime Column
● Combine day and time columns
● Convert to datetime

df['date_time'] = df.day + ' ' + df.time
df['date_time'] = pd.to_datetime(df.date_time)
df.head()
df.dtypes
Create Day of Week Column
Use the weekday property of datetime (Monday is 0)

df['day_of_week'] = df.date_time.dt.weekday
df.sample(5)
Stock Prices Case Study
Which Monday time saw the highest stock price in September?
● Select Mondays
● Sort to find the maximum price

mondays = df[df.day_of_week == 0]
mondays.sample(5)

(mondays[['date_time', 'price']]
 .sort_values('price', ascending=False)
 .head(3))
What should we do when exploring new data?

.head(), .info(), .dtypes, .shape
● What kind of data is given in each column?
● Do the data contain missing values?

.str.replace(), pd.to_datetime()
● What steps are necessary to make data ready for analysis?

● How do these data help answer the project question?
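The full case study as one sketch, assuming the wxyz.csv file from the slides above:

import pandas as pd

df = pd.read_csv("wxyz.csv")
df.price = df.price.str.replace('$', '', regex=False).astype('float')
df['date_time'] = pd.to_datetime(df.day + ' ' + df.time)
mondays = df[df.date_time.dt.weekday == 0]  # Monday is weekday 0

print(mondays[['date_time', 'price']]
      .sort_values('price', ascending=False)
      .head(3))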
Case Study #2: Diagnosing Errors

How can we diagnose errors?
Data inconsistencies may cause errors when operating on columns.

Build custom functions to:
● Include print statements
● Add conditional statements
● Use exceptions
Circus Performers Case Study
Which type of performer is the most experienced on average?

import pandas as pd
df = pd.read_csv("circus.csv")
df.head()

df.shape
(50, 2)

df.info()
Manipulating Email Data
Create new columns for domain

str.split("jamie50@liontamer.org", "@")
['jamie50', 'liontamer.org']

str.split("jamie50@liontamer.org", "@")[1]
'liontamer.org'

df.loc[:2].email.map(
    lambda x: str.split(x, "@")[1]
)
Manipulating Email Data
Create new columns for domain

df.email.map(
    lambda x: str.split(x, "@")[1]
)
IndexError: list index out of range
Creating Custom Function
Create a custom Python function to diagnose the error

def check_for_at_symbol(email):
    if '@' not in email:
        print(email)
    return ('@' in email)

at_symbol_test = df.email.map(check_for_at_symbol)
jonathan104ringmaster.net
Creating Custom Function
Create a custom Python function to build the domain column

def get_domain(email):
    if '@' not in email:
        return None
    return str.split(email, '@')[1]

df['domain'] = df.email.map(get_domain)
df.loc[13:16]
Using Python Exceptions
Create the domain column by catching errors with an exception

def get_domain_exception(email):
    try:
        return str.split(email, '@')[1]
    except IndexError:
        print(email)
        return None

df['domain'] = df.email.map(get_domain_exception)
jonathan104ringmaster.net
Circus Performers Case Study
Which type of performer is the most experienced on average?

df.head(3)

df.shape
(50, 2)

df.email.nunique()
40
Remove Duplicate Rows
Use pandas to remove duplicate rows with .drop_duplicates()
● The entire row must be an exact match
● Use the subset argument with a column name or list to match on specific columns
(Note: .shape is an attribute, not a method, so no parentheses.)

df.drop_duplicates().shape
(40, 3)

df.drop_duplicates(subset='email').shape
(40, 3)
Circus Performers Case Study
Which type of performer is the most experienced on average?

df.drop_duplicates(inplace=True)
(df.groupby('domain')
 .performances
 .mean()
 .sort_values(ascending=False))
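The full case study as one sketch, assuming the circus.csv file from the slides above and mirroring the steps already shown:

import pandas as pd

df = pd.read_csv("circus.csv")

def get_domain(email):
    if '@' not in email:  # guard against malformed addresses
        return None
    return email.split('@')[1]

df['domain'] = df.email.map(get_domain)
df = df.drop_duplicates()

print(df.groupby('domain')
      .performances
      .mean()
      .sort_values(ascending=False))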
Case Study #3: Comparing Against Group Statistics
Penguins Case Study
Standardize the penguin masses by species and sort by standard mass ascending.

import seaborn as sns
df = sns.load_dataset("penguins")
df.head()
Penguins Case Study
What kind of data do we have?

df.shape
(344, 7)

df.info()
Handling Missing Values
What kind of missings do we have?

df[df.bill_length_mm.isna()]
df.dropna(subset=["bill_length_mm"], inplace=True)

df.shape
(342, 7)

df[df.sex.isna()]
Penguins Case Study
Standardize the penguin masses by species and sort by standard mass ascending.
Pandas Transform
Use transform to produce group aggregates for each row

df.groupby("species").body_mass_g.mean()

df["mass_species_mean"] = (df.groupby("species").body_mass_g
                           .transform(lambda x: x.mean()))
df[["species", "body_mass_g", "mass_species_mean"]].sample(5)
Standard Penguin Mass
Standardize the penguin masses by species.

df["mass_standard"] = (df.groupby("species").body_mass_g
                       .transform(lambda x: (x - x.mean()) / x.std()))
df[["species", "body_mass_g", "mass_species_mean", "mass_standard"]].sample(5)
Penguins Case Study
Standardize the penguin masses by species and sort by standard mass ascending.

df.sort_values("mass_standard").head()
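A quick sanity check, not in the original slides: within each species the standardized masses should have mean approximately 0 and standard deviation approximately 1.

print(df.groupby("species").mass_standard.agg(["mean", "std"]).round(3))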