[Table: sales by location. Bengaluru 20, Bengaluru (Rural) 20, Belgavi 20; Average 17.6]
FIX ROWS AND COLUMNS
•Understand the header, footer, and column names
•Add missing column names
•Rename abbreviations or codes
•Delete irregular or unidentified columns/rows
•Split merged cells (URL, address)

Location  Sales
BOM       13
BLR       20
HYD       20
PNQ       17
Belgavi   20

Suffix  First Name  Last Name
Ms      A           B
Mr      C           C
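The row and column fixes listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not a prescribed method: the CODE_TO_CITY mapping, the function names, and the rule that a row labelled "Average" counts as irregular are all assumptions made for the example. The sample values are the ones shown on the slide.

```python
# Hypothetical mapping from airport-style codes to city names (an assumption).
CODE_TO_CITY = {"BOM": "Mumbai", "BLR": "Bengaluru", "HYD": "Hyderabad", "PNQ": "Pune"}

def fix_rows_and_columns(raw_rows, column_names):
    """Attach column names, drop irregular rows, and expand location codes.

    Assumes one of the column names is "Location".
    """
    cleaned = []
    for row in raw_rows:
        # Delete irregular rows: wrong width, or summary rows such as "Average".
        if len(row) != len(column_names) or str(row[0]).strip().lower() == "average":
            continue
        record = dict(zip(column_names, row))
        # Rename abbreviations or codes; unknown values pass through unchanged.
        record["Location"] = CODE_TO_CITY.get(record["Location"], record["Location"])
        cleaned.append(record)
    return cleaned

def split_name(merged):
    """Split a merged 'Suffix First Last' cell into three named fields."""
    suffix, first, last = merged.split(maxsplit=2)
    return {"Suffix": suffix, "First Name": first, "Last Name": last}

raw = [["BOM", 13], ["BLR", 20], ["HYD", 20], ["PNQ", 17],
       ["Belgavi", 20], ["Average", 17.6], ["stray"]]
table = fix_rows_and_columns(raw, ["Location", "Sales"])
names = [split_name(cell) for cell in ["Ms A B", "Mr C C"]]
```

Here table keeps the five location rows with codes expanded, and names holds one dictionary per person. In a real dataset the rule for what counts as an irregular row would need to be chosen after inspecting the header and footer.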
IMPUTING VALUES
•Issues: blank, NA, XX, 999, etc.
•Approaches: constant, average, function, external sources (other columns), fill in partial data (e.g. 70 becomes 1970)
35 78000 1
450 50000 2
56 5100 0
28 53000 10
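The imputation approaches above can be sketched as follows, using only the standard library. The marker set, the function names, and the default century of 1900 are assumptions for illustration. The choice of the mean (rather than a constant or a value derived from another column) is just one of the approaches the slide lists.

```python
from statistics import mean

# Markers treated as "missing" in this sketch; extend to match the dataset.
MISSING_MARKERS = {"", "NA", "XX", "999"}

def to_missing(value):
    """Return None for blank or sentinel values, otherwise the value unchanged."""
    return None if value is None or str(value).strip() in MISSING_MARKERS else value

def impute_mean(values):
    """Replace missing entries with the average of the observed ones.

    Raises statistics.StatisticsError if every entry is missing.
    """
    marked = [to_missing(v) for v in values]
    observed = [float(v) for v in marked if v is not None]
    fill = mean(observed)
    return [fill if v is None else float(v) for v in marked]

def expand_year(year, century=1900):
    """Fill partial data: a two-digit year such as 70 becomes 1970.

    The century is an assumption and must be chosen from context.
    """
    return year + century if 0 <= year < 100 else year

ages = impute_mean([35, "NA", 56, 28, 999])
years = [expand_year(y) for y in (70, 1985)]
```

Note that this only handles explicit missing-value markers. An implausible but unmarked value (such as an age of 450) needs a separate validity check before imputation.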
STANDARDIZE
•Text: remove extra characters, fix case, use the same formats for dates and names
•Numerical: fix units, scale, etc.
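A minimal sketch of the standardization steps above, again in plain Python. The list of date layouts, the set of characters kept by clean_text, and the choice of min-max scaling are assumptions for the example. Unit conversion is not shown because it depends on which units the dataset mixes.

```python
import re
from datetime import datetime

# Date layouts assumed to appear in the raw data (an illustrative list).
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d")

def clean_text(text):
    """Drop extra characters, collapse whitespace, and apply title case."""
    text = re.sub(r"[^A-Za-z0-9() ]+", " ", text)
    return " ".join(text.split()).title()

def to_iso_date(text):
    """Parse a date in any assumed layout and return it as YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def min_max_scale(values):
    """Rescale numbers to the 0-1 range (assumes at least two distinct values)."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

city = clean_text("  bengaluru   (rural)## ")
date = to_iso_date("05/03/2021")
scaled = min_max_scale([13, 20, 17])
```

The order of DATE_FORMATS matters: an ambiguous string such as 05/03/2021 is read day-first here, which is itself an assumption about the source data.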
DATASET