
Session 2-4

The document outlines simple data cleansing activities, including deleting unnecessary rows, formatting values, and finding problematic data. It also introduces various Excel functions for text manipulation and data cleaning, such as CONCATENATE, EXACT, and TRIM. Additionally, it references resources for further learning on data analysis using Excel.

Uploaded by

vaibhav.poddar
Copyright
© All Rights Reserved

Data Preprocessing and Visualization

Session 2-3: Simple Data Cleansing Activities

Dr. Arghya Ra
IMI Kolkata
Sample Survey: https://forms.gle/CP6XgVwYWipjT4py8
Extracting Data from Different Sources

1. .csv / .xlsx (e.g., https://www.kaggle.com/datasets/henriqueyamahata/bank-marketing)

2. .txt (e.g., https://archive.ics.uci.edu/dataset/109/wine and https://archive.ics.uci.edu/dataset/19/car+evaluation)
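As a rough illustration of loading such a file outside Excel, the sketch below parses comma-delimited text with Python's standard csv module. The sample text, the column names, and the helper name read_delimited are made up for the example and are not taken from the datasets above.

```python
import csv
import io

# Hypothetical stand-in for the contents of a downloaded .csv or .txt file.
raw_text = "age,job,balance\n34,admin,1200\n51,technician,-30\n"

def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of row dictionaries keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]

rows = read_delimited(raw_text)
print(rows[0])
```

Every value is read as text, so numeric columns still need converting before analysis.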
Simple Data Cleaning Activities:

1. Delete unnecessary rows and columns

2. Erase unnecessary cell content

3. Format numeric/textual values

4. Copy and move data/sheets

5. Replace data

6. Find problematic values
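Several of the activities above can be sketched in a few lines of Python for readers who want to see the logic outside a spreadsheet. This is only an illustration: the rows, column names, and helper functions are hypothetical, and in Excel the same steps are done through menus and Find & Replace.

```python
# Hypothetical survey rows; None marks an empty cell.
rows = [
    {"id": 1, "city": "Kolkata", "age": "29", "notes": "n/a"},
    {"id": 2, "city": "Calcutta", "age": "abc", "notes": "n/a"},
    {"id": None, "city": None, "age": None, "notes": None},
]

def drop_empty_rows(rows):
    """Delete rows in which every cell is empty."""
    return [r for r in rows if any(v is not None for v in r.values())]

def drop_column(rows, name):
    """Delete one unnecessary column from every row."""
    return [{k: v for k, v in r.items() if k != name} for r in rows]

def replace_value(rows, column, old, new):
    """Replace one value with another in a single column."""
    return [{**r, column: new if r[column] == old else r[column]} for r in rows]

def find_non_numeric(rows, column):
    """Return rows whose value in `column` is not a non-negative whole number."""
    return [r for r in rows if not str(r[column]).isdigit()]

cleaned = drop_empty_rows(rows)
cleaned = drop_column(cleaned, "notes")
cleaned = replace_value(cleaned, "city", "Calcutta", "Kolkata")
problems = find_non_numeric(cleaned, "age")
print(problems)
```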


The CONCATENATE function combines, or joins, chunks of text into a single text string.

The EXACT function compares two text strings. If the two text strings are exactly the same, including case, the EXACT function returns the logical value TRUE; otherwise it returns FALSE.
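A minimal Python sketch of what these two functions do. The names concatenate and exact are illustrative helpers written for this example, not Excel itself.

```python
def concatenate(*parts):
    """Join text chunks into one string, in the spirit of CONCATENATE."""
    return "".join(str(p) for p in parts)

def exact(text1, text2):
    """Case-sensitive comparison, in the spirit of EXACT."""
    return text1 == text2

print(concatenate("Data", " ", "Cleaning"))
print(exact("Excel", "excel"))
```

The second call returns False because the comparison is case-sensitive.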

The FIND function finds the starting character position of one text string within another text string.
The LEFT function returns a specified number of characters from the left end of a text string.

The RIGHT function returns a specified number of characters from the right end of a text string.

The LEN function counts the number of characters in a text string.
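These four functions are often combined to split a code at a separator. The Python sketch below mimics that pattern; the helper names and the sample code "KOL-700091" are made up for illustration, and the sketch uses the 1-based positions that Excel uses.

```python
def find(find_text, within_text, start_num=1):
    """1-based, case-sensitive position of find_text, in the spirit of FIND."""
    # str.index raises ValueError when the text is absent (Excel shows #VALUE!).
    return within_text.index(find_text, start_num - 1) + 1

def left(text, num_chars=1):
    """First num_chars characters, in the spirit of LEFT."""
    return text[:num_chars]

def right(text, num_chars=1):
    """Last num_chars characters, in the spirit of RIGHT."""
    return text[max(len(text) - num_chars, 0):]

code = "KOL-700091"                      # hypothetical cell value
dash = find("-", code)                   # position of the separator
prefix = left(code, dash - 1)            # text before the dash
suffix = right(code, len(code) - dash)   # text after the dash
print(prefix, suffix)
```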

The LOWER function returns an all‐lowercase version of a text string.


The UPPER function returns an all‐uppercase version of a text string.

The PROPER function capitalizes the first letter in every word in a text string.

The MID function returns a chunk of text from the middle of a text string, given a starting position and a number of characters.
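The case functions map closely onto Python string methods, and MID onto slicing. The sketch below assumes that str.title() is a close enough stand-in for PROPER for simple names; the helper names and sample values are illustrative only.

```python
def proper(text):
    """Capitalize the first letter of each word, in the spirit of PROPER."""
    return text.title()

def mid(text, start_num, num_chars):
    """num_chars characters starting at 1-based start_num, in the spirit of MID."""
    return text[start_num - 1 : start_num - 1 + num_chars]

name = "jOHN smith"          # hypothetical cell value
print(name.lower())          # LOWER
print(name.upper())          # UPPER
print(proper(name))          # PROPER
print(mid("KOL-700091", 5, 3))
```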

The REPLACE function replaces a portion of a text string.


The SEARCH function calculates the starting position of a text fragment within a text string. Unlike FIND, SEARCH is not case‐sensitive.

The SUBSTITUTE function replaces occurrences of text in a text string.
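The difference between replacing by position (REPLACE) and replacing by content (SUBSTITUTE) is easier to see side by side. The Python helpers below are illustrative stand-ins, not Excel functions; the search helper ignores case but, as a simplification, does not handle the wildcard characters that Excel's SEARCH accepts.

```python
def replace(old_text, start_num, num_chars, new_text):
    """Replace num_chars characters from 1-based start_num, like REPLACE."""
    i = start_num - 1
    return old_text[:i] + new_text + old_text[i + num_chars:]

def search(find_text, within_text, start_num=1):
    """1-based position ignoring case, like SEARCH (no wildcard support here)."""
    return within_text.lower().index(find_text.lower(), start_num - 1) + 1

def substitute(text, old_text, new_text):
    """Replace every occurrence of old_text (case-sensitive), like SUBSTITUTE."""
    if not old_text:
        return text  # nothing to look for, so leave the text unchanged
    return text.replace(old_text, new_text)

print(replace("2023-01-15", 1, 4, "2024"))
print(search("excel", "Learn Excel"))
print(substitute("a-b-c", "-", "/"))
```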

The TRIM function removes extra spaces from a text string: leading and trailing spaces are deleted, and runs of spaces between words are reduced to a single space.

The ROUND function rounds a number to a specified number of digits.
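A small Python sketch of both behaviours. Excel's TRIM strips leading, trailing, and repeated spaces, and Excel's ROUND rounds halves away from zero, which differs from Python's built-in round(), so the sketch uses the decimal module instead. The helper names trim and round_half_up are made up for this example.

```python
from decimal import Decimal, ROUND_HALF_UP

def trim(text):
    """Drop leading/trailing spaces and collapse repeated spaces, like TRIM."""
    return " ".join(word for word in text.split(" ") if word)

def round_half_up(number, num_digits=0):
    """Round to num_digits decimals with halves going away from zero, like ROUND."""
    quantum = Decimal(1).scaleb(-num_digits)   # e.g. 0.01 for num_digits=2
    rounded = Decimal(str(number)).quantize(quantum, rounding=ROUND_HALF_UP)
    return float(rounded)

print(trim("  New   Delhi  "))
print(round_half_up(3.14159, 2))
print(round_half_up(2.5))
```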
Removing Duplicate Data
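The idea behind duplicate removal can be sketched in Python as follows: keep the first row for each combination of values in the chosen columns and drop later repeats. The rows, the column names, and the exact-match comparison are assumptions made for this example.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of key column values."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical customer rows with one repeated entry.
rows = [
    {"name": "Asha", "city": "Kolkata"},
    {"name": "Ravi", "city": "Pune"},
    {"name": "Asha", "city": "Kolkata"},
]
unique_rows = remove_duplicates(rows, ["name", "city"])
print(len(unique_rows))
```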

Using Validation to Keep Data Clean
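Validation means checking each entry against a rule before accepting it. The Python sketch below shows one way to express such rules; the rule set, the column names, and the validate helper are hypothetical and only illustrate the idea, not Excel's Data Validation dialog.

```python
def validate(row, rules):
    """Return the names of the columns whose values break their rule."""
    return [col for col, is_valid in rules.items() if not is_valid(row.get(col))]

# Hypothetical rules: a whole-number range and a list of allowed values.
rules = {
    "age": lambda v: isinstance(v, int) and 18 <= v <= 100,
    "marital": lambda v: v in {"single", "married", "divorced"},
}

good_row = {"age": 34, "marital": "married"}
bad_row = {"age": 150, "marital": "unknown"}
print(validate(good_row, rules))
print(validate(bad_row, rules))
```

An empty list means the row passed every rule.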


The contents of the slides are taken from the book:

TB1: Excel Data Analysis for Dummies (A Wiley Brand), by Stephen L. Nelson and Elizabeth C. Nelson (3rd edition)

Chapter 3: Scrub‐a‐Dub‐Dub: Cleaning Data

Coursera Resources:
Course Name: “Excel Basics for Data Analysis” by IBM.
Week 2 “Getting Started with Using Excel Spreadsheets”
Week 3 “Cleaning & Wrangling Data Using Spreadsheets”
(https://www.coursera.org/learn/excel-basics-data-analysis-ibm#syllabus)
Thank you.
